Much faster? i18n? tls?

Discussion:

Behdad Esfahbod

2003-11-10 05:36:19 UTC

Hi,

I was tweaking the grep patch while also following a thread on
another list, which showed that on Red Hat 9 a simple
text-intensive program (called hspell) is much slower than on
Red Hat 8. Investigation so far suggests it is all caused by
/lib/tls; switching to /lib/i686 makes things go much faster.
Any idea why? And it's not a multi-threaded application.

So I focused on sed, which is pretty slow in non-C locales :-(

[***@mces behdad]$ echo $LANG
en_US.UTF-8
[***@mces behdad]$ ll /bin/ls
-rwxr-xr-x 1 root root 73460 Oct 12 04:50 /bin/ls
[***@mces behdad]$ time sed -e 's/./x/g' /bin/ls > /dev/null

real 0m4.248s
user 0m3.800s
sys 0m0.000s
[***@mces behdad]$ time LANG=C sed -e 's/./x/g' /bin/ls > /dev/null

real 0m0.180s
user 0m0.050s
sys 0m0.000s
[***@mces behdad]$

And /bin/ls is only 72 KB!

But you will have noticed that /bin/ls is not valid UTF-8.
Also, /bin/ls is very small; if you run sed on a bigger piece of
garbage (garbage in terms of encoding, of course), it's hard not
to get a segfault. I have reported that here:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=109606

Any ideas? Could someone check whether the same caching can be
done here?

behdad
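
The remark above that /bin/ls is not valid UTF-8 can be checked directly. The following is a minimal sketch in Python that relies on the interpreter's strict UTF-8 decoder; the function names `first_invalid_utf8_offset` and `check_file` are hypothetical and do not come from sed, grep, or glibc.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None if data is valid."""
    try:
        data.decode("utf-8")  # strict decoding raises on the first malformed sequence
    except UnicodeDecodeError as err:
        return err.start
    return None


def check_file(path: str):
    """Read a file as raw bytes and report where, if anywhere, it stops being valid UTF-8."""
    with open(path, "rb") as handle:
        return first_invalid_utf8_offset(handle.read())
```

Calling `check_file("/bin/ls")` on a typical binary should return a small offset rather than `None`, since executables contain bytes that cannot start a UTF-8 sequence.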

Mike A. Harris

2003-11-10 08:18:31 UTC

Post by Behdad Esfahbod
I was tweaking the grep patch while also following a thread on
another list, which showed that on Red Hat 9 a simple
text-intensive program (called hspell) is much slower than on
Red Hat 8. Investigation so far suggests it is all caused by
/lib/tls; switching to /lib/i686 makes things go much faster.
Any idea why? And it's not a multi-threaded application.
So I focused on sed, which is pretty slow in non-C locales :-(
[***@mces behdad]$ echo $LANG
en_US.UTF-8
[***@mces behdad]$ ll /bin/ls
-rwxr-xr-x 1 root root 73460 Oct 12 04:50 /bin/ls
[***@mces behdad]$ time sed -e 's/./x/g' /bin/ls > /dev/null
real 0m4.248s
user 0m3.800s
sys 0m0.000s
[***@mces behdad]$ time LANG=C sed -e 's/./x/g' /bin/ls > /dev/null
real 0m0.180s
user 0m0.050s
sys 0m0.000s
And /bin/ls is only 72 KB!

You should average the times of multiple runs, at least 5-10 for
each test case, to help remove noise from the numbers and to keep
cache colouring and other factors from getting in the way of
your measurements.

--
Mike A. Harris ftp://people.redhat.com/mharris
OS Systems Engineer - XFree86 maintainer - Red Hat
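
The averaging suggested above can be scripted. The following is a minimal sketch in Python, assuming sed is on the PATH; the helper names `time_command`, `summarize`, and `compare_locales` are hypothetical and not part of any tool mentioned in the thread. It repeats the sed command from the original post under the current locale and under LANG=C, and reports the mean and standard deviation of the wall-clock times.

```python
import os
import statistics
import subprocess
import time


def time_command(argv, env_overrides=None, runs=10):
    """Run argv `runs` times; return the wall-clock duration of each run in seconds."""
    env = dict(os.environ)
    env.update(env_overrides or {})
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True: a crashing or failing command raises instead of being timed.
        subprocess.run(argv, env=env, stdout=subprocess.DEVNULL, check=True)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, sample standard deviation) for a list of timings."""
    mean = statistics.mean(durations)
    spread = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, spread


def compare_locales(path="/bin/ls", runs=10):
    """Time the sed command from the thread under the current locale and under LANG=C."""
    argv = ["sed", "-e", "s/./x/g", path]
    results = {}
    # LANG=C mirrors the thread; if LC_ALL is set in the environment it overrides LANG.
    for label, overrides in (("current", {}), ("C", {"LANG": "C"})):
        results[label] = summarize(time_command(argv, overrides, runs))
    return results
```

Calling `compare_locales("/bin/ls", runs=10)` returns a dictionary mapping each label to a `(mean, stdev)` pair, which makes it easy to see whether the gap between the two locales is larger than the run-to-run spread.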

Behdad Esfahbod

2003-11-10 09:45:31 UTC

Post by Mike A. Harris

Well, I have already done that. With a few hundred megabytes of
free memory, an idle 2.4 GHz P4M CPU, and such a huge difference,
I believe the signal-to-noise ratio is high enough.

Ulrich Drepper

2003-11-10 21:21:48 UTC

I doubt that going with the /lib/i686 version makes it faster. In fact,
the TLS code should be 5-10% faster.

Post by Behdad Esfahbod
real 0m4.248s
user 0m3.800s
sys 0m0.000s
real 0m0.180s
user 0m0.050s
sys 0m0.000s

That's expected. UTF-8 handling is complicated. And we do have special
support for single-byte encodings. You should be happy about that.

That said, we might have some speedups for the regex code at some
point, specifically for UTF-8. If you want to see this
sooner, get out your editor and start hacking on regex.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
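
To illustrate the point above about UTF-8 versus single-byte encodings: in a single-byte locale `.` can consume one byte at a time, whereas in a UTF-8 locale the matcher has to work out where each character starts and ends and reject malformed sequences. The sketch below is in Python and is not glibc's regex implementation; the function names are hypothetical, and copying invalid bytes through unchanged is an assumption made for the illustration.

```python
def utf8_sequence_length(data: bytes, i: int) -> int:
    """Return the length of the well-formed UTF-8 sequence at data[i], or 0 if there is none."""
    lead = data[i]
    if lead < 0x80:
        return 1  # ASCII: one byte, no further checks
    # (continuation bytes needed, allowed range of the first continuation byte)
    if 0xC2 <= lead <= 0xDF:
        need, lo, hi = 1, 0x80, 0xBF
    elif lead == 0xE0:
        need, lo, hi = 2, 0xA0, 0xBF  # excludes overlong forms
    elif lead == 0xED:
        need, lo, hi = 2, 0x80, 0x9F  # excludes surrogates
    elif 0xE1 <= lead <= 0xEF:
        need, lo, hi = 2, 0x80, 0xBF
    elif lead == 0xF0:
        need, lo, hi = 3, 0x90, 0xBF  # excludes overlong forms
    elif lead == 0xF4:
        need, lo, hi = 3, 0x80, 0x8F  # excludes code points above U+10FFFF
    elif 0xF1 <= lead <= 0xF3:
        need, lo, hi = 3, 0x80, 0xBF
    else:
        return 0  # stray continuation byte or invalid lead byte
    tail = data[i + 1 : i + 1 + need]
    if len(tail) != need or not lo <= tail[0] <= hi:
        return 0
    if any(not 0x80 <= b <= 0xBF for b in tail[1:]):
        return 0
    return need + 1


def replace_chars_single_byte(data: bytes) -> bytes:
    """Emulate s/./x/g when every byte is one character (newlines are kept)."""
    return bytes(b if b == 0x0A else 0x78 for b in data)


def replace_chars_utf8(data: bytes) -> bytes:
    """Emulate s/./x/g when characters are UTF-8 sequences (newlines are kept)."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data, i)
        if n == 0 or data[i] == 0x0A:
            out.append(data[i])  # assumption: invalid bytes and newlines pass through
            i += 1
        else:
            out += b"x"  # one replacement per character, however many bytes it used
            i += n
    return bytes(out)
```

The single-byte version is a straight pass over the input, while the UTF-8 version has to classify and validate every sequence before it can decide what a "character" is. That extra per-character work is the kind of cost a UTF-8-specific fast path would aim to reduce.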

Behdad Esfahbod

2003-11-11 03:18:18 UTC

Post by Ulrich Drepper

I doubt that going with the /lib/i686 version makes it faster. In fact,
the TLS code should be 5-10% faster.

I've already seen the thread on LKML with Linus which I assume
solves this problem.

Post by Ulrich Drepper

Post by Behdad Esfahbod
real 0m4.248s
user 0m3.800s
sys 0m0.000s
real 0m0.180s
user 0m0.050s
sys 0m0.000s

That's expected. UTF-8 handling is complicated. And we do have special
support for single-byte encodings. You should be happy about that.

Sure, but UTF-8 is not such a hard thing to handle.

Post by Ulrich Drepper
That said, we might have some speedups for the regex code at some
point, specifically for UTF-8. If you want to see this
sooner, get out your editor and start hacking on regex.

I definitely would. I'm afraid the problem is not UTF-8
itself, but the other, legacy multi-byte encodings. I mean, it
may be that special support for UTF-8 is needed...

behdad

Ulrich Drepper

2003-11-11 03:27:40 UTC

Post by Behdad Esfahbod
I've already seen the thread on LKML with Linus which I assume
solves this problem.

If any code needed that specific patch, make sure the author never
touches a keyboard again and rewrite the code. No production code
should ever be affected by that change in any noticeable way.

Post by Behdad Esfahbod
Sure, but UTF-8 is not such a hard thing to handle.

Then do it.

Post by Behdad Esfahbod
I definitely would. I'm afraid the problem is not UTF-8
itself, but the other, legacy multi-byte encodings. I mean, it
may be that special support for UTF-8 is needed...

No legacy encoding is of any interest. If we add any special encoding
optimization this will be only and exclusively for UTF-8. It's the only
encoding one needs.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖