Discussion:
Much faster? i18n? tls?
Behdad Esfahbod
2003-11-10 05:36:19 UTC
Hi,

I was tweaking the grep patch, and also following another
thread on another list, which showed that on Red Hat 9 a
simple text-intensive program (called hspell) is much slower than
on Red Hat 8. Investigation so far suggests it's all
caused by /lib/tls. Switching to /lib/i686 makes things go much
faster. Any idea? And it's not a multi-threaded application.


So I focused on sed, which is pretty slow in non-C locales :-(

[***@mces behdad]$ echo $LANG
en_US.UTF-8
[***@mces behdad]$ ll /bin/ls
-rwxr-xr-x 1 root root 73460 Oct 12 04:50 /bin/ls
[***@mces behdad]$ time sed -e 's/./x/g' /bin/ls > /dev/null

real 0m4.248s
user 0m3.800s
sys 0m0.000s
[***@mces behdad]$ time LANG=C sed -e 's/./x/g' /bin/ls > /dev/null

real 0m0.180s
user 0m0.050s
sys 0m0.000s
[***@mces behdad]$

And /bin/ls is only 72kb!!!


You will have noticed that /bin/ls is not valid UTF-8.
In fact /bin/ls is very small; if you run sed on a bigger
piece of garbage (encoding-wise, of course), it's hard not to
get a segfault. I have reported that here:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=109606
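
One quick way to confirm that an input is not valid UTF-8 before
feeding it to sed is to round-trip it through iconv, which exits
non-zero when it hits an invalid sequence. A minimal sketch; the
helper name is_valid_utf8 is made up for illustration:

```shell
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
# Relies on iconv returning a non-zero status on an illegal input sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

For instance, `is_valid_utf8 /bin/ls || echo "not UTF-8"` should
report that a binary such as /bin/ls is not valid UTF-8 text.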


Any idea? Could someone check whether the same caching can be
done here?


behdad
Mike A. Harris
2003-11-10 08:18:31 UTC
Post by Behdad Esfahbod
Was tweaking with the grep patch, and also tracking another
thread in another list, which was showing how on Red Hat 9 a
simple text intensive program (called hspell) is much slower than
Red Hat 8, and investigations have shown so far that it's all
caused by /lib/tls. Switching to /lib/i686 makes things go much
faster. Any idea? And it's not a multi-threaded application.
So I focus on sed: Pretty slow on non-C locales:-(
en_US.UTF-8
-rwxr-xr-x 1 root root 73460 Oct 12 04:50 /bin/ls
real 0m4.248s
user 0m3.800s
sys 0m0.000s
real 0m0.180s
user 0m0.050s
sys 0m0.000s
And /bin/ls is only 72kb!!!
You should average the times of multiple runs, at least 5-10 for
each test case to help remove noise from the numbers, and
remove/reduce cache colouring and other factors from getting in
the way of your numbers.
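
Averaging over several runs could be sketched roughly as below.
This assumes bash, GNU date (for the %N nanosecond field) and awk;
the function name avg_time is invented for the example.

```shell
# Hypothetical helper: run a command N times and print the mean
# wall-clock time in seconds. Assumes bash, GNU date (%N) and awk.
avg_time() {
    local runs=$1; shift
    local total=0 start end i
    for ((i = 0; i < runs; i++)); do
        start=$(date +%s%N)          # nanoseconds since the epoch
        "$@" > /dev/null 2>&1        # discard the command's output
        end=$(date +%s%N)
        total=$((total + end - start))
    done
    # Convert the nanosecond total to a per-run mean in seconds.
    LC_ALL=C awk -v t="$total" -v n="$runs" \
        'BEGIN { printf "%.3f\n", t / n / 1e9 }'
}
```

A call such as `avg_time 10 env LANG=C sed -e 's/./x/g' /bin/ls`,
repeated with LANG=en_US.UTF-8, would give one averaged figure
per locale.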
--
Mike A. Harris ftp://people.redhat.com/mharris
OS Systems Engineer - XFree86 maintainer - Red Hat
Behdad Esfahbod
2003-11-10 09:45:31 UTC
Post by Mike A. Harris
Post by Behdad Esfahbod
Was tweaking with the grep patch, and also tracking another
thread in another list, which was showing how on Red Hat 9 a
simple text intensive program (called hspell) is much slower than
Red Hat 8, and investigations have shown so far that it's all
caused by /lib/tls. Switching to /lib/i686 makes things go much
faster. Any idea? And it's not a multi-threaded application.
So I focus on sed: Pretty slow on non-C locales:-(
en_US.UTF-8
-rwxr-xr-x 1 root root 73460 Oct 12 04:50 /bin/ls
real 0m4.248s
user 0m3.800s
sys 0m0.000s
real 0m0.180s
user 0m0.050s
sys 0m0.000s
And /bin/ls is only 72kb!!!
You should average the times of multiple runs, at least 5-10 for
each test case to help remove noise from the numbers, and
remove/reduce cache colouring and other factors from getting in
the way of your numbers.
Well, I have already done that. With a few hundred megabytes of
free memory, an idle 2.4 P4M CPU, and such a huge difference, I
believe the signal-to-noise ratio is high enough.
Ulrich Drepper
2003-11-10 21:21:48 UTC
Post by Behdad Esfahbod
Was tweaking with the grep patch, and also tracking another
thread in another list, which was showing how on Red Hat 9 a
simple text intensive program (called hspell) is much slower than
Red Hat 8, and investigations have shown so far that it's all
caused by /lib/tls. Switching to /lib/i686 makes things go much
faster. Any idea? And it's not a multi-threaded application.
I doubt that going with the /lib/i686 version makes it faster. In fact,
the TLS code should be between 5-10% faster.
Post by Behdad Esfahbod
real 0m4.248s
user 0m3.800s
sys 0m0.000s
real 0m0.180s
user 0m0.050s
sys 0m0.000s
That's expected. UTF-8 handling is complicated. And we do have special
support for single-byte encodings. You should be happy about that.
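
Part of the extra cost is that a UTF-8 character spans one to four
bytes, so "." cannot simply consume one byte the way it can in a
single-byte locale. A small sketch of that difference, not of what the
regex code actually does; utf8_chars is a made-up helper that counts
characters by dropping continuation bytes:

```shell
# Hypothetical helper: count UTF-8 characters on stdin by deleting
# continuation bytes (0x80-0xBF) and counting the bytes that remain.
# Assumes the input is valid UTF-8.
utf8_chars() {
    LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}

# "héllo": the é is encoded as the two bytes 0xC3 0xA9.
printf 'h\303\251llo' | wc -c | tr -d ' '   # prints 6 (bytes)
printf 'h\303\251llo' | utf8_chars          # prints 5 (characters)
```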

Having said that, we might have some speedups for the regex code at some
point. Speedups specifically for UTF-8. If you want to see this
sooner, get out your editor and start hacking regex.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Behdad Esfahbod
2003-11-11 03:18:18 UTC
Post by Ulrich Drepper
Post by Behdad Esfahbod
Was tweaking with the grep patch, and also tracking another
thread in another list, which was showing how on Red Hat 9 a
simple text intensive program (called hspell) is much slower than
Red Hat 8, and investigations have shown so far that it's all
caused by /lib/tls. Switching to /lib/i686 makes things go much
faster. Any idea? And it's not a multi-threaded application.
I doubt that going with the /lib/i686 version makes it faster. In fact,
the TLS code should be between 5-10% faster.
I've already seen the thread on LKML with Linus which I assume
solves this problem.
Post by Ulrich Drepper
Post by Behdad Esfahbod
real 0m4.248s
user 0m3.800s
sys 0m0.000s
real 0m0.180s
user 0m0.050s
sys 0m0.000s
That's expected. UTF-8 handling is complicated. And we do have special
support for single-byte encodings. You should be happy about that.
Sure, but UTF-8 is not such a hard thing to handle.
Post by Ulrich Drepper
Having said that, we might have some speedups for the regex code at some
point. Speedups specifically for UTF-8. If you want to see this
sooner, get out your editor and start hacking regex.
I definitely would. I'm afraid the problem is not UTF-8
itself, but the other legacy multi-byte encodings. I mean, it
may be that special support for UTF-8 is needed...

behdad
Ulrich Drepper
2003-11-11 03:27:40 UTC
Post by Behdad Esfahbod
I've already seen the thread on LKML with Linus which I assume
solves this problem.
If any code needed that specific patch, make sure the author never
touches a keyboard again and rewrite the code. No production code
should ever be affected by that change in any noticeable way.
Post by Behdad Esfahbod
Sure, but UTF-8 is not such a hard thing to handle.
Then do it.
Post by Behdad Esfahbod
I definitely would. I'm afraid the problem is not UTF-8
itself, but the other legacy multi-byte encodings. I mean, it
may be that special support for UTF-8 is needed...
No legacy encoding is of any interest. If we add any special encoding
optimization this will be only and exclusively for UTF-8. It's the only
encoding one needs.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖