Re: [xml] RAW && NXT with strncmp()
- From: Daniel Veillard <veillard redhat com>
- To: Chris Anderson <christop charm net>
- Cc: xml gnome org
- Subject: Re: [xml] RAW && NXT with strncmp()
- Date: Wed, 1 Oct 2003 14:54:39 -0400
On Wed, Oct 01, 2003 at 02:26:30PM -0400, Chris Anderson wrote:
On Wed, 2003-10-01 at 12:47, Bjorn Reese wrote:
On Wed, 2003-10-01 at 15:50, Daniel Veillard wrote:
I was looking at this yesterday, actually, but I don't feel
that confident about such a change yet ...
Not to belabor this point too much, it's not that important...
There are a couple of things going on in that loop:
First, icc completely optimizes the loop away, gcc just optimizes away
the z = 1 statement since it's not used.
I never tried icc. I believe it would generate faster code, but
my target compiler is obviously gcc :-)
Second, gcc puts all of the characters of "xmlns" into registers
before the loop begins, so the no-strncmp test is really testing the
speed of 5 * 1000000000 register compares and branches, which is a bit
faster than loading the constants into registers and comparing.
Third, the other point was that there are cases where most of the
strings are not equal, this loop just tests the case where they are
equal.
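The first and third points can be illustrated with a small sketch. This is not the benchmark discussed in this thread (that code is not shown here); the helper names is_xmlns_chars and count_matches are hypothetical. The result is accumulated in a volatile counter so the compiler cannot delete the loop, and the same routine can be driven with both a matching and a non-matching input.

```c
#include <string.h>

/* Hypothetical helper: character-by-character test for a leading "xmlns".
 * The && chain short-circuits, so it stops at the first mismatch
 * (including the terminating NUL of a shorter string). */
static int is_xmlns_chars(const char *p) {
    return p[0] == 'x' && p[1] == 'm' && p[2] == 'l' &&
           p[3] == 'n' && p[4] == 's';
}

/* Run the test n times and count the matches. The counter is volatile
 * so that every update is an observable side effect and the loop
 * cannot be optimized away as dead code. */
static unsigned long count_matches(const char *p, unsigned long n) {
    volatile unsigned long hits = 0;
    unsigned long i;

    for (i = 0; i < n; i++) {
        if (is_xmlns_chars(p))
            hits++;
    }
    return hits;
}
```

Calling count_matches("xmlns:foo", n) exercises the equal case and count_matches("xmlfoo", n) the unequal one. The compiler may still hoist the comparison out of the loop since the input never changes, so a fair timing run would need to vary the input as well.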
Well I just want the equality. One of the reasons I was using CUR/NXT
is that I used to do some "buffer grows if needed" code handling in it.
Not the case anymore which allows the optimisation.
Fourth, strncmp is not an intrinsic in either gcc or icc. Use memcmp,
that will generate x86 repz string instructions as long as the size of
the memory being compared is constant (size == 5).
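A minimal sketch of the memcmp form with a constant size follows. The helper name has_xmlns_prefix and the avail parameter are assumptions made for illustration, not libxml2 code. Whether the compiler expands the call inline, and into which instructions, depends on the compiler version and optimization flags.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the buffer starts with "xmlns".
 * avail is the number of readable bytes at p; checking it first keeps
 * memcmp from reading past the end of a short buffer. The length
 * argument is the compile-time constant 5, which is what gives the
 * compiler the chance to expand the comparison inline. */
static int has_xmlns_prefix(const char *p, size_t avail) {
    return avail >= 5 && memcmp(p, "xmlns", 5) == 0;
}
```

For example, has_xmlns_prefix("xmlns:a", 7) is true, while has_xmlns_prefix("xml", 3) is false because fewer than five bytes are available.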
Yup I was precisely looking at objdump -d a.out too :-)
The no-memcmp version is quite a bit faster, but the instruction count
isn't that different. I think it's more of a clarity thing unless you
are in a code hot spot.
Yeah, it's not in a hot spot (those are element, attribute and character parsing).
Now if you really want to chase the microsecond, if you manage to
optimize xmlParseCharData in parser.c, then it's an immediate gain !
W.r.t. instruction count, cachegrind was a bit surprising initially,
finding a 2000045214 cycle cost for the original and 400045209 for
the memcmp, independently of the string being compared. Makes some
sense once the generated code has been looked at :-)
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/