Re: [xml] Better hash function for dict.c


Stefan Behnel wrote:
is quite short and readable and seems to do what I was looking for.

Some more real-world numbers. I used lxml to parse - using xmlCtxtParseFile()
- all .xml and .xsd files that "locate" found on my hard disc, some 58000
files, 218MB in total (I should really run a cleanup session when I find the

This took 14.62 seconds with the unchanged libxml2 2.6.32. The patched version
did this in 14.19 seconds, so that's 3% faster. Not much, but it's
reproducible, so it's not just noise. And note that there is a lot of OS
interaction involved, even though it parses from the disk cache. I also did a
test parsing the same files from memory and that takes the unpatched version
9.87 seconds (22MB/s) and the patched one 9.19 seconds (23,8MB/s), so that's
7% better. (BTW, libxml2 is lightning fast already, thanks to everyone who
made this possible - Daniel above all, I assume).

I ran callgrind on this benchmark (only measuring xmlDictLookup, which took 18
minutes already). The xmlDictGrow() function is called twice in both cases, so
the dictionary has the same size, but the timing for calls to memcmp() goes
down by 60%.

The total time used for looping linearly over the target buffer is only 20% (a
fifth) of the original version, where it took 54% of the overall runtime.
That's down to 15% now, which indicates that the hash table distribution is
much better in the patched version.

In total, xmlDictLookup() runs an averaged 25% faster according to callgrind.

I attached the two result logs. I recommend KCachegrind for investigating them.

So, I would like to get this integrated. Should I try fixing up the patch or
is anyone else interested?


Attachment: callgrind.out.patched-dict.gz
Description: GNU Zip compressed data

Attachment: callgrind.out.unpatched-dict.gz
Description: GNU Zip compressed data

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]