Re: [xml] Better hash function for dict.c

On Wed, Aug 06, 2008 at 09:56:22PM +0200, Stefan Behnel wrote:

Stefan Behnel wrote:
Daniel Veillard wrote:
If you have a bit of time, maybe you can rerun your initial tests
with that one. Is that possible?

I can try, sure. Just send me a patch that removes the current hash
function from SVN and adds the new one, and I will find a way to compare
the two.

  Sorry I didn't reply to the earlier posts, I was stuck with something else...

Here's a little test script that runs "xmllint --noout" on a generated XML
file with varying numbers of distinct tag names, together with the numbers I
get. It looks like the new hash is a little slower than the one from my
original patch. At least, I get slightly lower throughput, but it's less than
10% difference throughout, so I guess it's within the usual margin. This is
likely due to the 4-byte reads of the other hash.

  Yes, that's a bit slower.
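
For readers who want to reproduce this kind of measurement, here is a minimal
sketch of a generator for XML files with a chosen number of distinct tag
names. It is not the script that was attached to the original mail; the
function names (make_tag_names, generate_xml, time_xmllint) and the shape of
the generated document are assumptions made for illustration only.

```python
import random
import string
import subprocess
import time


def make_tag_names(count, seed=0):
    """Return `count` distinct, valid XML tag names (hypothetical helper)."""
    rng = random.Random(seed)
    names = set()
    while len(names) < count:
        length = rng.randint(3, 12)
        suffix = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        # Prefix with "t" so a name never starts with the reserved "xml".
        names.add("t" + suffix)
    return sorted(names)


def generate_xml(distinct_tags, total_elements):
    """Build a flat XML document that cycles through the distinct tag names."""
    names = make_tag_names(distinct_tags)
    parts = ["<root>"]
    for i in range(total_elements):
        name = names[i % len(names)]
        parts.append("<%s>text</%s>" % (name, name))
    parts.append("</root>")
    return "\n".join(parts)


def time_xmllint(path):
    """Time one `xmllint --noout` run on `path` (requires xmllint on PATH)."""
    start = time.perf_counter()
    subprocess.run(["xmllint", "--noout", path], check=True)
    return time.perf_counter() - start
```

Writing generate_xml(n, 100000) to a file for a range of n and passing each
file to time_xmllint gives one timing per dictionary size. The total element
count is kept fixed so that only the number of distinct names varies.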

The distribution seems to be roughly comparable, and the timings stay more or
less constant over the range I tested (up to 1000 entries). Even with 2000
entries in the dict, the timings are only 15% lower than with 8, so I would
say this hash works just as well as the other one.
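
A rough way to look at the distribution of a string hash is to count how many
keys land in each bucket of a fixed-size table. The sketch below does that
with Bob Jenkins' one-at-a-time hash, which reads one byte per step. This is
an illustration only: it is not claimed to be the function in dict.c, and the
helper name bucket_stats and the table size used are assumptions.

```python
def one_at_a_time(key):
    """Bob Jenkins' one-at-a-time hash of the UTF-8 bytes of `key` (32-bit)."""
    h = 0
    for byte in key.encode("utf-8"):
        h = (h + byte) & 0xFFFFFFFF
        h = (h + (h << 10)) & 0xFFFFFFFF
        h ^= h >> 6
    h = (h + (h << 3)) & 0xFFFFFFFF
    h ^= h >> 11
    h = (h + (h << 15)) & 0xFFFFFFFF
    return h


def bucket_stats(keys, table_size, hash_func=one_at_a_time):
    """Count keys per bucket and summarise how evenly they are spread."""
    buckets = [0] * table_size
    for key in keys:
        buckets[hash_func(key) % table_size] += 1
    empty = buckets.count(0)
    return {
        "longest_chain": max(buckets),
        "empty": empty,
        "used": table_size - empty,
    }
```

Calling bucket_stats(names, 256) for a list of tag names reports the longest
chain and the number of empty buckets. A hash that distributes well keeps the
longest chain close to len(names) / 256, while a poor one leaves many buckets
empty and a few very long chains.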

I did a quick check with lxml's benchmarks and they give me comparable
results: slightly slower, but about the same behavioural improvement.

Given that the new hash gives correct results, which the other one didn't, I'm
fine with the change. The price is definitely low enough.

  Okay, at least we have a usable version that fixes the problem in the
previous release. I'm still building a regression test in C that tries to
assert all the properties of dictionary values, including the behaviour of
sub-dictionaries, which so far are not tested as part of libxml2 but only
through their use in libxslt. That is why some of the problems were not
detected until very late.


Red Hat Virtualization group
Daniel Veillard      | virtualization library
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
