Re: [xml] Better hash function for dict.c



On Sun, Apr 20, 2008 at 10:51:38AM +0200, Stefan Behnel wrote:
Hi again (and sorry for all the noise),

Stefan Behnel wrote:
If an application benefits from a different hash function depends on the
vocabulary it uses in its XML files. A slow but well distributing hash
function performs much better for large vocabularies (or many different
vocabularies), while small vocabularies will not fill the dict enough to make
a difference, in which case the faster hash function wins.

So the obvious solution is to combine the two. Here is a patch that uses the

  Right, that's the right approach, optimize differently on the two end
of the spectrum.

original hash function to start with (but lowers the bucket fill limit a
little from 4 down to 3) and when it reaches the grow barrier for the first
time, switches to the new hash function. You will find a performance
comparison below, based on xmllint.

I decreased the bucket fill barrier for two reasons: to trigger an early
switch between the two functions, and because the second function has much
better load balancing, so a high bucket size in one place really means that
most buckets are at least close to that fill rate. As you can see from the

  That's reasonnable, yes, no problem

numbers, it works pretty well over a wide range of vocabulary sizes from small
to medium, and as I've shown before, it performs much (much!) better for
larger sizes.

BTW, Bob Jenkins did a comparison of a couple of hash functions, including the
additive hash (a variant of which is currently used) and the hash function
used in the patch.

http://burtleburtle.net/bob/hash/doobs.html

The hash function itself was written by Paul Hsieh and published on his web
site. According to Bob Jenkins, it's public domain (although I didn't ask
directly).

http://www.azillionmonkeys.com/qed/hash.html

Any objections to getting this patch merged?

  None, looks very good, thanks a lot for assembling all of this and providing
full and convincing numbers :-)

  I will apply this to SVN head now,

  thanks again !

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]