Re: [xml] Better hash function for dict.c

2008/4/22, Daniel Veillard <veillard redhat com>:
On Sun, Apr 20, 2008 at 10:51:38AM +0200, Stefan Behnel wrote:
 > Hi again (and sorry for all the noise),
 > Stefan Behnel wrote:
 > > If an application benefits from a different hash function depends on the
 > > vocabulary it uses in its XML files. A slow but well distributing hash
 > > function performs much better for large vocabularies (or many different
 > > vocabularies), while small vocabularies will not fill the dict enough to make
 > > a difference, in which case the faster hash function wins.
 > So the obvious solution is to combine the two. Here is a patch that uses the

  Right, that's the right approach, optimize differently on the two end
 of the spectrum.

 > original hash function to start with (but lowers the bucket fill limit a
 > little from 4 down to 3) and when it reaches the grow barrier for the first
 > time, switches to the new hash function. You will find a performance
 > comparison below, based on xmllint.
 > I decreased the bucket fill barrier for two reasons: to trigger an early
 > switch between the two functions, and because the second function has much
 > better load balancing, so a high bucket size in one place really means that
 > most buckets are at least close to that fill rate. As you can see from the

  That's reasonnable, yes, no problem

 > numbers, it works pretty well over a wide range of vocabulary sizes from small
 > to medium, and as I've shown before, it performs much (much!) better for
 > larger sizes.
 > BTW, Bob Jenkins did a comparison of a couple of hash functions, including the
 > additive hash (a variant of which is currently used) and the hash function
 > used in the patch.
 > The hash function itself was written by Paul Hsieh and published on his web
 > site. According to Bob Jenkins, it's public domain (although I didn't ask
 > directly).
 > Any objections to getting this patch merged?

  None, looks very good, thanks a lot for assembling all of this and providing
 full and convincing numbers :-)

  I will apply this to SVN head now,

I just need to chime in; great work Stefan!


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]