Re: g_str_hash



On Fri, 11 Feb 2000, Karl Nelson wrote:

> 
> > > This is the exact same hash except operations have been moved to
> > > enhance speed.  The difference being the new one keeps the upper bits 
> > > and thus has less collisions and operates somewhere between the TCL and 
> > > GLib on.  
> > 
> > I suspect the appeal of the ASU one is solely that it is nice and
> > easily analyzable. I won't believe that it creates less collisions
> > until somebody shows me a set of (plausible) strings where that is the case.
> 
> Well, I have some formulations which should have much better collision
> properties.  (This took several hours, so please give it more than
> a token glance.)
> 
> This hash is based on multiplying by 31.  (and is approx the same speed 
> as TCL.)
>  
> guint hash (gconstpointer v)
> {
>   const char *p;
>   guint h=0;
>   for(p = (const char*) v; *p != '\0'; p += 1) {
>     h = ( h << 5 ) - h  + *p;
>   }
>   return h;
> }

> Short word test: (3-16 char)
> ----------------------------
> (with 24474 words, most short)
> With Glib current:   86 hits  
> With TCL:           250 hits 
> With X31:             1 hits  

> Long test:  (20-40 char, vary in tail)
> ----------------------------------------
> Using the 2040 longest words in dict and then pairing them
> with every other word we get 4330561 phrases.  (we have a lot
> more phases so the hit ratio is expected to be a lot higher.)
> 
> Glib:  220793
> TCL:   132501
> X31:   131478

ok, that's good enough a proof, and thanks for actually bothering
to cook up a test case (something everyone else has fallen short on ;)

the only think left to note is that you can spare the first
hash step, since 0 needs no propagation. so we get:

guint
g_str_hash (gconstpointer key)
{
  const gchar *p = key;
  guint h = *p;

  for (p = key + 1; *p != '\0'; p += 1)
    h = (h << 5) - h + *p;

  return h;
}

any further objections?

---
ciaoTJ




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]