Re: g_str_hash

• From: Karl Nelson <kenelson ece ucdavis edu>
• To: gtk-devel-list redhat com
• cc: kenelson sequoia ece ucdavis edu
• Subject: Re: g_str_hash
• Date: Fri, 11 Feb 2000 16:46:24 -0800

```>ok, that's good enough a proof, and thanks for actually bothering
>to cook up a test case (something everyone else has fallen short on ;)

Heh, I guess I was the only one bored enough to come up with more
than speculation.

Actually, why stop at 31 after all since I have a test case lets
justify 31 a bit... here is the stats for all primes which
are easy to compute.  (karl sends 10 computers to work.)

impl          short    long
(24475) (4330561)
2:   (h << 1)     + *p : 10273    364817
3:   (h << 1) + h + *p :  4048      NC
5:   (h << 2) + h + *p :  1330      NC
7:   (h << 3) - h + *p :   594    131504
9*:  (h << 3) + h + *p :   250    132501
17:  (h << 4) + h + *p :    44    131483
31:  (h << 5) - h + *p :     1    131478
127: (h << 7) - h + *p :     0    131490
257: (h << 8) + h + *p :     4    132139

Thanks to the entire ECE department of UC Davis for contributing
the computers to calculate this. ;-)

Anything less than 31 has problems with short, so we can drop them.
So we would be justified in using 31 or 127.  257 has weird statistics
because it matches close to a byte boundary.  Thus I think at 31 we are
in safe statistical land.

>the only think left to note is that you can spare the first
>hash step, since 0 needs no propagation. so we get:
>
>guint
>g_str_hash (gconstpointer key)
>{
>  const gchar *p = key;
>  guint h = *p;
>
>  for (p = key + 1; *p != '\0'; p += 1)
>    h = (h << 5) - h + *p;
>
>  return h;
>}

One objection, it segfaults on 0 :-)

--Karl

```