# Re: Proposal for a new ghash table

• From: Kaz Kylheku <kaz@ashi.footprints.net>
• To: gtk-devel-list@redhat.com
• Subject: Re: Proposal for a new ghash table
• Date: Fri, 26 Nov 1999 09:03:01 -0800 (PST)

On Fri, 26 Nov 1999, Emmanuel DELOGET wrote:

> Hi everybody.
>
> I've just looked at the code of the current ghash
> (I was wondering what the underlying code for
> the hash table resizing was). Here is another
> approach for a dynamic hash table, based upon
> an article by Per-Ake (Paul) Larson (I can't find
> the reference at the moment). I will try to implement
> it in the next days, and will post results here.
>
> Larson's algorithm does not introduce any
> freeze/thaw statements. It increases the size
> of the hash table in order to limit the length
> of the hash node lists. Basically, it starts
> with a size of 2^n. When the number of hash
> nodes becomes bigger than the current size,
> the hash table grows to 2^(n+1).

I have an implementation of this in Kazlib, minus the refinements
described below: http://users.footprints.net/~kaz/kazlib.html
The license is BSD-like, meaning that it can readily be used in GPLed
code.
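
To make the synchronous variant concrete, here is a minimal sketch in C of
a power-of-two table that doubles, and redistributes its chains, as soon as
the node count exceeds the bucket count. All names here (HashSketch, grow,
maybe_grow) are made up for illustration; this is not the actual GLib or
Kazlib code:

```c
#include <stdlib.h>

typedef struct node {
    struct node *next;
    unsigned hash;            /* full hash value of the key */
    void *key;
    void *value;
} Node;

typedef struct {
    Node **buckets;
    size_t size;              /* always a power of two, 2^n */
    size_t count;             /* number of nodes stored */
} HashSketch;

/* Double the bucket array.  Because the size is a power of two, a node
 * in old bucket i belongs in bucket i or bucket i + old_size, depending
 * on the next bit of its hash value; no key is rehashed. */
static void grow(HashSketch *h)
{
    size_t old_size = h->size;
    Node **nb = realloc(h->buckets, 2 * old_size * sizeof *nb);

    if (nb == NULL)
        return;               /* out of memory: keep the old table */

    h->buckets = nb;
    h->size = 2 * old_size;

    for (size_t i = 0; i < old_size; i++) {
        Node *n = h->buckets[i], *low = NULL, *high = NULL;

        while (n != NULL) {
            Node *next = n->next;
            if (n->hash & old_size) {   /* new top bit set? */
                n->next = high;
                high = n;
            } else {
                n->next = low;
                low = n;
            }
            n = next;
        }
        h->buckets[i] = low;
        h->buckets[i + old_size] = high;
    }
}

/* The growth rule from the quoted description: double the table as
 * soon as the number of nodes exceeds the number of buckets. */
static void maybe_grow(HashSketch *h)
{
    if (h->count > h->size)
        grow(h);
}
```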

> To find a particular node, you'll have to
>     1) compute the current hash value V from the
>        key K (store it as V1)
>     2) if the node is found in bucket V, return it.
>     3) if not, find the greatest n value defined by
>        2^n <= V
>     4) compute the new V value (V = V - 2^n)
>     5) go to step 2

> Before returning, a good idea would be to move
> the node to position V1 if V != V1. The next
> lookup will then be faster.

This is an interesting refinement that I do not have in the aforementioned
library (but which could easily be added); there, the growth is done
synchronously rather than distributed in this fine-grained manner.
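
For illustration, here is how the quoted steps and the move-to-V1
refinement might look, reusing the hypothetical types from the sketch
above, but assuming the distributed scheme: grow() would only enlarge
the bucket array, without the redistribution loop, leaving old nodes to
be relocated lazily by lookups. Again a sketch, not Kazlib's or GLib's
actual code:

```c
/* Lookup following the quoted steps 1-5, with the move-to-V1
 * refinement.  A node inserted when the table was smaller may still
 * sit in an "ancestor" bucket, whose index is the home index with its
 * top bits cleared. */
static Node *lookup(HashSketch *h, const void *key, unsigned hash,
                    int (*equal)(const void *a, const void *b))
{
    size_t v1 = hash & (h->size - 1);   /* step 1: home bucket V1 */
    size_t v = v1;

    for (;;) {
        /* step 2: scan bucket v for the key */
        for (Node **np = &h->buckets[v]; *np != NULL; np = &(*np)->next) {
            Node *n = *np;
            if (n->hash == hash && equal(n->key, key)) {
                if (v != v1) {
                    /* refinement: unlink and move to bucket V1 so
                     * the next lookup finds it on the first probe */
                    *np = n->next;
                    n->next = h->buckets[v1];
                    h->buckets[v1] = n;
                }
                return n;
            }
        }

        if (v == 0)
            return NULL;        /* no ancestor bucket left: a miss */

        /* steps 3-4: find the highest power of two <= v and subtract
         * it, i.e. clear the top set bit of v; step 5: loop */
        size_t bit = 1;
        while ((bit << 1) <= v)
            bit <<= 1;
        v -= bit;
    }
}
```

A node found in an ancestor bucket pays the relocation cost once; every
later lookup for it succeeds on the first probe.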

This is obviously better when it's undesirable to have wide variations in
response time; that is, no particular insertion should eat the sudden cost of a
resizing.

The only problem I see, the way I understand it, is that unsuccessful lookups
are now O(log n): each failed probe clears one set bit of the bucket index,
so a miss can walk through as many buckets as the index has bits before
giving up. This neat algorithm is thus paid for by an order increase in one
of the hash operations.