Re: glib hash memory footprint

From: Keith Packard <keithp keithp com>
To: "Alan M. Evans" <ame1 extratech com>
Cc: gtk-devel-list gnome org, Sander Vesik <sander_traveling yahoo co uk>, Keith Packard <keithp keithp com>
Subject: Re: glib hash memory footprint
Date: Tue, 31 Aug 2004 15:27:08 -0700

Around 15 o'clock on Aug 31, "Alan M. Evans" wrote:

> For a bonus, if the array is kept sorted, then chained lookups can be
> binary searches.

For this complexity, you should throw away the hash table and just use a 
O(log n) data structure from the start (skip list, binary tree, etc).

> I don't know about it being non-trivial. The array solution seems pretty
> trivial to me. You may be right about fragmentation, but I'm not certain
> that it's an insurmountable problem.

An array solution eliminates the performance advantages of hash tables on 
insert or delete; you have to shuffle elements around, making them O(n) 
worst case, and > O(log n) best case.

> I suppose that this is just my personal bias, but I don't think that
> closed hashing is a good general purpose solution, given the requirement
> to allocate a table larger than your data.

Depends on what you want to do; open addressing with non-linear re-probing 
yields good average case performance with minimal overhead -- consider a 
binary tree has significant overhead in pointers and colors, and arrays do 
not offer O(log n) insert/delete.  What it doesn't offer is O(log n) worst 
case performance, but using better reprobing reduces the odds of 
worst-case performance dramatically.

I use open addressing with resizable hash tables in approximately power of
two sizes using prime pairs (where P and P-2 are both prime) for hash and
rehash respectively.  Using prime pairs ensures that any input value will
eventually hit every hash table bucket (for our worst-case O(n) 
performance).

Resizing at power-of-two intervals yields an amortized constant overhead, 
so the overall performance of the hash table is O(log n) in the average 
case.  Worst case (O(n)) performance occurs with a vanishingly small 
amount of the time (I think it's something like 1/p**n), assuming the 
table is kept fairly empty (like 50% utilization).  At 50% occupancy, the 
overhead for this hash table is essentially identical to a binary tree.

(couldn't help myself, I always like a good data structures and algorithms 
discussion)

-keith

Attachment: pgp5iqqCzxUq9.pgp
Description: PGP signature

References:
- Re: glib hash memory footprint
  - From: Alan M. Evans

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]