*From*: Keith Packard <keithp keithp com>
*To*: "Alan M. Evans" <ame1 extratech com>
*Cc*: gtk-devel-list gnome org, Sander Vesik <sander_traveling yahoo co uk>, Keith Packard <keithp keithp com>
*Subject*: Re: glib hash memory footprint
*Date*: Tue, 31 Aug 2004 15:27:08 -0700

Around 15 o'clock on Aug 31, "Alan M. Evans" wrote:

> For a bonus, if the array is kept sorted, then chained lookups can be
> binary searches.

For this complexity, you should throw away the hash table and just use an O(log n) data structure from the start (skip list, binary tree, etc).

> I don't know about it being non-trivial. The array solution seems pretty
> trivial to me. You may be right about fragmentation, but I'm not certain
> that it's an insurmountable problem.

An array solution eliminates the performance advantages of hash tables on insert and delete; you have to shuffle elements around, making those operations O(n) worst case and O(log n) best case.

> I suppose that this is just my personal bias, but I don't think that
> closed hashing is a good general purpose solution, given the requirement
> to allocate a table larger than your data.

Depends on what you want to do; open addressing with non-linear re-probing yields good average-case performance with minimal overhead -- consider that a binary tree carries significant overhead in pointers and colors, while arrays do not offer O(log n) insert/delete. What open addressing doesn't offer is O(log n) worst-case performance, but better reprobing dramatically reduces the odds of hitting the worst case.

I use open addressing with resizable hash tables of approximately power-of-two sizes, using prime pairs (where P and P-2 are both prime) for the hash and rehash respectively. Using prime pairs ensures that any input value will eventually hit every hash table bucket (giving our worst-case O(n) performance). Resizing at power-of-two intervals yields an amortized constant overhead, so the overall performance of the hash table is O(log n) in the average case. Worst-case (O(n)) performance occurs a vanishingly small fraction of the time (I think it's something like 1/p**n), provided the table is kept fairly empty (around 50% utilization). At 50% occupancy, the overhead for this hash table is essentially identical to a binary tree's.
(couldn't help myself, I always like a good data structures and algorithms discussion) -keith


**References**:
- **Re: glib hash memory footprint** *From*: Alan M. Evans
