*From*: Kaz Kylheku <kaz ashi footprints net>
*To*: gtk-devel-list redhat com
*Subject*: Re: Proposal for a new ghash table
*Date*: Mon, 29 Nov 1999 13:25:52 -0800 (PST)

On Mon, 29 Nov 1999, Emmanuel DELOGET wrote:

> > The only problem I see, the way I understand it, is that unsuccessful lookups
> > are now O(log n), so this neat algorithm is paid for by an order increase in
> > one of the hash operations.
>
> Not really. If the hash function is good enough to have a O(1) search in
> a classic hash table, then you'll end up with a O(32) [:)] search in this
> implementation since you do not have more than 32 tables to search
> (well, actually you may have more than 32 tables if you have more
> than 2^32 bytes to store the tables :)

Heh heh. Thus, by this logic, binary trees have constant-time lookup too. Assuming my nodes are 12 bytes <left, right, data>, I can fill up most of the 4GB address space with a full, balanced tree having 2^28 - 1 nodes. Therefore any search is O(28), therefore constant time! ;)

> By using a base size of 2^5 and reducing the max size to 2^20
> (these are good limits since you do not have to alloc too many tables

You just about might as well use a balanced binary search tree then.

> for a small set of nodes, but you still have the possibility to store a
> large number of nodes in the two or three last tables), you limit
> the number of searches in the worst case to 16.

The worst case for a search remains bounded by O(n), because it's still a hash table. ;)

> Well. We can compare this to the current implementation:

My concern was whether the lazy movement of nodes after a growth is superior to doing immediate movement, not whether any of this is better than the current implementation (which is obviously less than state of the art in some areas). The conclusion is that the lazy transfer raises the *minimum* bound on the time for *unsuccessful* lookups from Ω(1) to Ω(log n) (with the upper bound remaining at O(n)), which could adversely impact the performance of some hash table uses.
Thus, for instance, the operation of inserting n nodes into a table, with duplicate checking for each insertion, now has a minimum performance bound of Ω(n log n), which is also the average case assuming a well-behaved hashing function. If you are going to pay Ω(n log n) time to construct a dictionary, you might as well have something to show for it, like having it in sorted order.

With immediate movement (nodes are transferred to their proper chain on each growth), the cost of populating the hash table has an Ω(n) lower bound, rather than Ω(n log n), because searches for non-existent entries can be declared unsuccessful after traversing just one chain. That's without any freezing/thawing, which would actually be detrimental by slowing down the dupe checks.

You might not think that unsuccessful hash table lookups are important. But the fact is that g_hash_insert actually performs a lookup before inserting; the authors apparently thought that checking for duplicates is so important that it should be buried in the interface.

**References**:
- **Re: Proposal for a new ghash table** *From:* Emmanuel DELOGET
