Re: Proposal for a new ghash table

> I have an implementation of this in Kazlib, minus
> the refinements described below.
> The license is BSD-like,
> so that it can readily be used in GPLed code.
> > Before returning, a good thing should be to move
> > the node to position V1 if V != V1. The next time
> > you'll want to access to the node, the look up
> > will be faster.
> This is an interesting refinement that I do not have in the aforementioned
> library (but which could be easily added); the grow is done synchronously
> rather than distributed in this fine-grained manner.
> This is obviously better when it's undesirable to have wide variations in
> response time; that is, no particular insertion should eat the sudden cost
> of a resizing.
> The only problem I see, the way I understand it, is that unsuccessful
> searches are now O(log n), so this neat algorithm is paid for by an order
> increase in one of the hash operations.
    Not really. If the hash function is good enough to give an O(1) search
    in a classic hash table, then you'll end up with an O(32) [:)] search in
    this implementation, since you do not have more than 32 tables to search
    (well, actually you may have more than 32 tables if you have more
    than 2^32 bytes to store the tables :)

    By using a base size of 2^5 and reducing the max size to 2^20
    (these are good limits, since you do not have to allocate too many
    tables for a small set of nodes, but you still have the possibility to
    store a large number of nodes in the last two or three tables), you
    limit the number of searches in the worst case to 16.

    And assuming you really don't want to lose time on hash search, you
    can implement

> Thus I'd be feeling ambivalent about adding this to a general-purpose
> hash implementation, for which I can't anticipate the frequency of
> lookups.

    Well, we can compare this to the current implementation:

1/ freeze/thaw around one insertion is equivalent to no freeze/thaw
2/ large number of items: more than 1,000,000
3/ small number of items: less than 500.

* adding a lot of elements to the hash table without using freeze/thaw
is not really practical, since ghash->frozen is set to FALSE at
ghash creation (the resize func is called for each inserted node if
frozen is FALSE). Even if we lose some time on the first hash query, we
come out ahead, since we do not spend so much time during insertion.
Remember that the second query is as fast as the first one in the current
ghash.

* adding a small number of elements to the hash table without freeze/thaw
(ie a small number of resizes). We achieve the same with the proposed
implementation.

* freeze, adding a small number of items, thaw -> really the same as the
last point :)

* freeze, adding a large number of items, thaw -> Bang, the big resize
from an overfull hash table to a large, well-formed one. Of course, we
do not freeze the machine with the proposed implementation, except
if we do the same thing (calling the thaw/rearrange/
whatever-you-want-to-call-it function). In this case, we have the same
performance as the current implementation. If we do not, we simply
have a penalty on the first request - not the second.

* freeze, adding a large number of items, and do not thaw -> leads to
a small hash table with big node lists (the search becomes O(n), not
O(1)). So we win.

When we are not better than the current implementation, we have three
cases:
* we do the same, or
* we can do the same, or
* we are going to do the same :)


