Re: glib hash memory footprint



Davi de Castro Reis <davicastro terra com br> writes:

> 1. Is there some reason to use linked lists instead of linear probing
> (putting collided elements in the next bucket)?
> 
> 2. Would glib/gtk team be interested in a linear/custom probing hash
> implementation?

This sounds like an excellent idea to me. Not only would it use much
less memory, the cache behavior would also be improved.

I have previously thought a bit about how to reduce memory use myself,
but the solution I came up with was much more convoluted and ugly than
yours.

One thing is not clear to me though: How does the linear probing know
when to stop? It is legal for a GHashTable to contain (NULL, NULL)
elements, so you can't use those to mean 'stop search'. Of course it
could go on until it hit the end of the table, but that would be too
expensive in the case where the lookup failed.

This idea I had may be useful: Reserve two special key/value pairs,
UNINITIALIZED_KEY/VALUE and UNUSED_KEY/VALUE. When the hashtable is
first initialized, it is filled with UNINITIALIZED_KEY/VALUE and the
linear probing could just stop when it saw an uninitialized entry. 

When an element is removed from the hash table its entry is set to
UNUSED, not UNINITIALIZED, because setting it to UNINITIALIZED would
mean that the linear probing could stop prematurely.

When a collission happens, the first UNUSED or UNINITIALIZED spot is
used for the new item.

What happens if the user actually tries to insert these key/value
pairs? The table could just reserve a boolean for each:

        gboolean contains_uninitialized_pair;
        gboolean contains_unused_pair;

that would be checked during insert and lookup.


The same scheme could be used to avoid calling the equal function in
the common case: Reserve a third pair, COLLISSION_KEY/VALUE, that
would indicate that the entry is actually a collission site. That way
if an entry *wasn't* a collision site, the lookup function could just
return without calling the equal function at all.

The UNINITIALIZED pair could be (NULL, NULL) so that g_new0() could be
used for allocating the table.


Søren



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]