GLib hash tables: how to create the hash for optimal performance.



Hi list

I use quite a lot of GLib hash tables.
One thing I have found missing from the documentation is how the hash
function I provide should generate hash values for optimal performance.
I would really like to know how I should tune the calculation of hashes so
that the GLib hash table performs optimally, speed-wise.
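For context, here is roughly how I create my tables. MyKey and my_hash are
just placeholders for illustration; my real keys and hash function are more
involved:

#include <glib.h>

typedef struct {
    guint id;    /* placeholder key field, for illustration only */
} MyKey;

/* my own hash function -- this is the part I would like to tune */
static guint
my_hash (gconstpointer key)
{
    return ((const MyKey *) key)->id;
}

static gboolean
my_equal (gconstpointer a, gconstpointer b)
{
    return ((const MyKey *) a)->id == ((const MyKey *) b)->id;
}

int
main (void)
{
    GHashTable *table = g_hash_table_new (my_hash, my_equal);
    MyKey k = { 42 };

    g_hash_table_insert (table, &k, "some value");
    g_print ("%s\n", (char *) g_hash_table_lookup (table, &k));

    g_hash_table_destroy (table);
    return 0;
}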


If I want to create a hash function that makes the hash table as efficient
and fast as possible, how should I calculate the hash?


Example:
Suppose I have 5 items and four hash functions that hash these 5 objects to

function 1:  1 2 3 4 5
function 2:  1 10 100 1000 10000
function 3:  1 2 4 8 16
function 4:  1000 1001 1002 1003 1004

Which hash function will provide the optimal performance?
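To make that concrete, the four functions could look roughly like this,
assuming each object is keyed by an index i = 0..4 stored with
GUINT_TO_POINTER (purely hypothetical, just to illustrate the value
patterns above):

/* function 1: 1 2 3 4 5 */
static guint hash_1 (gconstpointer key) { return GPOINTER_TO_UINT (key) + 1; }

/* function 2: 1 10 100 1000 10000 */
static guint hash_2 (gconstpointer key)
{
    guint i = GPOINTER_TO_UINT (key), h = 1;
    while (i--)
        h *= 10;
    return h;
}

/* function 3: 1 2 4 8 16 */
static guint hash_3 (gconstpointer key) { return 1u << GPOINTER_TO_UINT (key); }

/* function 4: 1000 1001 1002 1003 1004 */
static guint hash_4 (gconstpointer key) { return 1000 + GPOINTER_TO_UINT (key); }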


I guess my question is: when I create the hash function, how should I
distribute the calculated hash values in the range 0 to 2^32-1 in order to
get optimal performance?
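For example, would something along these lines be the right idea, or is it
wasted effort? This is just an illustration using an FNV-1a-style mix; the
key layout (a fixed-size 16-byte buffer) is an assumption on my part:

/* Illustration only: an FNV-1a style mix that spreads the result over the
 * whole 32-bit range.  Assumes the key is a 16-byte buffer. */
static guint
spread_hash (gconstpointer key)
{
    const guchar *data = key;
    guint32 h = 2166136261u;          /* FNV-1a offset basis */
    gsize i;

    for (i = 0; i < 16; i++) {
        h ^= data[i];
        h *= 16777619u;               /* FNV-1a prime */
    }
    return (guint) h;
}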



My second question is: how should I handle collisions when calculating the
hash value?
Should I try to avoid collisions completely, or should I deliberately allow
some collisions in order to keep the number of hash buckets down?


I.e. given n objects (n commonly > 100,000), for optimal performance (speed),
how many distinctly different hash values should I compute, and how many
objects at most should I allow to collide?

nb = f(n): the number of hash buckets. How does it relate to n, i.e. what is f(n)?
nc = g(n): the number of objects that hash to the same value. Should
g(n) == 1, or is it cheaper to have a small number of objects in each hash
bucket than to have many more hash buckets?

And how should those nb hash values be distributed in the interval 0 to 2^32-1?
