Re: hash table freeze is useless

> > I believe the intent is so you can freeze a hashtable that is 
> > large, delete half the items, insert a bunch of new items
> > without resizing the table extraneously.
[...]
> Ah yes, it works for that. I do think the function is intended (or at
> least documented) to work in the case where the table was not already
> large, however. And even in the above case it breaks if you insert more
> items than you delete.

	Why does g_hash_table_insert perform a lookup (by calling
g_hash_table_lookup_node)?

	It does so because if the node already exists, then we simply want
to overwrite the value (of the key/value pair) that is already there.  As
the source says:

  if (*node)
    {
      /* do not reset node->key in this place, keeping
       * the old key might be intended.
       * a g_hash_table_remove/g_hash_table_insert pair
       * can be used otherwise.
       *
       * node->key = key; */
      (*node)->value = value;
    }

	...but if the node does *not* exist, then we must create a new
node with g_hash_node_new.  And after adding a new node, we should see if
the hash table needs resizing:

  else
    {
      *node = g_hash_node_new (key, value);
      hash_table->nnodes++;
      if (!hash_table->frozen)
        g_hash_table_resize (hash_table);
    }

	Notice that g_hash_table_resize only gets called if the table is
not frozen.  g_hash_table_resize does a few calculations to determine
whether or not a resize of the table is actually necessary.  Here are
those calculations:

  if ((nodes_per_list > 0.3 || hash_table->size <= HASH_TABLE_MIN_SIZE) &&
      (nodes_per_list < 3.0 || hash_table->size >= HASH_TABLE_MAX_SIZE))
    return;
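
	(For reference, nodes_per_list is just the table's load factor.
If I am reading the same source, it is computed a few lines above that
test as:

  nodes_per_list = (gfloat) hash_table->nnodes / (gfloat) hash_table->size;

...so the table is left alone as long as it averages between roughly
0.3 and 3.0 nodes per bucket, or is already at its minimum or maximum
size.)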

	When the table is frozen, you save those calculations (and any
actual resizing that would occur) every time you insert an item.
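
	(In other words, the intended usage pattern is something along
these lines.  A rough, untested sketch:

  GHashTable *table = g_hash_table_new (g_str_hash, g_str_equal);

  g_hash_table_freeze (table);            /* suspend resize checks */
  while (have_more_items ())              /* hypothetical item source */
    g_hash_table_insert (table, next_key (), next_value ());
  g_hash_table_thaw (table);              /* one resize check, at the end */

where have_more_items, next_key, and next_value are just stand-ins for
whatever produces your data.)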

	When the table is thawed, it is resized exactly once.  That
resize involves going through this loop:

  for (i = 0; i < hash_table->size; i++)
    for (node = hash_table->nodes[i]; node; node = next)
      {
        next = node->next;

        hash_val = (* hash_table->hash_func) (node->key) % new_size;

        node->next = new_nodes[hash_val];
        new_nodes[hash_val] = node;
      }

	...which can take a while for a large table.  Calling this loop
once at thaw time, instead of several times as you add a large number
of items, speeds things up.
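
	(And where does that single resize come from?  The thaw side is
tiny -- from memory, it looks roughly like this:

  void
  g_hash_table_thaw (GHashTable *hash_table)
  {
    if (hash_table->frozen)              /* freezes nest: freeze just  */
      if (!(--hash_table->frozen))       /* bumps a counter            */
        g_hash_table_resize (hash_table);
  }

...so the deferred resize test runs exactly once, when the last freeze
is undone.)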

	Havoc's complaint, if I read it correctly, is with the
g_hash_table_lookup_node that gets called every time you insert an item.
To remove that call from g_hash_table_insert, we would need some other
way to tell whether or not the node holding the given "key" (of the
key/value pair) already exists.  Otherwise, how would we know whether to
allocate the memory for a new node or simply overwrite the value in an
existing one?
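
	To illustrate: if we were willing to trust the caller, an
insertion that skips the lookup could look something like the sketch
below.  (This is hypothetical -- no such function exists in GLib; the
field and helper names are the internal ones visible in the excerpts
above, and the caller must guarantee the key is not already present,
or you get duplicates.)

  /* hypothetical -- not a real GLib function */
  void
  g_hash_table_insert_unique (GHashTable *hash_table,
                              gpointer    key,
                              gpointer    value)
  {
    guint hash_val;
    GHashNode *node;

    hash_val = (* hash_table->hash_func) (key) % hash_table->size;

    node = g_hash_node_new (key, value);
    node->next = hash_table->nodes[hash_val];  /* prepend to the chain */
    hash_table->nodes[hash_val] = node;
    hash_table->nnodes++;

    if (!hash_table->frozen)
      g_hash_table_resize (hash_table);
  }

	But of course that just moves the burden of knowing onto the
caller.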

	In short, I think this is a problem caused by having keys that
are not the same thing as your calculated hash value.  GLib handles
collisions with linked lists, so when your table is frozen and you add a
large number of items, the per-bucket chains grow and your insertions
approach the speed of a linked-list lookup.  It's anyone's guess whether
that will slow things down more than performing the resize test
calculations plus the periodic resizing of the table.  It might be fun
to write a test program and run some benchmarks....
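
	Something like this (an untested sketch, against GLib 1.x where
g_hash_table_freeze/thaw still exist, using g_direct_hash so the keys
are just integers) would do for a first pass:

  /* untested benchmark sketch: frozen vs. unfrozen bulk insertion */
  #include <glib.h>
  #include <stdio.h>
  #include <time.h>

  #define N_ITEMS 100000

  static void
  fill (GHashTable *table, gboolean freeze)
  {
    gint i;

    if (freeze)
      g_hash_table_freeze (table);

    for (i = 1; i <= N_ITEMS; i++)
      g_hash_table_insert (table, GINT_TO_POINTER (i), GINT_TO_POINTER (i));

    if (freeze)
      g_hash_table_thaw (table);
  }

  int
  main (void)
  {
    GHashTable *table;
    clock_t t0;

    table = g_hash_table_new (g_direct_hash, g_direct_equal);
    t0 = clock ();
    fill (table, FALSE);
    printf ("unfrozen: %ld ticks\n", (long) (clock () - t0));
    g_hash_table_destroy (table);

    table = g_hash_table_new (g_direct_hash, g_direct_equal);
    t0 = clock ();
    fill (table, TRUE);
    printf ("frozen:   %ld ticks\n", (long) (clock () - t0));
    g_hash_table_destroy (table);

    return 0;
  }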


--Derek Simkowiak
  dereks@kd-dev.com


