*From*: "Manuel M. T. Chakravarty" <chak is tsukuba ac jp>*To*: gtk-devel-list redhat com, hp redhat com*Subject*: Re: hash table freeze is useless*Date*: Sat, 06 Nov 1999 16:12:27 +0900

Havoc Pennington <hp@redhat.com> wrote,

> I was just noticing that g_hash_table_insert() performs a hash lookup.
> This means that if you freeze the hash table, then insert a lot of
> items, the performance of g_hash_table_insert() will degenerate into
> O(n) over time, so your insert-items loop will have O(n^2) performance.
> Without the freeze, it is basically O(n) (with the hash resizes
> amortized, and each insertion constant time).
>
> For small values of n, it's probably a toss-up; for large values of n,
> freeze() almost certainly causes a performance hit.
>
> Unless I'm wrong of course, please comment. ;-)
>
> Recommended solution would be to make freeze/thaw no-ops and say in the
> docs that they are deprecated.
>
> A better solution than freeze/thaw might be a way to set the expected
> number of items, g_hash_table_resize_for(hash, n_items), which would
> resize the table in advance to accommodate that number of additional
> items, saving n_items re-insertions and perhaps reducing the total
> number of resizes.

Did anybody ever notice a performance problem when inserting or removing
a large number of items without freezing the table? I believe that there
won't be any performance problem and that table freeze is indeed useless.

The reason for this claim is that table resizing is done in exponential
steps, i.e., in the current implementation, the table size is tripled
when growing and divided by three when shrinking. Suppose we insert `n'
elements into an empty table. There will be about log_3 n resizes, where
log_3 denotes the logarithm to base 3. The table sizes at the successive
resizes will be 3^1, 3^2, 3^3, 3^4, and so on; in other words, the total
number of items touched by all resize operations together is

  \sum_{i=1}^{\log_3 n} 3^i < 2n

(This calculation neglects that hash table sizes are required to be
prime numbers, but I believe that does not change the qualitative
result; the arithmetic is spelled out at the end of this message.) This
means that doing all the intermediate resize operations costs less than
twice as much as doing only a single resize at the end; in other words,
the last resize dominates the overall cost anyway. For the same reason,
even g_hash_table_resize_for() won't help much.

Moreover, compared to inserting `n' elements, resizing a table holding
`n' elements (the dominant cost of resizing) is relatively cheap.
Resizing is done in a compact nested loop:

  new_nodes = g_new0 (GHashNode*, new_size);

  for (i = 0; i < hash_table->size; i++)
    for (node = hash_table->nodes[i]; node; node = next)
      {
        next = node->next;
        /* rehash the key for the new table size ... */
        hash_val = (* hash_table->hash_func) (node->key) % new_size;
        /* ... and prepend the node to its new bucket chain */
        node->next = new_nodes[hash_val];
        new_nodes[hash_val] = node;
      }

In contrast, insertion requires a function call per item and allocates
each node one at a time. In fact, resizing may even be beneficial for
performance, because all nodes end up in a consecutive memory area,
which will usually improve the cache hit rate of subsequent hash table
operations.
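As promised, the arithmetic behind the bound above. With k = log_3 n,
i.e., 3^k = n, the standard closed form for a geometric series gives

  \sum_{i=1}^{k} 3^i = \frac{3^{k+1} - 3}{2} = \frac{3n - 3}{2} < 2n

so all the intermediate resizes together touch at most 1.5 times as many
items as the table finally holds.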
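If somebody wants to check the freeze claim empirically, a
quick-and-dirty benchmark along the following lines should do. This is
only a sketch against the current GLib API; the element count N, the use
of g_direct_hash, and the timing via GTimer are my arbitrary choices:

  #include <stdio.h>
  #include <glib.h>

  #define N 100000   /* arbitrary; the frozen run grows quadratically */

  /* Insert N dummy items, optionally between freeze and thaw, and
     report the elapsed wall-clock time. */
  static void
  run (gboolean frozen)
  {
    GHashTable *table = g_hash_table_new (g_direct_hash, g_direct_equal);
    GTimer     *timer = g_timer_new ();
    guint       i;

    if (frozen)
      g_hash_table_freeze (table);
    for (i = 1; i <= N; i++)
      g_hash_table_insert (table, GUINT_TO_POINTER (i), GUINT_TO_POINTER (i));
    if (frozen)
      g_hash_table_thaw (table);  /* the deferred resize happens here */

    printf ("%s: %.3f seconds\n", frozen ? "frozen" : "unfrozen",
            g_timer_elapsed (timer, NULL));
    g_timer_destroy (timer);
    g_hash_table_destroy (table);
  }

  int
  main (void)
  {
    run (FALSE);
    run (TRUE);
    return 0;
  }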
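And should g_hash_table_resize_for() be wanted despite the above, here
is a rough sketch of how it could look. None of this exists in GLib:
the function is hypothetical, and it assumes access to the private
GHashTable fields and to the relinking loop from ghash.c quoted above:

  /* Hypothetical: pre-size `hash_table' so that `n_items' further
     insertions fit without triggering another grow. */
  void
  g_hash_table_resize_for (GHashTable *hash_table, guint n_items)
  {
    GHashNode **new_nodes;
    GHashNode  *node, *next;
    guint       hash_val, new_size, i;

    new_size = g_spaced_primes_closest (hash_table->nnodes + n_items);
    if (new_size <= hash_table->size)
      return;  /* already big enough */

    /* same relinking loop as the existing (static) resize code */
    new_nodes = g_new0 (GHashNode*, new_size);
    for (i = 0; i < hash_table->size; i++)
      for (node = hash_table->nodes[i]; node; node = next)
        {
          next = node->next;
          hash_val = (* hash_table->hash_func) (node->key) % new_size;
          node->next = new_nodes[hash_val];
          new_nodes[hash_val] = node;
        }

    g_free (hash_table->nodes);
    hash_table->nodes = new_nodes;
    hash_table->size  = new_size;
  }

But, as argued above, the last resize dominates the overall cost anyway,
so I would rather just deprecate freeze/thaw and leave it at that.

Manuel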

**References**: *hash table freeze is useless*, *From*: Havoc Pennington
