Re: Algorithmic Complexity Attack on GLIB 2.2.1
- From: Owen Taylor <otaylor redhat com>
- To: Scott A Crosby <scrosby cs rice edu>
- Cc: gtk-devel-list gnome org
- Subject: Re: Algorithmic Complexity Attack on GLIB 2.2.1
- Date: 31 May 2003 13:31:03 -0400
For reference, the equivalent discussion on the Python development
mailing lists can be found in followups to:
(Suggestion for your web page - provide links to mailing list
discussion for the various systems you list as "vulnerable"; for
starters, this thread starts from:
Almost all the comments I would make are brought up in Python
form there as well:
- Only a small fraction of GTK+ applications have hash tables
where this would be a concern. They either don't allow
untrusted sources to feed them arbitrary numbers of strings
to hash, or have other performance issues that would cause
them to be "DOS'ed" in that situation even ignoring hash
table performance problems.
- String hashing speed is crucial to GTK+ operation; for example,
in looking up signal and property names.
In days past, we'd see numbers on the order of 5-10% of
run time; by using "quarks" (string table indices), it's
not quite that bad these days, but still frequently several
percent of the run time.
- I'm not sure your performance analysis is that applicable
to GLib as used within GTK+.
- The majority of keys hashed are short; Without real
data, I'd guess the median key size is on the order
of 10 characters. It seems that the speed of your
implementation is largely based on being able to
process large chunks of data at once.
- If hash table performance is a bottleneck for a GTK+
app, it is almost certainly hashing the *same* strings
repeatedly. So, they *will* be in L2 cache. L2 cache
misses aren't relevant.
- While the vast majority of GTK+ users *are* using
Pentium-II compatible x86 processors, we wouldn't want
to rely on x86/MMX/SSE assembly for speed. Plus there
are some concerns to switching to MMX operation for
just a few characters, especially on older processors.
- With the GLib interfaces, we do have the option to provide
multiple string hash functions g_str_hash() and
g_str_hash_secure() or even g_str_hash() and g_str_hash_fast()
- It is possible that people are relying in their applications
on consistent returns from g_str_hash(), though this
is something that we don't support ... we changed
the hash function between GLib-1.2.6 and GLib-1.2.7. Switching
to a hash function that changed between runs would certainly
enforce this! :-)
I don't want to reject changes here out of hand; one of our design
philosophies in GLib/GTK+ is that if a simple change can reduce
the number of things that application developers have to worry
about, that's something we want to do.
I think the groundwork that would have to be done before we could
look at what we wanted to change would be:
- Instrument g_str_hash(), run a couple of large applications
(say, Nautilus, GTK+-2.0-based-gnumeric), collect histograms
of key size.
- Write some medium sized benchmarks (say, construct a widget
heirarchy, display it on the screen, destroy it), try replacing
g_str_hash() to a universal hash and/or kernel-style Jenkins hash,
and see what the effect is on performance.
] [Thread Prev