Re: Algorithmic Complexity Attack on GLIB 2.2.1



For reference, the equivalent discussion on the Python development
mailing lists can be found in followups to:

http://mail.python.org/pipermail/python-dev/2003-May/035874.html

(Suggestion for your web page - provide links to mailing list
discussion for the various systems you list as "vulnerable"; for
starters, this thread starts from:
http://mail.gnome.org/archives/gtk-devel-list/2003-May/msg00111.html)

Almost all the comments I would make are brought up in Python
form there as well:

 - Only a small fraction of GTK+ applications have hash tables
   where this would be a concern. They either don't allow
   untrusted sources to feed them arbitrary numbers of strings
   to hash, or have other performance issues that would cause
   them to be "DOS'ed" in that situation even ignoring hash
   table performance problems.

 - String hashing speed is crucial to GTK+ operation; for example,
   in looking up signal and property names.

   In days past, we'd see numbers on the order of 5-10% of 
   run time; by using "quarks" (string table indices), it's 
   not quite that bad these days, but still frequently several
   percent of the run time.

 - I'm not sure your performance analysis is that applicable
   to GLib as used within GTK+.

    - The majority of keys hashed are short; Without real
      data, I'd guess the median key size is on the order
      of 10 characters. It seems that the speed of your
      implementation is largely based on being able to 
      process large chunks of data at once.

    - If hash table performance is a bottleneck for a GTK+
      app, it is almost certainly hashing the *same* strings
      repeatedly. So, they *will* be in L2 cache. L2 cache
      misses aren't relevant.

    - While the vast majority of GTK+ users *are* using
      Pentium-II compatible x86 processors, we wouldn't want
      to rely on x86/MMX/SSE assembly for speed. Plus there
      are some concerns to switching to MMX operation for 
      just a few characters, especially on older processors.
   
 - With the GLib interfaces, we do have the option to provide
   multiple string hash functions g_str_hash() and
   g_str_hash_secure() or even g_str_hash() and g_str_hash_fast()

 - It is possible that people are relying in their applications
   on consistent returns from g_str_hash(), though this
   is something that we don't support ... we changed
   the hash function between GLib-1.2.6 and GLib-1.2.7. Switching
   to a hash function that changed between runs would certainly
   enforce this! :-)

I don't want to reject changes here out of hand; one of our design
philosophies in GLib/GTK+ is that if a simple change can reduce
the number of things that application developers have to worry
about, that's something we want to do.

I think the groundwork that would have to be done before we could
look at what we wanted to change would be:

 - Instrument g_str_hash(), run a couple of large applications
   (say, Nautilus, GTK+-2.0-based-gnumeric), collect histograms
   of key size.

 - Write some medium sized benchmarks (say, construct a widget
   heirarchy, display it on the screen, destroy it), try replacing
   g_str_hash() to a universal hash and/or kernel-style Jenkins hash,
   and see what the effect is on performance.

Any volunteers?

Regards,
                                                    Owen





[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]