glib: gmain, perf, etc...



Hi all.  I originally sent the email below in June, but only just
noticed the bounce (doh!).  It should be mostly still relevant as a
set of constructive observations, so I'm sending it now, with a few
pencilled-in updates.

FWIW, I've gone on to implement my own main loop, hash tables, linked
lists, and other structures.  Things now run about 5 times faster.
GLIB is just a poor fit for my application ;(

--------------
Hi all.  I'm using glib (1.3.3ish, IIRC) as the basis for some
embedded-style code, and I've hit a few snags.  Perhaps the collective
wisdom, or at least the collective Owen, can give me a pointer or two,
or maybe just discuss my experiences for whatever they're worth as an
example of weird non-gtk glib use.

 - Memchunks appear to be really slow.  They actually show up in the
   profiler, which is just alarming.  The memchunk code seems to be
   storing complicated trees of chunks of blocks, and rooting around
   in them for each allocation.

   Why aren't memchunks simple stacks of blocks?  Seems like that's
   all they would need to be, and then they'd be zippy.

   I see the trashstack, but it's not clear why the trashstack logic
   doesn't just get folded into memchunk.  Doing it separate means
   that there are a wider variety of itty-bitty interim internal
   structure nodes floating around than necessary, and it makes
   efficient usage take extra work.

   [ I've since made my own garbage-collecting pooled allocator.  All
   allocation is now totally out of my profile. ]
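
   To make the "simple stack of blocks" idea concrete, here is a
   minimal sketch.  It is not glib code and not the allocator
   mentioned above; pool_t, pool_alloc and friends are made-up names,
   and it assumes fixed-size items, a single thread, and that blocks
   are never handed back to the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size pool: freed items are pushed onto a stack
 * (a singly linked free list) and popped again on allocation. */
typedef struct pool_item { struct pool_item *next; } pool_item;

typedef struct {
    size_t item_size;   /* bytes per item, rounded up in pool_init */
    size_t per_block;   /* items carved out of each malloc'd block */
    pool_item *free;    /* top of the free stack */
} pool_t;

static void pool_init(pool_t *p, size_t item_size, size_t per_block)
{
    size_t a = sizeof(pool_item);
    assert(per_block > 0);
    /* Round up so every item can hold the link and stays pointer-aligned. */
    if (item_size < a) item_size = a;
    p->item_size = (item_size + a - 1) / a * a;
    p->per_block = per_block;
    p->free = NULL;
}

/* Grab one more block and push all of its items onto the free stack. */
static int pool_grow(pool_t *p)
{
    char *block = malloc(p->item_size * p->per_block);
    if (block == NULL) return -1;
    for (size_t i = 0; i < p->per_block; i++) {
        pool_item *it = (pool_item *)(block + i * p->item_size);
        it->next = p->free;
        p->free = it;
    }
    return 0;
}

static void *pool_alloc(pool_t *p)
{
    pool_item *it;
    if (p->free == NULL && pool_grow(p) != 0) return NULL;
    it = p->free;              /* pop */
    p->free = it->next;
    return it;
}

static void pool_free(pool_t *p, void *mem)
{
    pool_item *it = mem;       /* push */
    it->next = p->free;
    p->free = it;
}
```

   Both pool_alloc and pool_free are a couple of pointer moves in the
   common case; the only loop runs when a fresh block is carved up.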

 - Several things in glib make Purify on Solaris blow up for no
   reason.  This appears to be mostly a thing with gcc extensions and
   Purify's (ancient, crappy) gcc support not getting along.

   Key among these are memchunks, in the middle of which Purify
   reports bogus uninitialized reads, and g_log, which simply blows
   up under Purify, evidently because Purify misunderstands the
   varargs passthrough.  Turning chunks off with the debugging flags
   and stubbing out g_log makes it go.

   On the bright side, Purify reported no actual badnesses in glib as
   used by my app.  Neither did debauch (which itself blows up now
   because my app is too big, I think) nor efence (which works very
   well).  Checker, which I'd love to actually use, is missing stubs
   for some i18n stuff referenced by glib.  I've got my tool guy
   poking at that; hopefully someday ;)

   I mangled various parts of glib to have a "disable freelists"
   config option, so that this purification is useful.  

   [ My patches would be horribly old at this point, but the changes
   were minor. ]

 - I read Sebastian's (*) patch submission and discussion closely.  He
   (sort of) adds a feature present in a couple other loops (most
   notably the NBmumble thing in the OSX/*Step Foundation): the
   ability to somehow "group" things and recurse the loop for various
   subsets of event handlers.  (It adds additional thread-related
   features, but some of the LinuxThreads pthread primitives -
   especially blocking on semaphores and conditions as done by
   GAsyncQueue et al - are so painfully slow that I gave up on threads
   long ago.  They seem to use plain old signals inside; urgh.)

   I need a loop that does this because we run our entire distributed
   message-passing system in a single process for debugging purposes.
   When anyone sends a message, I run a recursive loop, but with only
   the event handlers for the destination entity and the
   message-passer attached.  When the response message appears, I quit
   the recursed loop.

   The trick is that gmain is arranged to operate a single state
   machine: either event handlers are reentrant, or they aren't, and
   there are no grouping or masking features to allow finer control of
   state machine scheduling.  I added a g_source_detach, which I use
   to nondestructively detach all sources I need masked, but this is
   quite ugly.  (If anyone wants it, just ask, but like I said it's
   ugly - both as a thing to do and as code).

   Anywho, long story short, is any form of that work likely to appear
   in the 1.3.x line?  Is there some other technique I could use to
   nonpainfully do this atop gmain?  If so it might save me writing my
   own event system.
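
   For illustration only, the kind of grouping described above might
   look like the sketch below.  None of this is gmain API or
   Sebastian's patch; source, loop, loop_iterate and loop_run_until
   are hypothetical names, and the sketch busy-polls where a real
   loop would block in poll().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: every source carries a bitmask of the groups it is in. */
typedef struct source {
    unsigned groups;
    int (*ready)(void *data);       /* nonzero when there is work to do */
    void (*dispatch)(void *data);
    void *data;
    struct source *next;
} source;

typedef struct { source *sources; } loop;

static void loop_add(loop *l, source *s)
{
    s->next = l->sources;
    l->sources = s;
}

/* One pass, dispatching only sources whose groups intersect `mask`.
 * Sources outside the mask stay attached but are skipped.
 * Returns the number of sources dispatched. */
static int loop_iterate(loop *l, unsigned mask)
{
    int n = 0;
    for (source *s = l->sources; s != NULL; s = s->next) {
        if ((s->groups & mask) == 0) continue;
        if (s->ready(s->data)) { s->dispatch(s->data); n++; }
    }
    return n;
}

/* Recursed run: iterate with a restricted mask until *done is set,
 * e.g. by the handler that sees the response message arrive. */
static void loop_run_until(loop *l, unsigned mask, const int *done)
{
    while (!*done) loop_iterate(l, mask);
}

/* Example callbacks: always ready; dispatch bumps an int counter. */
static int always_ready(void *data) { (void)data; return 1; }
static void bump(void *data) { ++*(int *)data; }
```

   Sending a message would then call loop_run_until with only the
   destination's and the message-passer's group bits set, instead of
   detaching and reattaching every other source.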

 - I'm having performance headaches with gmain.  It appears not to
   treat timeout sources specially, which means that if you add many,
   it gets really slow while it iterates the list every check,
   prepare, or dispatch.  Seems like it should have some special
   timeout treatment and keep timeouts separate and sorted, rather
   than iterating over the list every time.

   On the fd side, there does seem to be some logic to cache poll
   sets, but it's not clear that this actually eliminates as many
   query/check/etc "all sources" iterations as it should.

   My test program (which "pings" an event back and forth inside
   itself) slows down by 50 times when 1000 no-work random timeout
   sources are added.  That's admittedly a lot of timeouts, but it
   would be at least twice as fast if the timeouts were sorted and
   treated a bit specially.

   Are there any things one should or shouldn't do when using a gmain
   to help it perform a bit more evenly?  As is I have timeouts coming
   and going somewhat frequently, which is poor, but normally there
   aren't as fatally large a number of them as in my tests.

   [ I've since written my own event loop.  Sorting timers reduces
   timer work to one partial iteration per timer firing, and having
   the extra state machine knowledge in the main loop (probably
   specific to my application) means that most of the likely pollsets
   can be kept around precomputed. ]
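
   A rough sketch of the "separate and sorted" timeout handling,
   assuming a plain sorted linked list rather than anything from gmain
   or from the loop mentioned above.  The names (timer, timer_add,
   timers_run, timers_next_timeout) are made up, and times are just
   longs in whatever unit the caller uses.

```c
#include <assert.h>
#include <stddef.h>

typedef struct timer {
    long expires;                    /* absolute expiry time */
    void (*fire)(struct timer *t);
    struct timer *next;
} timer;

/* Insert keeping the list sorted by expiry.  Insertion walks the
 * list, but the loop itself only ever looks at the head. */
static void timer_add(timer **head, timer *t)
{
    timer **pp = head;
    while (*pp != NULL && (*pp)->expires <= t->expires)
        pp = &(*pp)->next;
    t->next = *pp;
    *pp = t;
}

/* Fire everything due at `now` and stop at the first timer that is
 * still in the future.  Returns the number of timers fired. */
static int timers_run(timer **head, long now)
{
    int fired = 0;
    while (*head != NULL && (*head)->expires <= now) {
        timer *t = *head;
        *head = t->next;     /* unlink first, so fire() may re-add t */
        t->next = NULL;
        t->fire(t);
        fired++;
    }
    return fired;
}

/* Timeout to hand to poll(): time until the earliest timer,
 * 0 if one is already due, or -1 if there are no timers. */
static long timers_next_timeout(const timer *head, long now)
{
    if (head == NULL) return -1;
    return head->expires > now ? head->expires - now : 0;
}

/* Example callback that does nothing. */
static void timer_noop(timer *t) { (void)t; }
```

   With this layout, 1000 idle timers cost nothing per iteration
   beyond one comparison against the head of the list.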

 - It would be useful to have in-place constructors for some of the
   objects in the glib library.  For example, I'd like to be able to
   do this:

   struct foo {
     int i;
     GQueue q;
   };

   struct foo f;
   g_queue_inplace_new(&f.q);  /* proposed, hypothetical API */

   ...and end up with a valid GQueue smack in my struct.  With
   judicious use internal to glib, some glib things might even
   benefit.  I'll maybe do some of this for any glib structures I can
   continue using after I get my event loop straightened out and have
   profiled the result.

   [ Most of my own structures operate in a non-allocating way;
   obviously both have a place, but the glib ones tend to promote the
   container variety more than they should. ]
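
   As an illustration of the non-allocating style, here is a small
   sketch of a queue whose head is embedded by value in its owner and
   whose links are embedded in the elements.  It is not GQueue and not
   the structures mentioned above; queue, qnode, queue_init and the
   rest are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* The link lives inside the element and the head lives inside the
 * owner, so the queue itself never allocates. */
typedef struct qnode { struct qnode *next; } qnode;
typedef struct { qnode *head, *tail; size_t length; } queue;

/* In-place constructor: initializes storage the caller already owns. */
static void queue_init(queue *q)
{
    q->head = q->tail = NULL;
    q->length = 0;
}

static void queue_push_tail(queue *q, qnode *n)
{
    n->next = NULL;
    if (q->tail != NULL) q->tail->next = n; else q->head = n;
    q->tail = n;
    q->length++;
}

static qnode *queue_pop_head(queue *q)
{
    qnode *n = q->head;
    if (n == NULL) return NULL;
    q->head = n->next;
    if (q->head == NULL) q->tail = NULL;
    q->length--;
    return n;
}

/* Owner with the queue embedded by value, as in struct foo above. */
struct foo { int i; queue q; };

/* Element type; the link is the first member, so a qnode * returned
 * by queue_pop_head can be cast back to struct msg *. */
struct msg { qnode link; int payload; };
```

   The trade-off is the usual one for intrusive containers: an element
   can sit on only as many queues as it has link members.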

 - A skiplist would be very handy.  I know that Havoc turned down one
   skiplist submission, which probably made sense given that that
   implementation was a funky doubly-linked skiplist atop plain
   malloc.  Even so, skiplists are way better than trees for a lot of
   things, and seem like a good thing to have in glib.

   [ Skiplists are perfect for timers in event loops, btw.  You can
   totally eliminate list iteration in the loop by using one. ]
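
   For readers who haven't met them, a skiplist used as a timer queue
   could be sketched roughly as follows.  This is a textbook-style
   singly linked skiplist written for this note, not the rejected
   submission and not any glib API; all names are hypothetical, keys
   are longs (think expiry times), and rand() stands in for a better
   random source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKIP_MAX_LEVEL 16

typedef struct skip_node {
    long key;                        /* e.g. timer expiry */
    void *value;
    struct skip_node *forward[];     /* one pointer per level */
} skip_node;

typedef struct {
    int level;          /* levels currently in use, at least 1 */
    skip_node *head;    /* sentinel with SKIP_MAX_LEVEL pointers */
} skiplist;

static skip_node *skip_node_new(int level, long key, void *value)
{
    skip_node *n = malloc(sizeof *n + (size_t)level * sizeof n->forward[0]);
    if (n == NULL) return NULL;
    n->key = key;
    n->value = value;
    for (int i = 0; i < level; i++) n->forward[i] = NULL;
    return n;
}

static int skiplist_init(skiplist *sl)
{
    sl->level = 1;
    sl->head = skip_node_new(SKIP_MAX_LEVEL, 0, NULL);
    return sl->head != NULL ? 0 : -1;
}

/* Coin flips: each extra level is taken with probability 1/2. */
static int skip_random_level(void)
{
    int level = 1;
    while (level < SKIP_MAX_LEVEL && (rand() & 1)) level++;
    return level;
}

/* Expected O(log n) insert; equal keys go after existing ones. */
static int skiplist_insert(skiplist *sl, long key, void *value)
{
    skip_node *update[SKIP_MAX_LEVEL];
    int level = skip_random_level();
    skip_node *n = skip_node_new(level, key, value);
    skip_node *x = sl->head;
    if (n == NULL) return -1;
    for (int i = sl->level - 1; i >= 0; i--) {
        while (x->forward[i] != NULL && x->forward[i]->key <= key)
            x = x->forward[i];
        update[i] = x;               /* last node before n on level i */
    }
    for (int i = sl->level; i < level; i++) update[i] = sl->head;
    if (level > sl->level) sl->level = level;
    for (int i = 0; i < level; i++) {
        n->forward[i] = update[i]->forward[i];
        update[i]->forward[i] = n;
    }
    return 0;
}

/* The minimum is always head->forward[0]; unlinking it touches only
 * the sentinel.  Returns 0 on success, -1 if the list is empty. */
static int skiplist_pop_min(skiplist *sl, long *key, void **value)
{
    skip_node *n = sl->head->forward[0];
    if (n == NULL) return -1;
    for (int i = 0; i < sl->level; i++)
        if (sl->head->forward[i] == n)
            sl->head->forward[i] = n->forward[i];
    *key = n->key;
    *value = n->value;
    free(n);
    return 0;
}
```

   An event loop would peek at head->forward[0] for its next timeout
   and call skiplist_pop_min for each timer that has come due, so
   neither step walks the list.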

 - The g_log subsystem is frightfully slow, even for things that are
   filtered out.  It runs vsprintf on the message *before* filtering.
   When something is filtered, it should cost at most one function
   call, not a whole pile of formatting logic.
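
   One way to get "filter first, format later" is to put the level
   test in a macro, sketched below.  This is not how g_log works and
   none of these names exist in glib; LVL_*, log_enabled_mask,
   log_emit and LOG are invented for the example, and it assumes C99
   variadic macros and vsnprintf are available.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical levels and a global mask of the enabled ones. */
enum { LVL_ERROR = 1 << 0, LVL_WARNING = 1 << 1, LVL_DEBUG = 1 << 2 };
static unsigned log_enabled_mask = LVL_ERROR | LVL_WARNING;

static unsigned long log_format_calls;   /* instrumentation only */
static char log_last[256];               /* last formatted message */

/* Slow path: only reached by messages that passed the filter. */
static void log_emit(unsigned level, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(log_last, sizeof log_last, fmt, ap);
    va_end(ap);
    log_format_calls++;
    fprintf(stderr, "[%u] %s\n", level, log_last);
}

/* Fast path: a filtered message costs one mask test, no function
 * call and no formatting. */
#define LOG(level, ...)                          \
    do {                                         \
        if (log_enabled_mask & (level))          \
            log_emit((level), __VA_ARGS__);      \
    } while (0)
```

   One consequence worth knowing: the arguments of a filtered message
   are never evaluated, so they must not carry side effects the
   program depends on.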

 - g_malloc, etc would be nicer as macros.  As is, many debugging
   allocators, and glibc's malloc hooks, report g_malloc as the caller
   for pretty much everything.  It's not clear that glib's built-in
   heap debug/profile features can replace things like dmalloc or the
   glibc malloc hooks.
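
   The simplest form of the idea is a macro that expands straight to
   malloc(), so the real call site is what the debugging allocator
   sees.  A slightly different variant, which records the call site
   explicitly, is sketched below.  The names my_malloc, my_malloc_at
   and last_alloc_* are hypothetical and not part of glib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of the most recent allocation site, standing
 * in for whatever a debugging allocator would log. */
static const char *last_alloc_file;
static int last_alloc_line;

static void *my_malloc_at(size_t size, const char *file, int line)
{
    last_alloc_file = file;
    last_alloc_line = line;
    return malloc(size);
}

/* The macro expands at the call site, so __FILE__ and __LINE__ name
 * the real caller instead of the wrapper function. */
#define my_malloc(size) my_malloc_at((size), __FILE__, __LINE__)
```

   This swaps return-address tracking for file/line tracking; tools
   that only look at the return address would still need the plain
   macro-to-malloc form.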

 - Is there a way to control the size of hash tables better?  One
   should be able to define a fixed size up front and avoid all sorts
   of dynamic resize costs.
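
   As a sketch of what "fixed size up front" could mean, here is a
   small chained hash table whose bucket count is set once and never
   changes.  It is not GHashTable and says nothing about what glib
   could or should expose; fixed_hash and its functions are made-up
   names, and keys are plain unsigned longs to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct hnode {
    unsigned long key;
    void *value;
    struct hnode *next;
} hnode;

typedef struct {
    size_t nbuckets;     /* chosen by the caller, never resized */
    hnode **buckets;
} fixed_hash;

static int fixed_hash_init(fixed_hash *h, size_t nbuckets)
{
    assert(nbuckets > 0);
    h->buckets = calloc(nbuckets, sizeof *h->buckets);
    h->nbuckets = h->buckets != NULL ? nbuckets : 0;
    return h->buckets != NULL ? 0 : -1;
}

/* Returns the address of the link that points at `key`'s node, or
 * of the NULL link at the end of its chain if the key is absent. */
static hnode **fixed_hash_slot(fixed_hash *h, unsigned long key)
{
    hnode **pp = &h->buckets[key % h->nbuckets];
    while (*pp != NULL && (*pp)->key != key)
        pp = &(*pp)->next;
    return pp;
}

/* Insert or overwrite; chains grow, the bucket array does not. */
static int fixed_hash_insert(fixed_hash *h, unsigned long key, void *value)
{
    hnode **pp = fixed_hash_slot(h, key);
    if (*pp == NULL) {
        hnode *n = malloc(sizeof *n);
        if (n == NULL) return -1;
        n->key = key;
        n->next = NULL;
        *pp = n;
    }
    (*pp)->value = value;
    return 0;
}

static void *fixed_hash_lookup(fixed_hash *h, unsigned long key)
{
    hnode *n = *fixed_hash_slot(h, key);
    return n != NULL ? n->value : NULL;
}
```

   The cost of never resizing is that an undersized table degrades
   into long chains, so the caller has to know the expected
   population in advance.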


* It's actually a small world; we evaluated OrbitMT, too, for our
  outward-facing orb, but alas we've apparently ended up using a
  nonfree one ;(

-- 
Grant Taylor - gtaylor<at>picante.com - http://www.picante.com/~gtaylor/
  Linux Printing Website and HOWTO:  http://www.linuxprinting.org/


