Re: string return result conventions



On Mon, 15 Sep 2008, Luke Kenneth Casson Leighton wrote:

> ok - in this situation, fortunately we have control over that.  the
> property getter is entirely auto-generated.  the code review of the
> new webkit glib/gobject bindings brought to light the webkit
> convention of not imposing any "memory freeing" of e.g. strings on
> users of the library.  use of refcounting is done on c++ objects, for
> example.
>
> the strings in webkit are unicode (libicu).  _at the moment_ i'm
> alloc-sprintf'ing strings - all of them - into utf-8 return results.
>
> it was recommended to me that i create a string pool system, to keep a
> record of strings created, and, at convenient times, destroy them all
> (reminds me of apache pools and samba talloc).  exactly when is
> "convenient" is yet to be determined, which is the bit i'm not too
> keen on :)

We've done a similar thing in beast (beast.gtk.org), where we pile up
strings in a garbage collector pool and free them once the topmost
main loop level is reached. This works fairly well in practice, but it
does have the following downsides:

- Strings can't be kept in use across a main loop invocation:
  foo = get_string();
  gtk_main();
  foo; // <- has been freed now

- Loops might build up extremely large temporary memory requirements:
  for (i = 0; i < 1000000; i++)
    gc_pool_add_string ("123456789");
  Be aware that this loop needs about 10MB (a million 10-byte strings,
  counting the terminating NUL) before anything is freed. A more
  sophisticated GC pool would at least be able to deal with:
  for (i = 0; i < 1000000; i++)
    { foo = get_string(); gc_pool_free_early (foo); }
  However, that can still cause temporary bloat if get_string() adds
  further strings to the GC pool internally without freeing them early.

- Recursive main loops block the GC pool:
  gtk_main();
    dispatch_handler1(); // adds strings to GC pool that need freeing
    gtk_main(); // recursive main loop blocks *any* GC pool freeing

So basically, there is no *convenient* time for string freeing in
C. Someone could always keep a pointer to a returned member around
somewhere. Because the event loop processing model most often recurses
down into handlers and comparatively quickly unwinds the stack again,
the main loop coupled GC pool freeing tends to work moderately well in
practice, but as shown above, it does come with severe potential
worst cases. In particular, the for() add_to_pool (pointer); case
is easily triggered implicitly.
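
To make the trade-offs above concrete, here is a minimal sketch of such
a main loop coupled pool in plain C. It is not the beast code. The
names gc_pool_flush(), gc_pool_pending(), main_loop_enter() and
main_loop_leave() are made up for illustration, and main_loop_depth is
only a stand-in for asking the real main loop how deeply it is nested.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pool: a singly linked list of heap-allocated strings. */
typedef struct PoolNode {
    struct PoolNode *next;
    char *str;
} PoolNode;

static PoolNode *pool_head = NULL;
static size_t pool_count = 0;     /* strings currently awaiting a flush */
static int main_loop_depth = 0;   /* assumption: stand-in for the real nesting level */

/* Copy s into the pool. The returned pointer stays valid until the
 * string is freed early or the pool is flushed. Returns NULL on OOM. */
const char *gc_pool_add_string(const char *s)
{
    size_t len = strlen(s) + 1;
    PoolNode *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->str = malloc(len);
    if (node->str == NULL) {
        free(node);
        return NULL;
    }
    memcpy(node->str, s, len);
    node->next = pool_head;
    pool_head = node;
    pool_count++;
    return node->str;
}

/* Release one pooled string ahead of the flush. Linear search, so this
 * is only cheap for recently added strings. Returns 1 if s was pooled. */
int gc_pool_free_early(const char *s)
{
    for (PoolNode **link = &pool_head; *link != NULL; link = &(*link)->next) {
        if ((*link)->str == s) {
            PoolNode *dead = *link;
            *link = dead->next;
            free(dead->str);
            free(dead);
            pool_count--;
            return 1;
        }
    }
    return 0;
}

/* Free every pooled string, but only at the topmost main loop level.
 * Inside a recursive loop nothing is freed. Returns the number freed. */
size_t gc_pool_flush(void)
{
    size_t freed = 0;
    if (main_loop_depth > 1)
        return 0;
    while (pool_head != NULL) {
        PoolNode *dead = pool_head;
        pool_head = dead->next;
        free(dead->str);
        free(dead);
        freed++;
    }
    pool_count = 0;
    return freed;
}

size_t gc_pool_pending(void) { return pool_count; }

void main_loop_enter(void) { main_loop_depth++; }

void main_loop_leave(void)
{
    assert(main_loop_depth > 0);  /* leaving a loop that was never entered is a bug */
    main_loop_depth--;
}
```

In this sketch the recursive main loop case shows up directly: after a
second main_loop_enter(), gc_pool_flush() returns 0 and whatever the
handlers added stays allocated until the inner loop has been left.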

> clearly, the best overall thing would be to actually return the
> unicode strings themselves rather than convert them (needlessly?) to
> utf-8.

Strings handed out by the Gtk+ API and also strings passed in to the
Gtk+ API are/must be in UTF-8 format already, so there's no need for
conversion here.
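
For a library whose internal strings are not UTF-8, the conversion
mentioned in the quoted mail has to happen somewhere before the string
reaches a UTF-8 API. The following is only a sketch of that step,
assuming the internal strings are arrays of UTF-16 code units. The
function name utf16_to_utf8_dup() is made up, and real code would more
likely call an existing converter from GLib or ICU. The result is
malloc()ed and owned by the caller, which is the plain alternative to
putting it into a pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Convert len UTF-16 code units into a newly allocated, NUL-terminated
 * UTF-8 string. Unpaired surrogates become U+FFFD. The caller owns the
 * result and must free() it. Returns NULL if allocation fails.
 * Simplification: no overflow check on len * 3 + 1. */
char *utf16_to_utf8_dup(const uint16_t *src, size_t len)
{
    assert(src != NULL || len == 0);

    /* Worst case is 3 bytes per code unit; a surrogate pair uses
     * 4 bytes for 2 units, which is less per unit. */
    unsigned char *out = malloc(len * 3 + 1);
    size_t o = 0;
    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < len; i++) {
        uint32_t cp = src[i];

        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            /* Valid surrogate pair: combine into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            /* Lone surrogate: substitute the replacement character. */
            cp = 0xFFFD;
        }

        if (cp < 0x80) {
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return (char *)out;
}
```

With this ownership rule the lifetime question goes to the caller:
nothing is freed behind its back, but every returned string needs a
matching free().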

> if that's not possible to do, what would you recommend, in this situation?
>
> many thanks,
>
> l.


---
ciaoTJ

