Re: GLib String support



On Monday, 11 August 2008, at 12:53 +0100, adrian.dmc wrote:
> Hmm... Early validation of our strings, not that stupid. Here is a
> starting point for brainstorming:
>
> struct _GString
> {
>   gchar *str;
>   gsize len;
>   gsize allocated_len;
>   guint tainted : 1;  /* guint, not gboolean: a 1-bit int field may be signed and unable to hold TRUE (1) */
> };
>
> Let's add the important g_utf8_ functions to g_string, and let them
> operate without validation when tainted is false.
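The quoted "validate once, then trust" idea can be sketched in plain C. Everything here is hypothetical: TStr only mirrors the proposed struct (it is not GLib's GString), and utf8_is_valid() is a minimal strict checker written for this example, not GLib's g_utf8_validate(). The character count re-validates only while the tainted bit is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed struct; not GLib's GString. */
typedef struct {
    char *str;
    size_t len;
    size_t allocated_len;
    unsigned tainted : 1;   /* 1 = contents not yet checked as UTF-8 */
} TStr;

/* Strict UTF-8 check: rejects overlong forms, surrogates and code
 * points above U+10FFFF. Returns true if buf[0..len) is well formed. */
static bool utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;              /* number of continuation bytes */
        unsigned long cp;
        if (c < 0x80) { i++; continue; }
        else if (c >= 0xC2 && c <= 0xDF) { n = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
        else return false;     /* 0x80-0xC1 and 0xF5-0xFF never start a char */
        if (len - i <= n) return false;            /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) return false; /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return false;                          /* overlong form */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  /* surrogate */
        if (cp > 0x10FFFF) return false;
        i += n + 1;
    }
    return true;
}

/* Count characters, validating only if the string is still tainted.
 * Returns -1 for invalid UTF-8 (and leaves the string tainted). */
static long tstr_utf8_strlen(TStr *s)
{
    assert(s != NULL);
    if (s->tainted) {
        if (!utf8_is_valid((const unsigned char *)s->str, s->len))
            return -1;
        s->tainted = 0;        /* later calls skip validation */
    }
    long count = 0;
    for (size_t i = 0; i < s->len; i++)
        if (((unsigned char)s->str[i] & 0xC0) != 0x80)  /* count lead bytes only */
            count++;
    return count;
}
```

Once validated, the flag stays clear, so later g_utf8_-style operations could skip the check entirely; any operation that appends unchecked bytes would have to set tainted again.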

If something like that were done, would it be beyond the realm of sanity to use the remaining bits in the flag field as a refcount?  Everything I do in GTK seems to involve an almost insane amount of string copying.

In the past, the idea of adding an extra field has seemed like it would make the situation worse, not better.  The idea has been raised and beaten down on numerous occasions, for very good reasons.

But this looks not only like a very nice idea in its own right, but also like the perfect opportunity: swipe the top bit of a reference-count field as the tainted flag (and perhaps reserve another couple of bits for future use; a 24-bit refcount should be sufficient...?), and apply the well-known "copy on write unless unshared" semantics seen in a number of other places.  Then make it the standard way of carrying strings throughout GTK.
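A minimal sketch of the packed word and copy-on-write, assuming a 32-bit field split as 1 tainted bit, 7 reserved bits and a 24-bit count. The ZStr type and zstr_* functions are hypothetical (not a GLib API), and the counting is plain non-atomic arithmetic, so this is single-threaded only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout of one 32-bit word holding flags and refcount. */
#define ZSTR_TAINTED   0x80000000u  /* top bit: not yet UTF-8 validated */
#define ZSTR_RESERVED  0x7F000000u  /* bits 24-30: reserved for future use */
#define ZSTR_REF_MASK  0x00FFFFFFu  /* low 24 bits: reference count */

typedef struct {
    char *str;
    size_t len;
    uint32_t bits;   /* flags + refcount packed together */
} ZStr;

static ZStr *zstr_new(const char *s, size_t len)
{
    ZStr *z = malloc(sizeof *z);
    if (!z) return NULL;
    z->str = malloc(len + 1);
    if (!z->str) { free(z); return NULL; }
    memcpy(z->str, s, len);
    z->str[len] = '\0';
    z->len = len;
    z->bits = ZSTR_TAINTED | 1u;   /* one owner, contents not yet validated */
    return z;
}

static uint32_t zstr_refcount(const ZStr *z) { return z->bits & ZSTR_REF_MASK; }

static ZStr *zstr_ref(ZStr *z)
{
    assert(zstr_refcount(z) < ZSTR_REF_MASK);  /* 24-bit count must not overflow */
    z->bits++;   /* count is in the low bits, so this touches only the count */
    return z;
}

static void zstr_unref(ZStr *z)
{
    assert(zstr_refcount(z) > 0);              /* no borrow into the flag bits */
    z->bits--;
    if (zstr_refcount(z) == 0) {
        free(z->str);
        free(z);
    }
}

/* Copy on write unless unshared: a sole owner writes in place; a shared
 * string is replaced by a private copy and the caller's share is dropped. */
static ZStr *zstr_make_writable(ZStr *z)
{
    if (zstr_refcount(z) == 1)
        return z;
    ZStr *copy = zstr_new(z->str, z->len);
    if (!copy) return NULL;
    copy->bits = (z->bits & ~ZSTR_REF_MASK) | 1u;  /* keep flags, fresh count */
    zstr_unref(z);
    return copy;
}
```

Because the count sits in the low bits, incrementing or decrementing the whole word changes only the count as long as the two asserts hold. A version shared between threads would need atomic updates, which this sketch does not attempt.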

To round out the upgrade, mark string literals with an allocated_len of -1 (that is, (gsize) -1, since the field is unsigned), so they need not be copied until it's absolutely necessary, perhaps with a _()-style macro to maximise sharing of a single GString instance.
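A sketch of the literal marker, using a hypothetical LStr type that mirrors the three GString fields and a hypothetical LSTR_LITERAL macro. Standard C cannot declare a static object inside an expression the way a _()-style macro would want, so this assumes a declaration macro instead: one shared static instance per literal. Copying the struct is O(1) and shares the literal's bytes; the characters are copied to the heap only on the first write, and a literal is never freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the proposed struct (not GLib's GString). */
typedef struct {
    char *str;
    size_t len;
    size_t allocated_len;
} LStr;

/* allocated_len is unsigned, so "-1" is SIZE_MAX: "points at a literal". */
#define LSTR_LITERAL_MARK ((size_t)-1)

/* Hypothetical declaration macro: one shared, read-only instance. */
#define LSTR_LITERAL(name, lit) \
    static const LStr name = { (char *)(lit), sizeof(lit) - 1, LSTR_LITERAL_MARK }

static int lstr_is_literal(const LStr *s)
{
    return s->allocated_len == LSTR_LITERAL_MARK;
}

/* Append text. A literal-backed buffer is copied to the heap first,
 * since writing into a string literal is undefined behaviour. */
static int lstr_append(LStr *s, const char *text)
{
    assert(s != NULL && text != NULL);
    size_t add = strlen(text);
    size_t need = s->len + add + 1;
    if (lstr_is_literal(s) || need > s->allocated_len) {
        size_t cap = need * 2;
        char *buf = malloc(cap);
        if (!buf) return 0;
        memcpy(buf, s->str, s->len);
        if (!lstr_is_literal(s))
            free(s->str);          /* release the old heap buffer only */
        s->str = buf;
        s->allocated_len = cap;
    }
    memcpy(s->str + s->len, text, add + 1);  /* includes the terminator */
    s->len += add;
    return 1;
}

static void lstr_free_contents(const LStr *s)
{
    if (!lstr_is_literal(s))
        free(s->str);              /* never free() a literal */
}
```

The shared instance itself is const, so a caller who wants to modify it copies the struct first (e.g. LStr mine = greeting;) and lets lstr_append() do the one real copy.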


This has been a dream of mine for quite a while, to the point where I even started to work up my own ZString wrappers for GTK functions, where ZString is simply a refcounted GString.  I mostly abandoned that project, though, because it really didn't make much difference without massive changes within GTK itself to carry ZStrings through its internals.  I do, however, use my ZStrings as far as possible throughout some of my own pet GTK toys.  The lack of refcounting has also virtually stopped me from using GStrings at all: without either refcounts or functionality such as this, there's really stuff-all reason to use them, unless you're dealing with small binary data blobs.  Even with larger binary data blobs, you're more likely to be using preallocated and/or blocked buffers anyhow.

Oh, one other thing I've been wondering: does GTK's UTF-8 implementation make allowances for embedded nulls within strings?  It seems to be fairly common, even recommended practice in some places (a howto I read a while back on writing bug-free programs recommended it), to use a multi-byte representation of \0 within strings and reserve the plain \0 strictly for string termination.  I have to admit I've mostly avoided UTF-8 encoding issues because I wasn't sure how some of these edge cases are handled.
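The multi-byte \0 mentioned here is presumably the overlong two-byte sequence C0 80, which is how Java's "modified UTF-8" stores U+0000. A minimal sketch of that escaping, with hypothetical function names, plus the check that shows the catch: C0 and C1 can only begin overlong forms, so they never appear in standard UTF-8 (RFC 3629), and a strict validator will reject a buffer escaped this way.

```c
#include <assert.h>
#include <stddef.h>

/* Copy in[0..n) to out, writing each 0x00 byte as C0 80 so the result
 * has no raw NUL and can be 0-terminated. out needs 2*n + 1 bytes.
 * Returns the encoded length, excluding the terminator. */
static size_t nul_escape(const unsigned char *in, size_t n, unsigned char *out)
{
    assert(in != NULL && out != NULL);
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == 0x00) {
            out[o++] = 0xC0;
            out[o++] = 0x80;
        } else {
            out[o++] = in[i];
        }
    }
    out[o] = 0x00;   /* the only plain NUL: the terminator */
    return o;
}

/* Turn every C0 80 pair back into 0x00, in place (output never grows).
 * Returns the decoded length. */
static size_t nul_unescape(unsigned char *buf, size_t n)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == 0xC0 && i + 1 < n && buf[i + 1] == 0x80) {
            buf[o++] = 0x00;
            i++;                    /* skip the 0x80 */
        } else {
            buf[o++] = buf[i];
        }
    }
    return o;
}

/* True if the buffer contains a byte that strict UTF-8 never allows
 * as a lead byte (C0 or C1), i.e. it would fail strict validation. */
static int has_overlong_lead(const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (buf[i] == 0xC0 || buf[i] == 0xC1)
            return 1;
    return 0;
}
```

The escaped form keeps C string functions safe, but the text must be unescaped (or handled by a validator that accepts C0 80) before it is passed to anything that expects standard UTF-8.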


Fredderic

