Re: g_utf8_validate() and NUL characters

From: Maciej Katafiasz <mathrick gmail com>
To: gtk-devel-list gnome org
Subject: Re: g_utf8_validate() and NUL characters
Date: Thu, 9 Oct 2008 14:37:38 +0000 (UTC)

On Wed, 08 Oct 2008 23:47:23 -0400, Havoc Pennington wrote:

> On Wed, Oct 8, 2008 at 11:00 PM, Behdad Esfahbod <behdad behdad org>
> wrote:
>> Lemme pull a real-world example: Last year I had to fix a bug in
>> Firefox where a page with a nul byte crashed the browser.
> 
> What I don't see is how a nul byte is in any way different from an
> invalid sequence, other than being
> strictly-speaking-allowed-by-the-unicode-spec. If we care about the
> strictly speaking there, then we have to say gtk doesn't support utf8
> because we have nul-terminated string APIs. I don't think in practice
> the character 0 is useful, and I think doing the APIs with
> nul-termination was a correct decision.

No, no, no. NUL is invalid *only* because you assume it's special. That's
a tautology, and as such it can't (or shouldn't) be reasonably argued with.
Non-data-transparent protocols with in-band control markers are a bad
idea anyway; they only came to be because of the PDP ASCIIZ instruction
family. IMHO, basing a 2008 UI on a PDP machine instruction set is
questionable engineering practice.
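
To make the in-band marker point concrete, here is a minimal sketch in plain C (no glib). The byte_span type and both helper names are made up for illustration and are not part of any real API. It only shows that a NUL-terminated view of a buffer silently loses everything after the first 0 byte, while a pointer-plus-length view keeps it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying view of a byte buffer (not a glib type). */
typedef struct {
    const char *data;
    size_t len;
} byte_span;

/* Length as seen by NUL-terminated C APIs: stops at the first 0 byte. */
static size_t c_string_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Count the 0 bytes inside the span, using the explicit length only. */
static size_t count_embedded_nuls(byte_span span)
{
    size_t n = 0;
    assert(span.data != NULL || span.len == 0);
    for (size_t i = 0; i < span.len; i++) {
        if (span.data[i] == '\0')
            n++;
    }
    return n;
}
```

For the seven bytes "abc\0def", c_string_length() sees only "abc", while a byte_span with len 7 still carries all seven bytes, including the NUL.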

> The nul byte has the downside that, as we have been pointing out about
> the gtk stack, C programmers do *not* expect strings to have nul bytes
> in them.
> 
> This is why nul is different from other nonprintable characters: that it
> breaks a bunch of C code, in practice. Nobody does anything special
> about the other nonprintables, but people treat nul as a special case
> all over the place.

So the stack our app uses can't handle NULs. That means the stack needs
fixing, not that my files are to be declared undisplayable, *especially*
if I can't even open and inspect said files in partially-gibberish mode
to see what exactly the problem is. "The platform does/assumes stupid
things" is not an excuse; we have a library dedicated to fixing such
things, and it's called glib.
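
As a rough illustration of what a "partially-gibberish mode" could look like, the sketch below copies a length-delimited buffer into an ordinary C string, replacing each 0 byte with a visible placeholder. Everything here is an assumption made for the example: make_displayable() is a hypothetical helper, not a glib function, and the choice of U+2400 SYMBOL FOR NULL as the placeholder is arbitrary. It does not validate the remaining bytes as UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a newly allocated, NUL-terminated copy of
 * data[0..len) in which every 0 byte is replaced by U+2400 (UTF-8: E2 90 80).
 * The caller frees the result. Returns NULL if allocation fails.
 */
static char *make_displayable(const char *data, size_t len)
{
    static const char placeholder[] = "\xE2\x90\x80"; /* U+2400 */
    const size_t plen = sizeof placeholder - 1;
    size_t nuls = 0;
    size_t o = 0;
    char *out;

    assert(data != NULL || len == 0);

    /* First pass: count 0 bytes so the output can be sized exactly. */
    for (size_t i = 0; i < len; i++) {
        if (data[i] == '\0')
            nuls++;
    }

    out = malloc(len + nuls * (plen - 1) + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy bytes, substituting the placeholder for each NUL. */
    for (size_t i = 0; i < len; i++) {
        if (data[i] == '\0') {
            memcpy(out + o, placeholder, plen);
            o += plen;
        } else {
            out[o++] = data[i];
        }
    }
    out[o] = '\0';
    return out;
}
```

With this, the three bytes 'a', 0, 'b' come back as a five-byte string that a NUL-terminated text API can show in full, with the offending byte visible rather than truncating the rest.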

Cheers,
Maciej

Follow-Ups:
- Re: g_utf8_validate() and NUL characters
  - From: Matthias Clasen

References:
- g_utf8_validate() and NUL characters
  - From: coda
- Re: g_utf8_validate() and NUL characters
  - From: Havoc Pennington
- Re: g_utf8_validate() and NUL characters
  - From: Behdad Esfahbod
- Re: g_utf8_validate() and NUL characters
  - From: Brian J. Tarricone
- Re: g_utf8_validate() and NUL characters
  - From: Havoc Pennington
- Re: g_utf8_validate() and NUL characters
  - From: Behdad Esfahbod
- Re: g_utf8_validate() and NUL characters
  - From: Havoc Pennington
