Re: g_utf8_validate() and NUL characters
- From: Maciej Katafiasz <mathrick gmail com>
- To: gtk-devel-list gnome org
- Subject: Re: g_utf8_validate() and NUL characters
- Date: Thu, 9 Oct 2008 14:37:38 +0000 (UTC)
On Wed, 08 Oct 2008 23:47:23 -0400, Havoc Pennington wrote:
> On Wed, Oct 8, 2008 at 11:00 PM, Behdad Esfahbod <behdad behdad org>
> wrote:
>> Lemme pull a real-world example: Last year I had to fix a bug in
>> Firefox where a page with a nul byte crashed the browser.
>
> What I don't see is how a nul byte is in any way different from an
> invalid sequence, other than being
> strictly-speaking-allowed-by-the-unicode-spec. If we care about the
> strictly speaking there, then we have to say gtk doesn't support utf8
> because we have nul-terminated string APIs. I don't think in practice
> the character 0 is useful, and I think doing the APIs with
> nul-termination was a correct decision.
No, no, no. NUL is invalid *only* because you assume it's special. That's
circular reasoning, and as such it can't reasonably be argued with.
Non-data-transparent protocols with in-band control markers are a bad
idea anyway; they only came to be because of the PDP ASCIIZ instruction
family. IMHO, basing a 2008 UI on the PDP machine instruction set is a
questionable engineering practice.
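
To make the in-band vs. out-of-band point concrete, here is a minimal sketch in plain C. The `byte_span` type and both helper names are hypothetical, made up for this illustration; they are not glib API. The first comparison trusts the NUL terminator, the second trusts an explicitly carried length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying view: the length travels out of band,
 * so every byte value, including 0x00, is ordinary data. */
typedef struct {
    const char *data;
    size_t      len;
} byte_span;

/* In-band comparison: strcmp() stops at the first NUL in either input,
 * so anything after an embedded NUL is silently ignored. */
static bool equal_in_band(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Out-of-band comparison: compares exactly `len` bytes, NULs included. */
static bool equal_out_of_band(byte_span a, byte_span b)
{
    assert(a.data != NULL && b.data != NULL);
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

Given the buffers "abc\0def" and "abc\0xyz" (7 bytes each), equal_in_band() reports them as equal because it never looks past the NUL, while equal_out_of_band() sees that they differ.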
> The nul byte has the downside that, as we have been pointing out about
> the gtk stack, C programmers do *not* expect strings to have nul bytes
> in them.
>
> This is why nul is different from other nonprintable characters: that it
> breaks a bunch of C code, in practice. Nobody does anything special
> about the other nonprintables, but people treat nul as a special case
> all over the place.
So the stack our app uses can't handle NULs. That means the stack needs
fixing, not that my files are to be declared undisplayable, *especially*
if I can't even open and inspect said files in partially-gibberish mode
to see what exactly the problem is. "The platform does/assumes stupid
things" is not an excuse; we have a library dedicated to fixing such
things, and it's called glib.
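
As a rough idea of what "partially-gibberish mode" could look like, here is a sketch of a helper that takes a length-delimited buffer and produces a NUL-terminated copy in which each embedded NUL byte is replaced by U+2400 SYMBOL FOR NULL. The function name is hypothetical and this is not an existing glib call; it also assumes the rest of the buffer is already valid UTF-8 and does not check that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a glib API: copy `len` bytes from `src`,
 * replacing every NUL byte with U+2400 (E2 90 80 in UTF-8).
 * The result is NUL-terminated, so NUL-terminated string APIs can take it.
 * Returns a malloc'd buffer the caller must free(), or NULL if
 * allocation fails. Does not validate the remaining bytes. */
static char *escape_nuls_for_display(const char *src, size_t len)
{
    static const char symbol[] = "\xE2\x90\x80";
    size_t nuls = 0, out = 0, i;
    char *dst;

    assert(src != NULL || len == 0);

    /* First pass: each NUL grows from 1 byte to 3. */
    for (i = 0; i < len; i++)
        if (src[i] == '\0')
            nuls++;

    dst = malloc(len + 2 * nuls + 1);
    if (dst == NULL)
        return NULL;

    /* Second pass: copy bytes, substituting the visible symbol. */
    for (i = 0; i < len; i++) {
        if (src[i] == '\0') {
            memcpy(dst + out, symbol, 3);
            out += 3;
        } else {
            dst[out++] = src[i];
        }
    }
    dst[out] = '\0';
    return dst;
}
```

With this, a 5-byte input "ab\0cd" becomes a 7-byte C string with the NUL shown as a visible glyph, so the file can at least be opened and inspected.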
Cheers,
Maciej