Re: g_utf8_validate() and NUL characters
- From: Behdad Esfahbod <behdad behdad org>
- To: Havoc Pennington <hp pobox com>
- Cc: gtk-devel-list gnome org
- Subject: Re: g_utf8_validate() and NUL characters
- Date: Thu, 09 Oct 2008 20:46:12 -0400
Havoc Pennington wrote:
> Hi,
>
> On Wed, Oct 8, 2008 at 11:00 PM, Behdad Esfahbod <behdad behdad org> wrote:
>> Lemme pull a real-world example: Last year I had to fix a bug in Firefox where
>> a page with a nul byte crashed the browser.
>
> What I don't see is how a nul byte is in any way different from an
> invalid sequence,
nul is invalid *just* because you declared it so.
> other than being
> strictly-speaking-allowed-by-the-unicode-spec. If we care about the
> strictly speaking there, then we have to say gtk doesn't support utf8
> because we have nul-terminated string APIs. I don't think in practice
> the character 0 is useful, and I think doing the APIs with
> nul-termination was a correct decision.
We already have some API that does not assume nul-termination with a positive
length.
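
To make the difference concrete, here is a minimal sketch in plain C (no GLib). The helper name bytes_hidden_after_nul is hypothetical and exists only for illustration; it shows how much of a length-counted buffer disappears once some code path treats it as a nul-terminated C string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a GLib function.
 * Given a buffer with an explicit length, return how many bytes follow the
 * first embedded nul. Those bytes are invisible to any code that treats the
 * buffer as a nul-terminated C string (strlen, strcpy, printf "%s", ...). */
size_t
bytes_hidden_after_nul (const char *buf, size_t len)
{
  /* memchr honours the explicit length; strlen would stop at the nul. */
  const char *nul = memchr (buf, '\0', len);

  if (nul == NULL)
    return 0;                       /* no embedded nul, nothing is hidden */

  return len - (size_t) (nul - buf) - 1;
}
```

For the 7-byte buffer "abc", nul, "def", an API taking a length sees all 7 bytes, while a nul-terminated API sees only "abc" and the helper reports 3 hidden bytes.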
>> Why? Because Firefox did Unicode
>> validation on input, but then tried to convert UTF-16 to UTF-8 using glib and
>> pass it on to GTK+/Pango functions. Somewhere along the line the nul byte was
>> playing bad... That's the sort of problems being stricter than the standard
>> causes.
>
> But let's turn this around. If Firefox had used g_utf8_validate()
> semantics (or g_convert_with_fallback() semantics) to validate input,
> nothing would have crashed. If anything this seems like an example of
> failing to disallow nul causing crashes.
That's like saying: "we borked interoperability, so let's convert everyone to
glib."
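
The two validation policies being argued over can be sketched as one function with a switch. This is a hand-written illustration in plain C, not GLib's implementation; the name utf8_is_valid and the allow_nul parameter are assumptions made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical validator, not g_utf8_validate().
 * Checks that buf[0..len) is well-formed UTF-8. With allow_nul == true an
 * embedded U+0000 is accepted (the Unicode view); with allow_nul == false it
 * is rejected (the nul-terminated-string view). */
bool
utf8_is_valid (const char *buf, size_t len, bool allow_nul)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = (unsigned char) buf[i];
      size_t need;              /* continuation bytes expected */
      unsigned long cp, min;    /* code point, smallest value for this length */

      if (c < 0x80)
        {
          if (c == 0x00 && !allow_nul)
            return false;
          i++;
          continue;
        }
      else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
      else
        return false;           /* stray continuation or invalid lead byte */

      if (len - i - 1 < need)
        return false;           /* sequence truncated by end of buffer */

      for (size_t k = 1; k <= need; k++)
        {
          unsigned char cc = (unsigned char) buf[i + k];
          if ((cc & 0xC0) != 0x80)
            return false;
          cp = (cp << 6) | (cc & 0x3F);
        }

      /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;

      i += need + 1;
    }

  return true;
}
```

Under this sketch the 3-byte buffer 'a', nul, 'b' passes with allow_nul set and fails without it, while the overlong pair C0 80 is rejected either way. The disagreement in this thread is only about which value of that flag a validator should default to.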
> I bet nul bytes in firefox still break in more obscure cases, too,
> despite fixing this bug. Pretty sure Firefox converts its strings to
> nul-terminated C strings from time to time as it uses third party
> libraries and such.
Ain't gonna prove you wrong on this one :).
>> As a user all I care about is that 1) my browser/editor doesn't crash, 2) it shows
>> me something when I ask it to open a file.
>
> I would say allowing one specific kind of invalid file (one with a nul
> byte) does not make sense, unless you're going to open *any* file.
We disagree on whether nul is invalid to begin with. That said,
pango_layout_set_text() indeed accepts any junk you throw at it, because I
found it useful to not be picky on input the programmer has not much control
over anyway.
http://www.pango.org/ScriptGallery/
It's kinda the same philosophy that makes UI applications not handle memory
allocation failure. What's a programmer to do when text is invalid?
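
One possible answer, as a rough sketch only: show something instead of refusing. The helper below is hypothetical (it is not what Pango or GLib does internally) and assumes the only problem in the input is embedded nul bytes, which it replaces with U+FFFD so the result can be handed to nul-terminated APIs.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not Pango or GLib code.
 * Copies a length-counted buffer into a newly allocated nul-terminated
 * string, replacing every embedded nul with U+FFFD (bytes EF BF BD).
 * Returns NULL if allocation fails; the caller releases it with free(). */
char *
replace_embedded_nuls (const char *buf, size_t len)
{
  size_t nuls = 0;

  for (size_t i = 0; i < len; i++)
    if (buf[i] == '\0')
      nuls++;

  /* Each nul grows from 1 byte to 3; one extra byte for the terminator. */
  char *out = malloc (len + 2 * nuls + 1);
  if (out == NULL)
    return NULL;

  size_t o = 0;
  for (size_t i = 0; i < len; i++)
    {
      if (buf[i] == '\0')
        {
          out[o++] = (char) 0xEF;
          out[o++] = (char) 0xBF;
          out[o++] = (char) 0xBD;
        }
      else
        out[o++] = buf[i];
    }
  out[o] = '\0';

  return out;
}
```

The trade-off is that the text no longer round-trips: the user sees a replacement character where the nul was, which is fine for display and wrong for an editor that saves the file back.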
behdad
> And
> g_utf8_validate() doesn't even make sense then. Then you need
> g_convert_with_fallback(), or a hex editor, or something. nul byte is
> *one of infinite ways* a file can be impossible to edit in a text
> editor.
>
> If you care about not crashing and showing the user something for any
> file, then you need to talk about random binary garbage, not about nul
> bytes. g_utf8_validate() becomes irrelevant. g_utf8_validate() is only
> relevant when you're going to show *text*, not when you want to show
> an *arbitrary byte stream*.
>
> nul bytes may be valid unicode, but they are not valid text. Or at
> least not *useful* text.
>
> I also would say that allowing nul bytes to unexpectedly float through
> apps is most likely going to create more crashes than it fixes. But, I
> suppose reasonable people could disagree. I have certainly written tons
> and tons of code that does not work on strings with nul bytes in them,
> though.
>
> But my basic claim is that to get 1) my browser/editor doesn't crash,
> 2) it shows me something when I ask it to open a file, what you want
> is to load arbitrary junk, not just text files with one specific
> oddity (nul bytes).
>
>>> As a side issue, I think in most cases programs likely break if they
>>> load a non-nul-terminated string, so it's convenient if
>>> g_utf8_validate() is catching that.
>> I don't agree. I have made Pango cleanly handle nul bytes. That's not
>> impossible, just bugs here and there.
>
> I didn't say it was impossible, I said there would be bugs here and there ;-)
>
> And in fact we have the proof, in gtk there are bugs here and there.
> Otherwise we wouldn't even have this thread.
>
> I'd say most existing app code, and newly-written app code, will have
> bugs here and there until and unless the programmer explicitly
> considers this issue and tests it. And few will.
>
> Havoc
>