Re: g_utf8_validate() and NUL characters



Havoc Pennington wrote:
> Hi,
> 
> On Mon, Oct 6, 2008 at 4:42 PM, coda <coda trigger gmail com> wrote:
>> As a result of all
>> this, gedit's inability to edit "binary files" is simply an inability to edit a
>> file with a NUL byte in it.
> 
> That doesn't seem true; a binary file could be invalid UTF-8 (or
> whatever encoding) in thousands of ways besides embedded nul, no?

I think coda mentioned that they interpret binary data as Latin1 and convert
to UTF-8.  That "works", but makes nonsense text out of your binary data.
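
To make the Latin-1 point concrete, here is a minimal sketch in plain C. It is not gedit's or GLib's code, and the name latin1_to_utf8 is made up for illustration. Every byte value 0x00..0xFF is a Latin-1 code point, so the conversion can never fail, which is why it "works" on arbitrary binary data:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a GLib function.
 * Treats each input byte as the Latin-1 code point U+0000..U+00FF and
 * writes its UTF-8 encoding. dst must have room for 2 * len bytes.
 * Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *src, size_t len, unsigned char *dst)
{
    assert((src != NULL && dst != NULL) || len == 0);

    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = src[i];
        if (b < 0x80) {
            /* ASCII range, including 0x00, is copied through unchanged */
            dst[out++] = b;
        } else {
            /* U+0080..U+00FF needs a two-byte sequence */
            dst[out++] = (unsigned char)(0xC0 | (b >> 6));
            dst[out++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

Note that a 0x00 input byte comes out as a 0x00 output byte, so after such a conversion the NUL byte is the one thing a NUL-rejecting validator would still refuse.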

>> g_utf8_validate() could simply be fixed to accept NUL characters, but functions
>> that return a gchar* with no length output parameter, like
>> gtk_text_buffer_get_text(), would require replacements.
> 
> I think you'd find that GtkTextView breaks in some fairly deep ways,
> though maybe not.
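
The problem with a bare char pointer and no length can be sketched in plain C. The text_span type and text_span_copy function below are hypothetical, invented only for this illustration, and are not part of GLib or GTK+. The point is that a caller holding only the pointer has to fall back on strlen(), which stops at the first NUL:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-carrying text type (not a GLib/GTK+ type). */
typedef struct {
    char  *data;  /* len bytes, may contain embedded NULs */
    size_t len;   /* byte length, not counting the trailing NUL */
} text_span;

/* Copies len bytes, embedded NULs included, and appends a trailing NUL
 * so the buffer is still usable with NUL-terminated APIs.
 * On allocation failure, data is NULL and len is 0. */
text_span text_span_copy(const char *src, size_t len)
{
    assert(src != NULL || len == 0);

    text_span t = { malloc(len + 1), 0 };
    if (t.data == NULL)
        return t;
    if (len > 0)
        memcpy(t.data, src, len);
    t.data[len] = '\0';
    t.len = len;
    return t;
}
```

For the five bytes 'a', 'b', NUL, 'c', 'd', the span reports a length of 5, while strlen() on the same pointer sees only the first 2 bytes. Any API that returns just the pointer loses the rest.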

I've almost made Pango NUL-safe.  I haven't tested it extensively, but it's
definitely a goal of mine.  It already accepts NUL bytes in
pango_layout_set_text() when given a positive length argument.

>> Is there any reason not to support NUL/U+0000 in strings?
> 
> The point of not allowing nul in g_utf8_validate() I think is that nul
> is not valid text. It may be valid unicode in some technical sense,
> but it isn't text, in the same sense that malformed utf8 isn't text.

What's so weird about NUL bytes, Havoc?  There is text, and there is
nul-terminated text.  In the former, a NUL byte is as valid as any other
UTF-8 character.
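
As a rough sketch of that distinction, here is a length-based UTF-8 validator in plain C with a switch for NUL. This is not g_utf8_validate()'s implementation; the function name utf8_valid_with_len and the allow_nul parameter are assumptions made up for this example. With an explicit length, U+0000 is just another one-byte sequence, and rejecting it is a policy choice rather than something the encoding requires:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validator, not the GLib one.
 * Checks that s[0..len) is well-formed UTF-8: no truncated or overlong
 * sequences, no surrogates, nothing above U+10FFFF.
 * allow_nul decides whether the byte 0x00 (U+0000) is accepted. */
bool utf8_valid_with_len(const unsigned char *s, size_t len, bool allow_nul)
{
    assert(s != NULL || len == 0);

    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t need;      /* number of continuation bytes expected */
        uint32_t cp, min; /* decoded code point and smallest legal value */

        if (b < 0x80) {
            if (b == 0x00 && !allow_nul)
                return false;
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {
            need = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            need = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            need = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return false; /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < need)
            return false; /* sequence truncated by end of buffer */

        for (size_t k = 1; k <= need; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false; /* overlong, out of range, or surrogate */

        i += need + 1;
    }
    return true;
}
```

With allow_nul set, the three bytes 'a', NUL, 'b' validate. With it cleared, the same buffer is rejected, which is the behaviour being debated in this thread. The overlong encoding 0xC0 0x80 is rejected either way.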

> Havoc


behdad


