Re: g_utf8_validate() and NUL characters
- From: Behdad Esfahbod <behdad behdad org>
- To: Havoc Pennington <hp pobox com>
- Cc: gtk-devel-list gnome org, coda <coda trigger gmail com>
- Subject: Re: g_utf8_validate() and NUL characters
- Date: Tue, 07 Oct 2008 16:55:37 -0400
Havoc Pennington wrote:
> Hi,
>
> On Mon, Oct 6, 2008 at 4:42 PM, coda <coda trigger gmail com> wrote:
>> As a result of all
>> this, gedit's inability to edit "binary files" is simply an inability to edit a
>> file with a NUL byte in it.
>
> That doesn't seem true; a binary file could be invalid UTF-8 (or
> whatever encoding) in thousands of ways besides embedded nul, no?
I think coda mentioned that gedit interprets binary data as Latin-1 and
converts it to UTF-8. That "works", but makes nonsense text out of your
binary data.
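
To illustrate why the Latin-1 route always "works": every byte value is a valid Latin-1 character, so any byte string converts to well-formed UTF-8. The sketch below is a minimal plain-C version of that conversion. The name latin1_to_utf8 is hypothetical (it is not a GLib function), and the caller is assumed to supply an output buffer of at least 2 * len bytes.

```c
#include <stddef.h>

/* Hypothetical helper, not part of GLib: convert `len` bytes of Latin-1
 * to UTF-8. Assumes `out` has room for 2 * len bytes. Returns the number
 * of bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* ASCII range, NUL included, is copied unchanged. */
            out[n++] = c;
        } else {
            /* U+0080..U+00FF become a two-byte sequence. */
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return n;
}
```

Note that a 0x00 input byte still comes out as a 0x00 byte, so this conversion alone does not get rid of embedded NULs.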
>> g_utf8_validate() could simply be fixed to accept NUL characters, but functions
>> that return a gchar* with no length output parameter, like
>> gtk_text_buffer_get_text(), would require replacements.
>
> I think you'd find that GtkTextView breaks in some fairly deep ways,
> though maybe not.
I've almost made Pango NUL-safe. I have not tested it extensively, but it's
definitely a goal of mine. It already accepts NUL bytes in
pango_layout_set_text() when given a positive length argument.
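
A small sketch of why an explicit length matters, and why getters that return a bare gchar* would need replacements. The TextBuf type and both functions here are hypothetical stand-ins, not GTK+ or Pango API: a caller holding only a char* can see nothing past the first NUL, while a length output parameter preserves the whole buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer type for illustration only. */
typedef struct {
    const char *data;  /* may contain embedded NUL bytes */
    size_t      len;   /* number of bytes in data */
} TextBuf;

/* What a caller of a "char * only" getter can measure: strlen() stops
 * at the first NUL, so anything after it is invisible. */
size_t text_buf_c_string_length(const TextBuf *buf)
{
    return strlen(buf->data);
}

/* A replacement-style getter with a length output parameter. */
const char *text_buf_get_text_with_length(const TextBuf *buf, size_t *len_out)
{
    if (len_out != NULL)
        *len_out = buf->len;
    return buf->data;
}
```

For a five-byte buffer holding "ab", a NUL, then "cd", the first function reports 2 while the second reports 5.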
>> Is there any reason not to support NUL/U+0000 in strings?
>
> The point of not allowing nul in g_utf8_validate() I think is that nul
> is not valid text. It may be valid unicode in some technical sense,
> but it isn't text, in the same sense that malformed utf8 isn't text.
What's so weird about NUL bytes, Havoc? There is text, and there is
NUL-terminated text. In the former, a NUL byte is as valid as any other
UTF-8 character.
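
As a rough illustration of validating "text" rather than "NUL-terminated text", here is a minimal length-based UTF-8 check in plain C that treats U+0000 like any other character. This is a sketch under my own assumptions, not how g_utf8_validate() is implemented; the name utf8_validate_len is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: validate exactly `len` bytes as UTF-8.
 * A 0x00 byte is accepted as the one-byte encoding of U+0000. */
bool utf8_validate_len(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;        /* number of continuation bytes expected */
        unsigned long cp;    /* decoded code point */
        unsigned long min;   /* smallest code point for this length */

        if (c < 0x80) {      /* ASCII, NUL included */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;    /* stray continuation or invalid lead byte */
        }

        if (len - i <= extra)
            return false;    /* sequence truncated by end of buffer */

        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        /* Reject overlong forms, surrogates, and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += extra + 1;
    }
    return true;
}
```

The overlong two-byte form 0xC0 0x80 is still rejected; only the plain 0x00 byte is accepted as U+0000.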
> Havoc
behdad