Re: g_utf8_validate() and NUL characters

Havoc Pennington wrote:
> Hi,
> On Thu, Oct 9, 2008 at 8:46 PM, Behdad Esfahbod <behdad behdad org> wrote:
>> nul is invalid *just* because you declared it so.
> I guess my whole claim is that it's useful/better to declare it
> invalid in contexts where you really want text, as opposed to an
> arbitrary binary stream. I don't see why it's useful to declare it
> valid. I don't see the practical importance of the utf8 spec as the
> arbiter of validity here.

You argue that nul is too much trouble for too little gain, so let's not
support it.  I argue that nul happens and programmers have no control over
what's coming in, so it's less trouble for users of the glib API if we
support it.

And those two views do not necessarily conflict.  Indeed, a length parameter
with "-1 means nul-terminated" can accommodate both.  The faint of heart can
always pass -1 and forget about the alternate usage.
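
To make the convention concrete, here is a rough, self-contained sketch of a
validator with that kind of length parameter.  The name utf8_is_valid() is
made up for illustration; this is not the glib implementation, and it does
not claim to match what g_utf8_validate() currently does with embedded nuls.
It only shows that one entry point can serve both camps.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of glib.
 * len < 0  : str is nul-terminated; validation stops at the first nul.
 * len >= 0 : exactly len bytes are checked; embedded nul bytes are
 *            accepted as the ordinary code point U+0000. */
bool utf8_is_valid(const char *str, long len)
{
    const unsigned char *p = (const unsigned char *)str;
    size_t n = (len < 0) ? strlen(str) : (size_t)len;
    size_t i = 0;

    while (i < n) {
        unsigned char c = p[i];
        size_t extra;          /* number of continuation bytes expected */
        unsigned long cp, min; /* decoded code point, smallest legal value */

        if (c < 0x80) {        /* ASCII, including nul */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;      /* stray continuation byte or invalid lead */
        }

        if (extra > n - i - 1)
            return false;      /* sequence truncated by end of buffer */

        for (size_t k = 1; k <= extra; k++) {
            if ((p[i + k] & 0xC0) != 0x80)
                return false;  /* not a continuation byte */
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += extra + 1;
    }
    return true;
}
```

A caller who only cares about C strings writes utf8_is_valid(s, -1) and never
thinks about nul.  A caller holding a buffer and a byte count passes the
count, and a nul in the middle is just another character.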

> I see why it's useful to have some codepaths that handle binary
> garbage (an arbitrary stream), I see why it's useful to have a
> codepath that handles non-nul utf8, I don't see why it's useful to
> handle utf8-including-nul because all the use-cases I can come up with
> would equally apply to arbitrary binary data.
> What is the example where you want to allow utf8-including-nul that
> would not equally argue for handling arbitrary binary data?

Dealing with data coming from a valid XML/HTML/... file.  I believe users
should be able to let that kind of text flow through their programs without
having to go to the trouble of dealing with theoretical errors at every
conversion point, looping over the data, etc.  In your view of the problem,
such text is not valid because it may contain nul bytes, so the programmer
has to code much more defensively.

> Havoc

