Re: glib utf8 api



On Sun, 2008-03-02 at 14:49 -0800, Gregory Sharp wrote:
> Hi, I'm new to glib, and have questions/comments about
> the utf-8 API.

Hi Greg,

> 1) There seems to be no good way to strncpy a utf8 string 
> into a fixed buffer.  g_strncpy doesn't work, because the 
> last character can get truncated causing an invalid string.  
> g_utf8_strncpy doesn't work either, because I don't know 
> how many characters fit in the buffer.

Such an API would be useful, yes.  I opened a request for
g_utf8_strlcpy():

http://bugzilla.gnome.org/show_bug.cgi?id=520116
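
As a rough illustration of what such a function might do (this is only a
sketch, not what glib ships or what the bug proposes in detail): copy as
many bytes as fit, then back up to the nearest character boundary so the
result never ends in the middle of a multi-byte sequence.  The name
utf8_strlcpy_sketch() is made up for this example, and the code assumes
the source string is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a glib function.  Copies src into dest, where
 * size is the full size of dest in bytes including the terminating NUL,
 * and truncates only at a UTF-8 character boundary.  Assumes src is valid
 * UTF-8.  Returns strlen (src), in the style of strlcpy(). */
static size_t
utf8_strlcpy_sketch (char *dest, const char *src, size_t size)
{
  size_t src_len, n;

  assert (dest != NULL && src != NULL);

  src_len = strlen (src);
  if (size == 0)
    return src_len;

  /* Largest number of bytes that fits while leaving room for the NUL. */
  n = src_len < size ? src_len : size - 1;

  /* When truncating, the first dropped byte is src[n].  If it is a
   * continuation byte (10xxxxxx), the cut falls inside a character, so
   * back up until the cut sits on a character boundary. */
  while (n > 0 && n < src_len && ((unsigned char) src[n] & 0xC0) == 0x80)
    n--;

  memcpy (dest, src, n);
  dest[n] = '\0';
  return src_len;
}
```

A return value greater than or equal to size tells the caller that the
string was truncated, the same convention g_strlcpy() uses.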

> 2) There seems to be no way to create a "best guess" valid
> string.  g_utf8_validate is nice and all, but if validation 
> fails I still need to create a valid string.  Am I supposed 
> to use g_convert_with_fallback() from UTF-8 to UTF-8?

Very good point.  I raised this here too:

http://bugzilla.gnome.org/show_bug.cgi?id=391261#c9

In Pango these days I loop over the string, calling g_utf8_validate()
and replacing each invalid byte with -1 (that is, 0xFF).  The -1 byte is
known to be safe when passed to various glib UTF-8 functions.
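
The loop described above can be sketched roughly as follows.  This is not
the Pango source.  To keep the example compilable without glib,
utf8_validate_sketch() is a hand-written stand-in for g_utf8_validate() on
a NUL-terminated string, and both function names are invented for this
example.  With glib available, the call would be
g_utf8_validate (start, -1, &end) instead.  Note that the result is still
not valid UTF-8, since 0xFF never occurs in valid UTF-8; it is only meant
to be safe to hand on.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for g_utf8_validate() on a NUL-terminated string.  Returns 1
 * if str is valid UTF-8, else 0.  *end is set to the first invalid byte,
 * or to the terminating NUL when the whole string is valid. */
static int
utf8_validate_sketch (const char *str, const char **end)
{
  const unsigned char *p = (const unsigned char *) str;

  while (*p)
    {
      size_t len, i;
      unsigned long cp, min;

      if (*p < 0x80)                { len = 1; cp = *p;        min = 0; }
      else if ((*p & 0xE0) == 0xC0) { len = 2; cp = *p & 0x1F; min = 0x80; }
      else if ((*p & 0xF0) == 0xE0) { len = 3; cp = *p & 0x0F; min = 0x800; }
      else if ((*p & 0xF8) == 0xF0) { len = 4; cp = *p & 0x07; min = 0x10000; }
      else
        break;                      /* stray continuation byte or 0xF8..0xFF */

      for (i = 1; i < len; i++)
        {
          if ((p[i] & 0xC0) != 0x80)
            break;                  /* also stops at a premature NUL */
          cp = (cp << 6) | (p[i] & 0x3F);
        }

      /* Reject truncated sequences, overlong forms, surrogates and
       * anything above U+10FFFF. */
      if (i < len || cp < min || cp > 0x10FFFF
          || (cp >= 0xD800 && cp <= 0xDFFF))
        break;

      p += len;
    }

  *end = (const char *) p;
  return *p == '\0';
}

/* Overwrite every invalid byte in text with (char) -1, i.e. 0xFF. */
static void
utf8_make_safe_sketch (char *text)
{
  char *start = text;

  assert (text != NULL);

  for (;;)
    {
      const char *end;

      if (utf8_validate_sketch (start, &end))
        break;                      /* the rest of the string is valid */

      /* end points at the first offending byte: replace it and resume
       * validation right after it. */
      start = text + (end - text);
      *start++ = (char) -1;
    }
}
```

Each pass replaces exactly one byte and then moves past it, so the loop
always terminates, and valid stretches of the string are left untouched.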


> 3) If validated utf8 strings are fundamentally different from 
> unvalidated strings, shouldn't they use a different C type?

Not really; they aren't fundamentally different.  Note that we don't use
any type other than char * for strings anyway, and C types don't buy you
much safety...


> 4) What are the developers' reaction to camel_utf8_getc() 
> on this page: http://www.go-evolution.org/Camel.Misc

Dropping invalid input bytes is a horrible idea.  I think my suggestion
of outputting invalid-but-safe codepoints for invalid input bytes is a
better approach.


> Please tell me if I'm missing something.  I'm happy to 
> log bug reports as indicated.
> 
> Thanks,
> Greg

-- 
behdad
http://behdad.org/

"Those who would give up Essential Liberty to purchase a little
 Temporary Safety, deserve neither Liberty nor Safety."
        -- Benjamin Franklin, 1759


