
RE: Just a few UTF8 questions...



> > Also, if I read in from a socket to a gchar buffer[1024] and I then
> > proceed to print that information in the form
> >
> > 	g_message("socket input: %*s", bytes, buffer);
> >
> > Does the * represent how many characters or bytes that are printed
> > from the buffer?
> 
> There was a thread about this in gtk-list in March:
> 
> http://mail.gnome.org/archives/gtk-list/2003-March/msg00007.html
> 
> The answers were:
> 
> a) The way GLib uses UTF-8 together with printf has the unfortunate
>    effect that the precision operates on bytes rather than characters.
>
> b) Glibc has a "feature" where %Ns actually checks for a whole
>    number of characters in the current encoding. So, unless you
>    are sure you are always going to be in a UTF-8 locale, avoid
>    using %Ns. (You are basically OK for iso-8859-1, but will
>    have problems in, say, a Japanese locale.)

If I receive data from a GLib IO channel, it should be UTF-8, right?


If what Owen says is true, then as I understand it, printf uses * for the
number of bytes and GLib's implementation uses it for the number of
characters.

So if I receive a buffer filled with Russian characters, then my
buffer[1024] is full of multibyte characters.  Using GLib's implementation
means that I would be attempting to print 1024 characters when in fact there
may only be 900.  That would explain why it crashes with Russian input but
never when the data is in English.  Do you agree?
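
To make the byte/character distinction concrete, here is a minimal sketch in
plain C rather than GLib. The helper name utf8_char_count is hypothetical, and
the function assumes its input is valid UTF-8. Cyrillic letters take two bytes
each in UTF-8, so a buffer of N bytes holds fewer than N such characters:

```c
#include <stddef.h>

/* Hypothetical helper: count the UTF-8 characters (code points) found in
 * the first `len` bytes of `s`. Continuation bytes have the bit pattern
 * 10xxxxxx, so every byte that does NOT match it starts a new character.
 * Assumes `s` holds valid UTF-8; no validation is performed. */
static size_t utf8_char_count(const char *s, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For a 12-byte buffer holding six Cyrillic letters this returns 6. In GLib
itself, g_utf8_strlen(buffer, bytes) should give the same kind of count.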

So can I presume that printing without the * would be the fix?
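
For reference, in standard C printf the * in %*s supplies a minimum field
width: it only pads the output and does not limit how many bytes are read from
the buffer. A limit is written as a precision, %.*s. The sketch below shows the
difference using snprintf in plain C rather than g_message. The function names
are hypothetical, and it assumes the program stays in the default "C" locale
(setlocale is never called), so the locale caveat quoted above does not come
into play:

```c
#include <stdio.h>

/* Hypothetical helper: "%*s" treats `width` as a MINIMUM field width.
 * Short strings are padded on the left, but `s` is still read up to its
 * terminating NUL, so `s` must be NUL-terminated. */
static int format_with_width(char *out, size_t out_size,
                             int width, const char *s)
{
    return snprintf(out, out_size, "[%*s]", width, s);
}

/* Hypothetical helper: "%.*s" treats `max_bytes` as a precision, that is,
 * an upper bound on the bytes taken from `s`. No NUL terminator is needed
 * within that bound. */
static int format_with_precision(char *out, size_t out_size,
                                 int max_bytes, const char *s)
{
    return snprintf(out, out_size, "[%.*s]", max_bytes, s);
}
```

Data read from a socket is normally not NUL-terminated, so with %*s the
formatting can run past the end of buffer. Under that assumption, the bounded
form would be g_message("socket input: %.*s", bytes, buffer), with bytes
passed as an int.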

Regards,
Martyn



