RE: Just a few UTF8 questions...



Also, if I read in from a socket to a gchar buffer[1024] and I then 
proceed to print that information in the form 
    
    g_message("socket input: %*s", bytes, buffer);

Does the * represent the number of characters or the number of bytes
printed from the buffer?

There was a thread about this in gtk-list in March:

http://mail.gnome.org/archives/gtk-list/2003-March/msg00007.html

The answers were:

a) The way GLib uses UTF-8 together with printf has the unfortunate
   effect that the precision operates on bytes rather than characters.

b) Glibc has a "feature" where %Ns actually checks for a whole 
   number of characters in the current encoding. So, unless you
   are sure you are always going to be in a UTF-8 locale, avoid
   using %Ns. (You are basically OK for iso-8859-1, but will
   have problems in, say, a Japanese locale.)
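
For what it's worth, the * in the %*s format quoted above supplies a
minimum field width, which pads but never truncates; it is the precision
form, %.*s, that limits how much of the string is used. Below is a minimal
sketch of the difference in plain C, with no GLib and assuming the default
"C" locale. The helper names format_with_precision and format_with_width
are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format s with a precision ("%.*s"), so that at most
 * max_bytes bytes of s are used. Returns the length reported by snprintf,
 * i.e. the length the full result would have. */
size_t format_with_precision(char *out, size_t cap, const char *s, int max_bytes)
{
    assert(cap > 0 && max_bytes >= 0);
    int n = snprintf(out, cap, "%.*s", max_bytes, s);
    return n < 0 ? 0 : (size_t)n;
}

/* Hypothetical helper: format s with a minimum field width ("%*s").
 * The width only pads short strings; it never shortens long ones. */
size_t format_with_width(char *out, size_t cap, const char *s, int width)
{
    assert(cap > 0);
    int n = snprintf(out, cap, "%*s", width, s);
    return n < 0 ? 0 : (size_t)n;
}
```

With the 12-byte UTF-8 encoding of a 6-character Russian word, a precision
of 3 yields 3 bytes and so cuts the second character in half, while a
width of 3 leaves all 12 bytes in place.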

If I receive data from a GLib IO channel, it should be UTF-8, right?


If what Owen says is true, then as I understand it, printf uses * for the
number of bytes and GLib's implementation uses it for the number of
characters.

So if I receive a buffer filled with Russian characters, then my
buffer[1024] is FULL of multibyte characters.  Using GLib's implementation
means that I would be attempting to print 1024 characters when in fact
there may only be 900.  That would explain why it crashes with Russian
input but never when the data is in English.  Do you agree?

So can I presume that printing WITHOUT the * would be the fix?
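
To make the bytes-versus-characters point concrete, here is a rough sketch
in plain C, again without GLib. It assumes the buffer holds valid UTF-8 and
is not NUL-terminated, as is typical for data read from a socket. The names
utf8_char_count and format_chunk are hypothetical. Note that a plain %s
with no precision keeps reading until it finds a NUL byte, so dropping the
* is only safe if the buffer has been terminated first.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Count UTF-8 characters in the first nbytes bytes of buf by counting
 * every byte that is not a continuation byte (10xxxxxx).
 * Assumption: buf contains valid UTF-8. */
size_t utf8_char_count(const char *buf, size_t nbytes)
{
    size_t count = 0;
    for (size_t i = 0; i < nbytes; i++) {
        if (((unsigned char)buf[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Format a received chunk that is NOT NUL-terminated. The precision
 * bounds the read to nbytes bytes, so no terminator is required.
 * Returns snprintf's result: the length the full output would have. */
int format_chunk(char *out, size_t cap, const char *buf, size_t nbytes)
{
    assert(nbytes <= INT_MAX);
    return snprintf(out, cap, "socket input: %.*s", (int)nbytes, buf);
}
```

The same idea applies to a 1024-byte buffer of Russian text: the byte count
returned by the read is the safe bound to pass as the precision, while the
character count will be smaller.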

Regards,
Martyn



