Just a few UTF8 questions...



I have read the recommended page noted in the FAQ from the GTK online
documentation but still have a few questions:

As I understand it, GTK and GLib are both written with UTF-8 in mind.

If I write str = g_strdup_printf("mystring"); is str then guaranteed to be a
valid UTF-8 string?

If that is the case, why is it that of the two strings below (one in German,
the other in Russian) one is valid UTF-8 and the other is not?  As it
happens, g_utf8_validate() returns FALSE for de_string.

        ru_string = "???? ???????";
        de_string = "Schöne Grüße";
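
One thing I am unsure about is the encoding my source file is saved in,
since that decides which bytes end up in the literal.  Below is a rough
sketch of a structural check in plain C to show the difference.
looks_like_utf8 is a hypothetical helper of mine, not a GLib function, and
it is deliberately weaker than g_utf8_validate (it does not reject overlong
forms or surrogates).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of GLib): checks only that every lead
 * byte is followed by the right number of continuation bytes.
 * Overlong encodings and surrogates are NOT rejected here. */
static bool looks_like_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    assert(s != NULL);
    while (*p != '\0') {
        int extra;

        if (*p < 0x80)                extra = 0;  /* 0xxxxxxx: ASCII */
        else if ((*p & 0xE0) == 0xC0) extra = 1;  /* 110xxxxx */
        else if ((*p & 0xF0) == 0xE0) extra = 2;  /* 1110xxxx */
        else if ((*p & 0xF8) == 0xF0) extra = 3;  /* 11110xxx */
        else return false;  /* stray continuation or invalid lead byte */

        p++;
        while (extra-- > 0) {
            /* Each continuation byte must look like 10xxxxxx; a NUL
             * here fails the check, so we never read past the end. */
            if ((*p & 0xC0) != 0x80)
                return false;
            p++;
        }
    }
    return true;
}
```

With this, "Sch\xC3\xB6ne" (ö as the two UTF-8 bytes C3 B6) passes, while
"Sch\xF6ne" (ö as the single Latin-1 byte F6) does not.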

If I call g_utf8_strlen(ru_string, -1), the length returned is 35 (strlen
returns the same value).  According to the documentation it is supposed to
return the length in characters.  Shouldn't it therefore return 11?
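
To be clear about what I mean by characters versus bytes, here is a minimal
sketch in plain C.  count_utf8_chars is a hypothetical helper, not the GLib
implementation, and it assumes the input is already well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GLib): counts code points in a
 * NUL-terminated string that is assumed to be well-formed UTF-8.
 * Continuation bytes have the bit pattern 10xxxxxx and are skipped,
 * so only the first byte of each character is counted. */
static size_t count_utf8_chars(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For "Grüße" encoded as UTF-8 ("Gr\xC3\xBC\xC3\x9F" "e"), this gives 5 while
strlen gives 7, which is the kind of difference I expected to see.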

Also, if I read from a socket into a gchar buffer[1024] and then print the
data like this:
        
        g_message("socket input: %*s", bytes, buffer);

Does the * specify how many characters are printed from the buffer, or how
many bytes?
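
For comparison, here is the variant with a precision instead of a field
width, as a minimal sketch in plain C using snprintf rather than g_message.
In standard printf, "%*s" only sets a minimum field width and still reads
the string up to its NUL byte, whereas "%.*s" limits how many bytes are
read.  format_bytes is a hypothetical helper, and I am assuming g_message
follows the standard printf rules here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats at most `len` bytes of `buf` into `out`.
 * `buf` does not have to be NUL-terminated, as with raw socket data. */
static int format_bytes(char *out, size_t out_size,
                        const char *buf, int len)
{
    assert(out != NULL && buf != NULL && len >= 0);

    /* "%.*s" takes the precision from `len`, so at most `len` bytes
     * of `buf` are read.  "%*s" would treat `len` as a minimum width
     * and keep reading until it finds a NUL byte. */
    return snprintf(out, out_size, "socket input: %.*s", len, buf);
}
```

Note that the limit counts bytes, not characters, so cutting a UTF-8 stream
at an arbitrary byte count can split a multi-byte character.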

I have a situation in my application where I print information coming in
from the socket and it causes a crash when receiving a stream filled with
Russian characters.  This is the only time I have experienced it so far.
Any ideas?

Regards,
Martyn



