RE: Just a few UTF8 questions...
- From: martyn 2 russell bt com
- To: maclas gmx de, gtk-app-devel-list gnome org
- Subject: RE: Just a few UTF8 questions...
- Date: Wed, 9 Jul 2003 09:45:47 +0100
Also, if I read in from a socket to a gchar buffer[1024] and I then
proceed to print that information in the form
g_message("socket input: %*s", bytes, buffer);
Does the * represent how many characters or bytes that are printed
from the buffer?
There was a thread about this in gtk-list in March:
http://mail.gnome.org/archives/gtk-list/2003-March/msg00007.html
The answers were:
a) The way GLib uses UTF-8 together with printf has the unfortunate
effect that the precision operates on bytes rather than characters.
b) Glibc has a "feature" where %Ns actually checks for a whole
number of characters in the current encoding. So, unless you
are sure you are always going to be in a UTF-8 locale, avoid
using %Ns. (You are basically OK for iso-8859-1, but will
have problems in say, a Japanese locale.)
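
To make the bytes-versus-characters distinction in (a) concrete, here is a minimal sketch in plain C. The helper names utf8_char_count and utf8_prefix_len are hypothetical (they are not GLib functions), and both assume the buffer already holds valid UTF-8. The first counts characters by skipping continuation bytes; the second finds the longest prefix of at most max_bytes bytes that does not cut a multibyte character in half.

```c
#include <stddef.h>

/* Hypothetical helper: number of UTF-8 characters in the first `len`
 * bytes of `s`. Assumes valid UTF-8. Every byte that is not a
 * continuation byte (bit pattern 10xxxxxx) starts a new character. */
size_t utf8_char_count(const char *s, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Hypothetical helper: largest n <= max_bytes such that the first n
 * bytes of `s` end on a character boundary. Assumes valid UTF-8. */
size_t utf8_prefix_len(const char *s, size_t len, size_t max_bytes)
{
    if (max_bytes >= len)
        return len;
    size_t n = max_bytes;
    /* If the byte just after the cut is a continuation byte, the cut
     * falls inside a character, so back up to that character's start. */
    while (n > 0 && ((unsigned char)s[n] & 0xC0) == 0x80)
        n--;
    return n;
}
```

In a GLib program, g_utf8_strlen() gives the character count directly; the hand-written loop is used here only to keep the sketch free of dependencies.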
If I receive information from a GLib IO channel, it should be UTF-8, right?
If what Owen says is true, as I understand it, printf uses * for the number
of bytes and GLib's implementation uses it for the number of characters.
So if I receive a buffer filled with Russian characters, then my
buffer[1024] is full of multibyte characters. Using GLib's implementation
means that I would be attempting to print 1024 characters when in fact there
may only be 900. This would explain why it crashes with Russian input but
never when the information is in English. Do you agree?
So can I presume that printing WITHOUT the * would be the fix?
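
One point worth checking against the C standard: in "%*s" the * supplies a minimum field width, which only pads the output and never limits how much of the buffer is read, whereas in "%.*s" it supplies the precision, which is the limit that answer (a) refers to. The sketch below shows the difference using standard snprintf rather than g_message, an assumption made so that it is self-contained; format_limited and format_padded are hypothetical names.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: "%.*s" passes `bytes` as the precision, so at
 * most `bytes` bytes of `buf` are read. `buf` does not need to be
 * NUL-terminated as long as it holds at least `bytes` bytes. */
int format_limited(char *out, size_t out_size, const char *buf, int bytes)
{
    return snprintf(out, out_size, "socket input: %.*s", bytes, buf);
}

/* Hypothetical helper: "%*s" passes `width` as the minimum field width.
 * It only left-pads short strings with spaces; reading continues until
 * a NUL byte, so `buf` MUST be NUL-terminated. */
int format_padded(char *out, size_t out_size, const char *buf, int width)
{
    return snprintf(out, out_size, "socket input: %*s", width, buf);
}
```

If the socket buffer is not NUL-terminated, the "%*s" form can read past its end regardless of the value passed for *, which is one possible source of a crash. Dropping the * entirely has the same requirement: the data must be NUL-terminated first, for example by reading at most sizeof(buffer) - 1 bytes and setting buffer[bytes] = '\0'.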
Regards,
Martyn