RE: Just a few UTF8 questions...
- From: Matthias Clasen <maclas gmx de>
- To: martyn 2 russell bt com
- Cc: gtk-app-devel-list gnome org
- Subject: RE: Just a few UTF8 questions...
- Date: Wed, 9 Jul 2003 11:32:58 +0200 (MEST)
Also, if I read in from a socket to a gchar buffer[1024] and I then
proceed to print that information in the form
g_message("socket input: %*s", bytes, buffer);
does the * represent how many characters or how many bytes are printed
from the buffer?
There was a thread about this in gtk-list in March:
http://mail.gnome.org/archives/gtk-list/2003-March/msg00007.html
The answers were:
a) The way GLib uses UTF-8 together with printf has the unfortunate
effect that the precision operates on bytes rather than characters.
b) Glibc has a "feature" where %Ns actually checks for a whole
number of characters in the current encoding. So, unless you
are sure you are always going to be in a UTF-8 locale, avoid
using %Ns. (You are basically OK for iso-8859-1, but will
have problems in, say, a Japanese locale.)
If I receive information from a GLib IO channel, it should be UTF-8,
right?
If what Owen says is true, as I understand it, printf uses * for the
number of bytes and GLib's implementation uses it for the number of
characters.
No. Owen speaks about glibc, and the precision is always the number of
bytes (unless you use wprintf and wide characters). The feature Owen
means is that glibc checks that the bytes to be printed form a valid
sequence of characters in the encoding of the selected locale (i.e.,
that the byte array doesn't end in the middle of a multibyte character).
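To make the byte-versus-character point concrete, here is a minimal sketch in plain C (no GLib). It assumes the default "C" locale, where the validity check described above should not come into play, and the helper names `format_prefix` and `utf8_safe_len` are made up for illustration. Note also that in standard printf it is `%.*s` that takes a precision (a maximum number of bytes), whereas `%*s` only sets a minimum field width and still reads up to the terminating nul.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format at most n BYTES of s into out via a
 * precision. Returns what snprintf returns (bytes that would be written). */
static int format_prefix(char *out, size_t cap, const char *s, size_t n)
{
    return snprintf(out, cap, "%.*s", (int)n, s);
}

/* Hypothetical helper: largest length <= n that does not end in the
 * middle of a UTF-8 character. Assumes buf[0..n) is otherwise valid UTF-8. */
static size_t utf8_safe_len(const char *buf, size_t n)
{
    size_t start = n;

    /* Step back over trailing continuation bytes (10xxxxxx). */
    while (start > 0 && ((unsigned char)buf[start - 1] & 0xC0) == 0x80)
        start--;
    if (start == 0)
        return 0;               /* empty, or no lead byte found */

    start--;                    /* index of the last lead (or ASCII) byte */
    unsigned char lead = (unsigned char)buf[start];
    size_t need = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;

    /* Keep everything if the last character is complete, else drop it. */
    return (start + need <= n) ? n : start;
}
```

With the three-byte string "a\xC3\xA9" (an "a" followed by a two-byte character), a precision of 2 copies the "a" plus only the first half of the second character; `utf8_safe_len(s, 2)` would suggest a precision of 1 instead. In a GLib program, `g_utf8_validate()` with its `end` out-parameter can serve a similar purpose.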
So if I receive a buffer filled with Russian characters, then my
buffer[1024] is FULL of multibyte characters. Using GLib's
implementation means that I would be attempting to print 1024
characters when in fact there may only be 900. This would be why it is
causing a crash, but never when the information is in English. Do you
agree?
IO channels do in fact return UTF-8. For the rest, see above.
So I can presume that printing WITHOUT the * would be the fix?
The simplest solution would certainly be to nul-terminate the byte
array and omit the precision.
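A minimal sketch of that suggestion, in plain C so it stands alone. The helper name `terminate_chunk` is hypothetical, and it assumes the caller knows how many bytes the read returned and that the data contains no embedded nul bytes.

```c
#include <stddef.h>

/* Hypothetical helper: nul-terminate a buffer of capacity cap after
 * nbytes of received data, clamping so the terminator always fits.
 * Returns the resulting string length. */
static size_t terminate_chunk(char *buf, size_t cap, size_t nbytes)
{
    if (cap == 0)
        return 0;                /* nowhere to put a terminator */
    if (nbytes > cap - 1)
        nbytes = cap - 1;        /* leave room for the terminator */
    buf[nbytes] = '\0';
    return nbytes;
}
```

In practice this means reading at most `sizeof buffer - 1` bytes so nothing is lost to the clamp, calling `terminate_chunk(buffer, sizeof buffer, bytes)`, and then printing with a plain `g_message("socket input: %s", buffer);`.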
Matthias