sprintf and utf8
- From: Joel Becker <jlbec evilplan org>
- To: gtk-devel-list gnome org
- Subject: sprintf and utf8
- Date: Sat, 30 Mar 2002 02:41:13 +0000
Folks,
Ok, I've been doing some code in the land of utf8/non-utf8 and
I've come across a problem. Imagine you are printing a list of items
and item numbers:
--8<----------------------------------------------------------
1 bob
2 sue
3 jill
-->8----------------------------------------------------------
Further assume that you are in gtk or glib, and the list is utf8.
--8<----------------------------------------------------------
#include <stdio.h>

void print_list(char *names[], int count)
{
    int i;

    for (i = 0; i < count; i++)
        fprintf(stdout, "%d %s\n", i + 1, names[i]); /* names[i] is utf8 */
}
-->8----------------------------------------------------------
There is an obvious problem here. names[i] is utf8, not the current
locale.
--8<----------------------------------------------------------
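#include <stdio.h>
#include <glib.h>

void print_list(gchar *names[], int count)
{
    gchar *out;
    int i;

    for (i = 0; i < count; i++)
    {
        /* Returns NULL if names[i] cannot be converted. */
        out = g_locale_from_utf8(names[i], -1, NULL, NULL, NULL);
        if (out == NULL)
            continue;
        fprintf(stdout, "%d %s\n", i + 1, out);
        g_free(out);
    }
}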
-->8----------------------------------------------------------
Now the output is properly in the current locale. Or is it? What if
you are in a locale where " " is not ASCII 0x20?
More directly, imagine you are using this code in GTK+:
--8<----------------------------------------------------------
val = g_strdup_printf("%d) %s", i, names[i]);
gtk_entry_set_text(entry, val);
g_free(val); /* the entry keeps its own copy */
-->8----------------------------------------------------------
ASCII " " is valid in utf8. names[i] is also valid utf8. BUT!
sprintf(3) works in the locale of the C library, and so your number
("%d") is NOT encoded in utf8, it is encoded in the current locale. In
Latin-1 or ASCII this works great. In other encodings you may get a
character in the string that is not valid utf8. The entry might let it
slide, but pango probably won't.
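One way to test this worry on a given system is to format a number under a
chosen locale and check whether every byte of the result is 7-bit ASCII,
since pure ASCII is always valid utf8. What follows is only a minimal sketch
in plain C. is_all_ascii() and formatted_int_is_ascii() are hypothetical
helper names made up for illustration; they are not GLib or GTK+ functions.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if every byte of s is 7-bit ASCII
 * (and therefore valid utf8), 0 otherwise. */
static int is_all_ascii(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
    {
        if ((unsigned char)*s > 0x7F)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: formats value with "%d" under the named locale and
 * reports whether the output is pure ASCII. Returns -1 if the locale
 * cannot be selected or the current locale name cannot be saved. */
static int formatted_int_is_ascii(const char *locale_name, int value)
{
    char buf[64];
    char saved[256];
    const char *current;
    int result;

    /* Remember the current locale so it can be restored afterwards. */
    current = setlocale(LC_ALL, NULL);
    if (current == NULL || strlen(current) >= sizeof saved)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;

    snprintf(buf, sizeof buf, "%d", value);
    result = is_all_ascii(buf);

    setlocale(LC_ALL, saved);
    return result;
}
```

Calling formatted_int_is_ascii() with the name of a locale installed on your
system (names vary between platforms) and getting 0 back would confirm the
problem for that locale. A result of 1 only says that this particular number
was safe there.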
A quick look shows that gtk_progress_build_string is likely
susceptible to this. Other widgets also. GtkTreeView builds paths with
--8<----------------------------------------------------------
sprintf (ptr, ":%d", path->indices[i]);
-->8----------------------------------------------------------
in gtk_tree_path_to_string(). If GtkTreeView is expecting this to be
utf8 later, they are likely to be surprised.
I was, in fact, doing some string munging along the lines of the
original example. I was using GString when I thought of this issue.
--8<----------------------------------------------------------
g_string_printf(str, " %2d ) %s\n", i, names[i]);
-->8----------------------------------------------------------
This will break because the number generated by %2d will be in the
current locale, the spaces and ')' will be in ASCII, and names[i] will
be in utf8. When I later run g_locale_from_utf8() on this GString prior
to fprintfing it, I have a real issue. A possible solution:
--8<----------------------------------------------------------
for (i = 0; i < count; i++)
{
g_string_append(str, " "); /* " " is valid utf8 */
num = g_strdup_printf("%2d", i);
conv = g_locale_to_utf8(num, -1, NULL, NULL, NULL);
g_free(num);
g_string_append(str, conv);
g_free(conv);
g_string_append_printf(str, " ) %s\n", names[i]); /* again, valid utf8 */
}
conv = g_locale_from_utf8(str->str, -1, NULL, NULL, NULL);
fprintf(stdout, "%s", conv);
g_free(conv);
-->8----------------------------------------------------------
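A different approach, and not the one shown above, would be to keep the
number out of printf entirely and write the digits by hand as explicit ASCII
byte values, so the result cannot depend on the locale. This is only a
sketch under that assumption. format_int_ascii() is a hypothetical helper,
not a GLib function, and it covers only the right-aligned "%2d" style used
in the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes value into buf, right-aligned in at least
 * width columns, using only ASCII bytes (0x20 space, 0x2D minus,
 * 0x30..0x39 digits). Returns the number of bytes written, not counting
 * the terminating NUL, or -1 if buf is too small. */
static int format_int_ascii(char *buf, size_t size, int value, int width)
{
    char tmp[32];
    unsigned int u;
    int negative = value < 0;
    int n = 0;
    int pad;
    int pos = 0;

    assert(buf != NULL);

    /* Take the magnitude in unsigned arithmetic so INT_MIN is handled. */
    u = negative ? 0u - (unsigned int)value : (unsigned int)value;

    /* Collect digits in reverse order. */
    do
    {
        tmp[n++] = (char)(0x30 + u % 10);
        u /= 10;
    } while (u != 0);
    if (negative)
        tmp[n++] = 0x2D;

    pad = width > n ? width - n : 0;
    if ((size_t)pad + (size_t)n + 1 > size)
        return -1;

    while (pad-- > 0)
        buf[pos++] = 0x20;
    while (n > 0)
        buf[pos++] = tmp[--n];
    buf[pos] = '\0';
    return pos;
}
```

The resulting buffer could then be passed to g_string_append() in place of
the g_strdup_printf()/g_locale_to_utf8() pair. The cost is that any
locale-specific number formatting is lost, which may or may not be what you
want.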
Of course, maybe I missed something. Maybe every encoding sees
" " as position 0x20. Let me know. If there are better solutions to my
quandary, I'm all ears. At the very least, if this is an issue, we
should fix GTK+.
Joel
--
"Not everything that can be counted counts, and not everything
that counts can be counted."
- Albert Einstein
http://www.jlbec.org/
jlbec evilplan org