sprintf and utf8



Folks,
	Ok, I've been doing some code in the land of utf8/non-utf8 and
I've come across a problem.  Imagine you are printing a list of items
and item numbers:

--8<----------------------------------------------------------
1 bob
2 sue
3 jill
-->8----------------------------------------------------------

Further assume that you are in gtk or glib, and the list is utf8.

--8<----------------------------------------------------------
void print_list(const char *names[], int count)
{
    int i;

    for (i = 0; i < count; i++)
        fprintf(stdout, "%d %s\n", i, names[i]);
}
-->8----------------------------------------------------------

There is an obvious problem here.  names[i] is encoded in utf8, not in
the encoding of the current locale.

--8<----------------------------------------------------------
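void print_list(const char *names[], int count)
{
    int i;
    char *out;

    for (i = 0; i < count; i++)
    {
        /* utf8 -> locale encoding; NULL means the conversion failed */
        out = g_locale_from_utf8(names[i], -1, NULL, NULL, NULL);
        if (out == NULL)
            continue;
        fprintf(stdout, "%d %s\n", i, out);
        g_free(out);
    }
}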
-->8----------------------------------------------------------

Now the output is properly in the current locale.  Or is it?  What if
you are in a locale where " " is not ASCII 0x20?
	More directly, imagine you are using this code in GTK+:

--8<----------------------------------------------------------
    val = g_strdup_printf("%d) %s", i, names[i]);
    gtk_entry_set_text(entry, val);
    g_free(val);  /* the entry keeps its own copy of the text */
-->8----------------------------------------------------------

ASCII " " is valid in utf8.  names[i] is also valid utf8.  BUT!
sprintf(3) works in the locale of the C library, and so your number
("%d") is NOT encoded in utf8, it is encoded in the current locale.  In
Latin-1 or ASCII this works great.  In other encodings you may get a
character in the string that is not valid utf8.  The entry might let it
slide, but pango probably won't.
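
One way to probe this worry is sketched below.  It is only a sketch under
assumptions: the helper names is_plain_ascii() and formatted_int_is_ascii()
are made up for illustration and are not part of glib or GTK+.  The idea is
that a string made only of 7-bit ASCII bytes is always valid utf8, so if
"%d" output fails this check in some locale, the concern above is real for
that locale.  Passing the check does not prove the reverse for an encoding
that is not ASCII-compatible.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if every byte of s is 7-bit ASCII
 * (and therefore also valid utf8), 0 otherwise. */
int is_plain_ascii(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s >= 0x80)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: formats value with "%d" under whatever locale is
 * currently active and reports whether the result is plain ASCII. */
int formatted_int_is_ascii(int value)
{
    char buf[32];
    int n = snprintf(buf, sizeof buf, "%d", value);

    if (n < 0 || (size_t)n >= sizeof buf)
        return 0;  /* formatting failed or was truncated */
    return is_plain_ascii(buf);
}
```

To try it in a particular locale, call setlocale(LC_ALL, "") first and then
formatted_int_is_ascii() with a few sample values.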
	A quick look shows that gtk_progress_build_string is likely
susceptible to this.  Other widgets may be as well.  GtkTreeView builds
paths with

--8<----------------------------------------------------------
sprintf (ptr, ":%d", path->indices[i]);
-->8----------------------------------------------------------

in gtk_tree_path_to_string().  If GtkTreeView is expecting this to be
utf8 later, they are likely to be surprised.
	I was, in fact, doing some string munging along the lines of the
original example.  I was using GString when I thought of this issue.

--8<----------------------------------------------------------
    g_string_printf(str, " %2d ) %s\n", i, names[i]);
-->8----------------------------------------------------------

This will break because the number generated by %2d will be in the
current locale, the spaces and ')' will be in ASCII, and names[i] will
be in utf8.  When I later run g_locale_from_utf8() on this GString prior
to fprintfing it, I have a real issue.  A possible solution:

--8<----------------------------------------------------------
for (i = 0; i < count; i++)
{
    g_string_append(str, " ");  /* " " is valid utf8 */
    num = g_strdup_printf("%2d", i);
    conv = g_locale_to_utf8(num, -1, NULL, NULL, NULL);
    g_free(num);
    g_string_append(str, conv);
    g_free(conv);
    g_string_append_printf(str, " ) %s\n", names[i]);  /* again, valid utf8 */
}

conv = g_locale_from_utf8(str->str, -1, NULL, NULL, NULL);
fprintf(stdout, "%s", conv);
g_free(conv);
-->8----------------------------------------------------------
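
A different approach, swapped in here as an alternative to the conversion
round trip above, is to avoid printf-style formatting for the number
altogether and emit the digits as explicit ASCII code points.  The sketch
below assumes a hypothetical helper, format_int_ascii(), that is not part of
glib; it pads with spaces to a minimum width, roughly like "%2d", and never
consults the locale.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes value into buf as ASCII decimal digits,
 * right-aligned with space padding to at least `width` columns.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * buf is too small.  No locale-dependent function is called. */
int format_int_ascii(char *buf, size_t size, int value, int width)
{
    char tmp[24];       /* digits and sign, collected in reverse order */
    size_t len = 0, pad, pos = 0, i;
    unsigned int mag;

    assert(buf != NULL);

    /* Magnitude as unsigned, so the most negative int is handled too */
    mag = value < 0 ? 0u - (unsigned int)value : (unsigned int)value;
    do {
        tmp[len++] = (char)(0x30 + mag % 10);   /* 0x30 is ASCII '0' */
        mag /= 10;
    } while (mag != 0);
    if (value < 0)
        tmp[len++] = 0x2D;                      /* ASCII '-' */

    pad = (width > 0 && (size_t)width > len) ? (size_t)width - len : 0;
    if (pad + len + 1 > size)
        return -1;

    for (i = 0; i < pad; i++)
        buf[pos++] = 0x20;                      /* ASCII space */
    while (len > 0)
        buf[pos++] = tmp[--len];
    buf[pos] = '\0';
    return (int)pos;
}
```

Since the result is pure ASCII, it could be appended to the utf8 GString
directly, leaving only the final g_locale_from_utf8() call before output.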

	Of course, maybe I missed something.  Maybe every encoding sees
" " as position 0x20.  Let me know.  If there are better solutions to my
quandary, I'm all ears.  At the very least, if this is an issue, we
should fix GTK+.

Joel

    
-- 

"Not everything that can be counted counts, and not everything
 that counts can be counted."
        - Albert Einstein 

			http://www.jlbec.org/
			jlbec evilplan org


