Re: glib-2.2.1: g_print() with UTF-8 characters



Hi,

> > >  - Glibc has a "feature" where %Ns actually checks for a whole 
> > >    number of characters in the current encoding. So, unless you
> > >    are sure you are always going to be in a UTF-8 locale, avoid
> > >    using %Ns. (You are basically OK for iso-8859-1, but will
> > >    have problems in, say, a Japanese locale.)
> > 
> > It would make sense for g_print() to interpret %Ns correctly since it
> > assumes that the strings passed to it are always encoded in UTF-8.  If
> > I could tell glibc that all strings passed to it will be encoded in
> > UTF-8 no matter what the current locale is, then that would solve my
> > problems.
> 
> The point is that the string passed from g_print() to glibc is not
> necessarily in UTF-8 encoding, since g_print() does character set
> conversion to the charset specified by the locale.

The first call from g_print() into glibc is vasprintf(), and at that
point the string has not yet been converted from UTF-8 to the locale
charset.  This is also where %Ns is interpreted.  Only afterwards,
just before the string is printed with fputs(), is it converted to
the current locale charset.

If I understand things correctly, glibc counts characters in the
current locale's charset, so converting the string to the locale
charset before calling vasprintf() would make g_print() work better.
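
In the meantime, one way to sidestep %Ns entirely is to count the
UTF-8 characters by hand and write the resulting number of bytes
directly.  The following is only a rough sketch: utf8_prefix_bytes()
and write_utf8_prefix() are hypothetical helpers I made up for
illustration (they are not part of GLib or glibc), and they assume the
input is valid UTF-8.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of GLib or glibc): returns the number of
 * bytes occupied by the first max_chars characters of a NUL-terminated
 * string. Assumes the input is valid UTF-8. */
size_t utf8_prefix_bytes(const char *s, size_t max_chars)
{
    size_t bytes = 0;
    size_t chars = 0;

    while (s[bytes] != '\0') {
        /* Any byte that is not a continuation byte (10xxxxxx)
         * starts a new character. */
        if (((unsigned char)s[bytes] & 0xC0) != 0x80) {
            if (chars == max_chars)
                break;          /* next character would exceed the limit */
            chars++;
        }
        bytes++;
    }
    return bytes;
}

/* Hypothetical helper: writes at most max_chars UTF-8 characters of s to
 * out, bypassing printf's precision handling. Returns the number of bytes
 * written. */
size_t write_utf8_prefix(FILE *out, const char *s, size_t max_chars)
{
    size_t n = utf8_prefix_bytes(s, max_chars);
    return fwrite(s, 1, n, out);
}
```

Since the counting is done on the bytes themselves, the result does
not depend on the current locale.  Note that the bytes written this
way are still UTF-8, so the conversion to the locale charset would
still have to happen afterwards.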

Ulf


