Re: gtk bug or glibc locale bug?




Changwoo Ryu <cwryu@adam.kaist.ac.kr> writes:

> Owen Taylor <otaylor@redhat.com> writes:
> 
> > > How about (MB_CUR_MAX >= 3) after the "\xc0" check?  It seems OK to
> > > me.  Is there any 1-byte locale whose MB_CUR_MAX >= 3, or multibyte
> > > locale whose MB_CUR_MAX < 3 ?
> > > 
> > > ----------------------------------------------------------------------
> > >        setlocale (LC_CTYPE, "C");
> > >        gtk_use_mb = (mblen ("\xc0", MB_CUR_MAX) == 1);
> > >        setlocale (LC_CTYPE, current_locale);
> > > +      if (! gtk_use_mb && (MB_CUR_MAX >= 3))
> > > +        gtk_use_mb = TRUE;
> > >      }
> > >  
> > >    g_free (current_locale);
> > > ----------------------------------------------------------------------
> > > 
> > > Please comment about this.
> > 
> > Hmmm, this is somewhat ugly, since a conformant C library
> > could report MB_CUR_MAX as 1024 always and not handle
> > multibyte characters at all, though it would work
> > on all machines I know of currently.
> 
> Mm..  But the \xc0 check is also ugly, isn't it?

True, no doubt about that.
 
> If the "C" locale is the same as US-ASCII, the mblen() result can be -1.

I'm sort of counting on the laziness of C library writers
here; it seems doubtful that they would have separate mb*
functions for 7-bit and 8-bit locales. The check really
should set the locale to "en_US" or something, but I didn't
want to rely on the existence of another locale.
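
A minimal sketch of that alternative, assuming a locale named "en_US"
may or may not be installed: try it first and fall back to "C". The
helper name select_single_byte_locale is hypothetical (it is not GTK+
code), and whether "en_US" is really single-byte depends on the system.

```c
#include <locale.h>

/* Hypothetical helper, not part of GTK+.
 * Switches LC_CTYPE to a locale that is expected to be single-byte:
 * "en_US" if it is installed, otherwise "C".
 * Returns the name that was selected, or NULL if neither could be set.
 * The caller is responsible for saving and restoring the old locale. */
const char *select_single_byte_locale(void)
{
    /* setlocale() returns NULL when the requested locale is unavailable,
     * in which case the current locale is left unchanged. */
    if (setlocale(LC_CTYPE, "en_US") != NULL)
        return "en_US";

    /* "C" is the one locale every implementation must provide. */
    if (setlocale(LC_CTYPE, "C") != NULL)
        return "C";

    return NULL;
}
```

The fallback keeps the check usable on systems without "en_US", at the
cost of testing a different locale on different machines.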

> And a C
> library can pass the \xc0 check and not handle mb* functions at all.
>
> Why does the current code set the gtk_use_mb value from the "C" locale,
> not from the current locale?  I think it is a bug to be fixed.

The question isn't whether the C library handles multi-byte locales;
the question is whether it will, for single-byte locales, correctly
report a length of 1.

GTK+ is currently completely innocent of all knowledge of which
locales are single-byte or multi-byte, so all it can do
is check in the one locale that it knows is single-byte, the
"C" locale.

If the current locale were encoded in, e.g., EUC-JP, a result of -1 for
mblen("\xc0") would be perfectly correct, because 0xC0 on its own is
only the first byte of a two-byte character there, so checking in the
current locale doesn't work.
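
For reference, the check described above can be sketched roughly as
follows. This is an illustration, not the actual GTK+ source: the
function name mblen_ok_in_c_locale is made up, and the result depends
entirely on the C library it runs against.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not GTK+ code.
 * Returns 1 if mblen() reports a length of 1 for the single byte 0xC0
 * while LC_CTYPE is "C", and 0 otherwise (including when the current
 * locale cannot be saved or the "C" locale cannot be set). */
int mblen_ok_in_c_locale(void)
{
    const char *current = setlocale(LC_CTYPE, NULL);
    char *saved;
    int ok;

    if (current == NULL)
        return 0;

    /* Copy the name: the string returned by setlocale() may be
     * overwritten by the next setlocale() call. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return 0;
    strcpy(saved, current);

    if (setlocale(LC_CTYPE, "C") == NULL) {
        free(saved);
        return 0;
    }

    (void) mblen(NULL, 0);                  /* reset any shift state */
    ok = (mblen("\xc0", MB_CUR_MAX) == 1);  /* the probe itself */

    setlocale(LC_CTYPE, saved);             /* restore the user's locale */
    free(saved);
    return ok;
}
```

A return value of 0 here means the mb* functions cannot be trusted to
report 1 for high bytes in a single-byte locale. It says nothing about
whether multi-byte locales are supported.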

Regards,
                                        Owen

 


