Re: window caption and utf

Matthias Ettrich <> writes:

> > > 
> > I suspect a lot of places where X uses "ascii" or ISO-LATIN-1 could be 
> > safely replaced with UTF-8 without bad things happening.  UTF-16 is a 
> > very different animal, and I believe it is not useful for this purpose.  
> > In any case, I think the places where UTF-8 can be used should be looked 
> > into carefully; it may provide a lot of bang for the buck...
> The problem with "ascii" is that everybody has a different
> understanding of what ascii was supposed to be. I understand that
> Japanese users, just as an example, think of their national 7-bit
> encoding as ascii as well, and I assume their applications will use
> it to set the window captions.  Interpreting those as UTF-8 now
> might be a bit rude.

The ICCCM defines very carefully how text strings are
interpreted. Two standard types are defined.

 STRING        - Latin-1 + TAB and NEWLINE
 COMPOUND_TEXT - Basically ISO 2022; a shift-based encoding for
                 multiple character sets (but defined more
                 exactly in an X spec).
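
The STRING definition above is strict enough to check mechanically. The following is a minimal Python sketch. It assumes the property value is available as raw bytes, and the function names are hypothetical, not part of any X library. It also shows why reinterpreting STRING as UTF-8 fails in both directions: UTF-8 bytes for accented text are still a legal STRING, so they silently turn into mojibake, while existing Latin-1 STRING data is usually not valid UTF-8 at all.

```python
def is_icccm_string(data: bytes) -> bool:
    """True if every byte is a Latin-1 graphic character, SPACE, TAB or NEWLINE."""
    for b in data:
        if b in (0x09, 0x0A):      # TAB and NEWLINE are the only controls allowed
            continue
        if 0x20 <= b <= 0x7E:      # printable ASCII, including SPACE
            continue
        if 0xA0 <= b <= 0xFF:      # Latin-1 right half (NBSP through y-diaeresis)
            continue
        return False               # other C0 controls, DEL and C1 controls
    return True


def decode_string(data: bytes) -> str:
    """Decode an ICCCM STRING value as Latin-1, rejecting illegal bytes."""
    if not is_icccm_string(data):
        raise ValueError("not a valid ICCCM STRING")
    return data.decode("latin-1")


# The UTF-8 bytes for "café" are also a perfectly legal STRING...
utf8_bytes = "café".encode("utf-8")        # b'caf\xc3\xa9'
print(is_icccm_string(utf8_bytes))         # True
# ...so a conforming reader shows mojibake instead of the intended text.
print(decode_string(utf8_bytes))           # cafÃ©

# Conversely, an ordinary Latin-1 STRING is not valid UTF-8.
latin1_bytes = "café".encode("latin-1")    # b'caf\xe9'
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError:
    print("existing STRING data is not valid UTF-8")
```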

Randomly defining STRING to mean UTF-8 would be 
simply wrong. Possibilities include:

 - Just using COMPOUND_TEXT as currently, and converting
   Unicode into national character sets for transport.

 - Using COMPOUND_TEXT, but with an escape sequence for
   UTF-8. This is not backwards compatible.

Neither of these works well, because a lot of code assumes
that the only thing that will ever be in a COMPOUND_TEXT
string is the character set for the currently active
locale; plus stateful encodings like iso-2022 are
a real pain to work with.
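
To make the "stateful" complaint concrete, here is a simplified Python decoder for the second option. It assumes the UTF-8 escape is ISO 2022's ESC % G (switch to UTF-8) / ESC % @ (return), and that the text otherwise stays in COMPOUND_TEXT's initial Latin-1 state. The function name is hypothetical, and any other escape sequence (such as a designation of a national character set) is rejected instead of being handled. The meaning of every byte depends on escapes seen earlier, and an older decoder that does not know ESC % G can only reject or garble such text.

```python
ESC = 0x1B
UTF8_ON = b"\x1b%G"    # ESC % G: switch to UTF-8 (assumed escape for the proposal)
UTF8_OFF = b"\x1b%@"   # ESC % @: return to the ISO 2022 state


def decode_ctext_utf8(data: bytes) -> str:
    """Decode COMPOUND_TEXT that uses only the initial Latin-1 state plus
    UTF-8 segments bracketed by ESC % G ... ESC % @.

    A full decoder would also have to track G0/G1 designations for the
    national character sets; this sketch raises on any other escape.
    """
    out = []
    i = 0
    utf8_mode = False
    while i < len(data):
        if utf8_mode:
            end = data.find(UTF8_OFF, i)
            if end == -1:
                end = len(data)              # unterminated segment runs to the end
            out.append(data[i:end].decode("utf-8"))
            i = end + len(UTF8_OFF)
            utf8_mode = False
        elif data.startswith(UTF8_ON, i):
            utf8_mode = True
            i += len(UTF8_ON)
        elif data[i] == ESC:
            raise NotImplementedError("escape sequence not handled by this sketch")
        else:
            # Initial state: ASCII in GL, Latin-1 right half in GR,
            # which together are ISO 8859-1, so byte value == code point.
            out.append(chr(data[i]))
            i += 1
    return "".join(out)


mixed = b"caf\xe9 " + UTF8_ON + "日本".encode("utf-8") + UTF8_OFF + b"!"
print(decode_ctext_utf8(mixed))              # café 日本!
```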

 - Extending the ICCCM to allow an additional type
   in these contexts that indicates a UTF-8 string.
   There is a draft spec for such a thing at:

I consider the last probably the best alternative, though
there are some problems with backwards compatibility.
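
One way to see the backwards-compatibility problem is that a producer still has to fall back to the old types for consumers that predate the new one. The following is a minimal Python sketch. It assumes the set of types a consumer understands is known, as it would be from a selection's TARGETS list. It uses a hypothetical atom name for the new UTF-8 type, because the draft spec is not quoted here. The helper names are also illustrative only.

```python
from __future__ import annotations

# Hypothetical atom name for the new UTF-8 text type (an assumption).
UTF8_TYPE = "UTF8_STRING"


def is_icccm_string(data: bytes) -> bool:
    """Latin-1 graphic characters, SPACE, TAB and NEWLINE only."""
    return all(b in (0x09, 0x0A) or 0x20 <= b <= 0x7E or b >= 0xA0 for b in data)


def encode_title(title: str, understood: set[str]) -> tuple[str, bytes]:
    """Choose a text type the consumer understands and encode title for it."""
    if UTF8_TYPE in understood:
        return UTF8_TYPE, title.encode("utf-8")    # new type: lossless
    if "STRING" in understood:
        try:
            data = title.encode("latin-1")
        except UnicodeEncodeError:
            data = None
        if data is not None and is_icccm_string(data):
            return "STRING", data                  # legacy type: Latin-1 only
    # Anything else needs COMPOUND_TEXT, i.e. conversion into national
    # character sets, which this sketch does not implement.
    raise ValueError("title needs COMPOUND_TEXT for this consumer")


print(encode_title("café", {"STRING", "COMPOUND_TEXT"}))   # ('STRING', b'caf\xe9')
print(encode_title("日本", {UTF8_TYPE, "STRING"}))
```

An old consumer that only knows STRING and COMPOUND_TEXT still cannot receive "日本" unless someone implements the national-charset conversion, so the new type helps only once both sides understand it.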

