Re: (forw) Unicode/UTF-8 in GTK




> ----- Forwarded message from Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk> -----

> To: petm@xcf.berkeley.edu, spencer@xcf.berkeley.edu, jmacd@xcf.berkeley.edu
> Subject: Unicode/UTF-8 in GTK
> Date: Sat, 06 Feb 1999 17:05:57 +0000
> 
> Have you any plans to support Unicode strings in GTK?

Unicode (and associated improvements in internationalization)
will be the primary focus of the next development cycle of 
GTK+. [ Along with, most likely, the integration of the 
Win32 port ]
 
> There are now several decent X11 "*-ISO10646-1" (Unicode) fonts
> available, for instance on
> 
>   http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html
> 
> I have extended the 6x13 xterm default fixed font to a repertoire
> of 2800 Unicode characters, including all Latin, Greek, Cyrillic,
> Phonetic Alphabet, and mathematical characters.
> 
> I am currently working on extending other X11 fonts to a decent
> Unicode repertoire as well. If you are interested in making GTK
> Unicode capable, I'd be happy to work on extending those fonts with
> highest priority that you consider to be most important for GTK
> users.
>
> The X11 "*-iso10646-1" Unicode fonts are 16-bit fonts. The characters
> 0x0000 - 0x00ff follow the ISO 8859-1 standard. Unicode strings are best
> represented under Unix either as a wchar_t array using a 16-bit value
> per character, or as a char array in the UTF-8 encoding, using a 1-byte,
> 2-byte, or 3-byte character sequence for every character. The 1 byte
> sequences in the UTF-8 encoding are exactly the 7-bit ASCII characters,
> so that UTF-8 files are strictly ASCII backwards compatible.
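
The byte layout described in the quoted paragraph can be sketched in C as
follows. This is only an illustration for 16-bit code points; the name
ucs2_to_utf8 is hypothetical and is not part of GTK+, glibc, or Xlib, and
the sketch does not filter out the surrogate range 0xD800 - 0xDFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one 16-bit code point as UTF-8.
 * Writes 1 to 3 bytes into out (which must have room for 3 bytes)
 * and returns the number of bytes written.
 * Hypothetical helper for illustration only. */
size_t ucs2_to_utf8(uint16_t c, unsigned char *out)
{
    if (c < 0x80) {
        /* 7-bit ASCII is stored unchanged: 0xxxxxxx */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        /* Two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

Because every value below 0x80 takes the first branch, a pure ASCII string
comes out byte-for-byte identical, which is the backwards compatibility
mentioned above.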

While creating Unicode fonts is important work, I don't think
we will be relying exclusively on such fonts. To give users
the full choice of fonts they have currently (CJK users
will probably be unhappy with 6x13 fixed, and Western users may
want fonts that haven't been extended to full Unicode), it
is likely we will be mapping Unicode onto multiple X fonts.

That is, one might conceivably have:

 <iso-8859 characters>   =>   lucidasanstypewriter-14
 <cjk characters>        =>   jis-fixed-...
 <tibetan characters>    =>   misc-fixed-*-iso10646-1
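
One way such a mapping might be sketched is a small table of code point
ranges, each naming the X font to use. Everything here is an assumption
made for illustration: the struct, the function name font_for_char, and
the exact ranges (Latin-1 for the ISO 8859 row, a few CJK blocks, and the
Tibetan block) are not taken from any GTK+ design. The font names are the
ones from the list above, including the elided "jis-fixed-..." entry.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the hypothetical mapping: an inclusive code point range
 * and the X font name used to draw characters in that range. */
struct font_range {
    uint16_t first;
    uint16_t last;
    const char *font;
};

/* Ranges are assumptions chosen for illustration. */
static const struct font_range font_map[] = {
    { 0x0000, 0x00FF, "lucidasanstypewriter-14" },  /* ISO 8859-1 */
    { 0x0F00, 0x0FFF, "misc-fixed-*-iso10646-1" },  /* Tibetan */
    { 0x3000, 0x30FF, "jis-fixed-..." },            /* CJK punctuation, kana */
    { 0x4E00, 0x9FFF, "jis-fixed-..." },            /* CJK ideographs */
};

/* Return the font name for a 16-bit code point, or NULL if no
 * range in the table covers it. */
const char *font_for_char(uint16_t c)
{
    size_t i;

    for (i = 0; i < sizeof font_map / sizeof font_map[0]; i++) {
        if (c >= font_map[i].first && c <= font_map[i].last)
            return font_map[i].font;
    }
    return NULL;
}
```

A real implementation would also need a fallback for characters that no
listed font covers; here that case is simply reported as NULL.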
 
> See "man utf-8" and "man unicode" for details.
> 
> glibc 2.1 will implement all the new ISO C Amendment 1 functions
> such as wprintf() and mb2wc() such that wchar_t <-> UTF-8 char
> conversion can be done by the library. It would be an excellent idea
> if GTK would then use these glibc routines to transform between any
> provided char * strings (e.g., in UTF-8) and the 16-bit text strings
> sent to the X server.

Unfortunately, we will not be able to rely on these functions,
since we will still be supporting legacy systems that do not
provide these new additions. I don't actually think writing
the character conversion functions is all that much work
compared to other issues (such as bidirectional rendering and
complex-text languages), but we may well use the system
facilities where they are available.
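
To give a rough idea of the size of such a conversion function, here is
a minimal sketch of a UTF-8 to 16-bit decoder that does not depend on
the C library's wide-character routines. The name utf8_to_ucs2 and its
error convention are assumptions for illustration, not GTK+ API. It
handles only the 1-, 2-, and 3-byte sequences discussed above and does
not reject overlong encodings.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a NUL-terminated UTF-8 string into 16-bit values.
 * Returns the number of values written to dst, or (size_t)-1 if the
 * input is malformed, uses a sequence longer than 3 bytes, or does
 * not fit in dst_len elements.
 * Hypothetical helper for illustration only. */
size_t utf8_to_ucs2(const char *src, uint16_t *dst, size_t dst_len)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s) {
        uint16_t c;
        int extra;  /* number of continuation bytes expected */

        if (*s < 0x80)                { c = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { c = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { c = *s & 0x0F; extra = 2; }
        else return (size_t)-1;       /* stray continuation or 4+ bytes */
        s++;

        while (extra-- > 0) {
            /* Each continuation byte must look like 10xxxxxx; a NUL
             * terminator fails this test, so we never read past it. */
            if ((*s & 0xC0) != 0x80)
                return (size_t)-1;
            c = (uint16_t)((c << 6) | (*s & 0x3F));
            s++;
        }

        if (n == dst_len)
            return (size_t)-1;        /* output buffer too small */
        dst[n++] = c;
    }
    return n;
}
```

The resulting array is in the form that a 16-bit X font expects once each
value is split into its high and low byte.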

In any case, I'm sure that the combination of glibc-2.1 and future
releases of GTK+ will provide a very nice Unicode environment to the
application programmer.

Regards,
                                        Owen


