(forw) Unicode/UTF-8 in GTK



----- Forwarded message from Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk> -----
X-Mailer: exmh version 2.0.2+CL 2/24/98
To: petm@xcf.berkeley.edu, spencer@xcf.berkeley.edu, jmacd@xcf.berkeley.edu
Subject: Unicode/UTF-8 in GTK
X-URL: http://www.cl.cam.ac.uk/~mgk25/
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sat, 06 Feb 1999 17:05:57 +0000
From: Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk>
Message-Id: <E109BB9-0003jZ-00@heaton.cl.cam.ac.uk>

Have you any plans to support Unicode strings in GTK?

There are now several decent X11 "*-ISO10646-1" (Unicode) fonts
available, for instance on

  http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html

I have extended the 6x13 xterm default fixed font to a repertoire
of 2800 Unicode characters, including all Latin, Greek, Cyrillic,
Phonetic Alphabet, and mathematical characters.

I am currently working on extending other X11 fonts to a decent
Unicode repertoire as well. If you are interested in making GTK
Unicode capable, I'd be happy to give highest priority to extending
the fonts that you consider most important for GTK users.

The X11 "*-iso10646-1" Unicode fonts are 16-bit fonts. The characters
0x0000 - 0x00ff follow the ISO 8859-1 standard. Unicode strings are best
represented under Unix either as a wchar_t array using a 16-bit value
per character, or as a char array in the UTF-8 encoding, using a 1-byte,
2-byte, or 3-byte sequence for every character. The 1-byte sequences
in the UTF-8 encoding are exactly the 7-bit ASCII characters, so
UTF-8 files are strictly backwards compatible with ASCII.
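
As a rough illustration of the 1-, 2-, and 3-byte sequences described
above, here is a minimal sketch in C. The function name ucs2_to_utf8 is
hypothetical (it is not part of GTK, Xlib, or glibc), and the sketch
assumes the input is a plain 16-bit character value with no special
handling of surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one 16-bit character value as UTF-8.
 * Writes 1 to 3 bytes into out (which must hold at least 3 bytes)
 * and returns the number of bytes written. Surrogate values
 * (0xD800-0xDFFF) are not treated specially in this sketch. */
size_t ucs2_to_utf8(uint16_t c, unsigned char *out)
{
    if (c < 0x80) {
        /* 7-bit ASCII maps to itself: 0xxxxxxx */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        /* 11 bits: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (c >> 12));
    out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

With this sketch, 'A' (0x0041) stays the single byte 0x41, while
LATIN SMALL LETTER E WITH ACUTE (0x00E9) becomes the two bytes
0xC3 0xA9.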

See "man utf-8" and "man unicode" for details.

glibc 2.1 will implement all the new ISO C Amendment 1 functions
such as wprintf() and mbrtowc(), so that wchar_t <-> UTF-8 char
conversion can be done by the library. It would be an excellent idea
if GTK would then use these glibc routines to transform between any
provided char * strings (e.g., in UTF-8) and the 16-bit text strings
sent to the X server.
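
A minimal sketch of what such a library-based conversion might look
like, using the standard setlocale() and mbstowcs() calls. The helper
names init_utf8_locale and utf8_to_wide are hypothetical, and the
locale names "C.UTF-8" and "en_US.UTF-8" are assumptions about what is
installed on a given system. Turning the resulting wchar_t values into
the 16-bit strings that the X server expects is left out here.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: try to select a UTF-8 character type locale.
 * The locale names are assumptions; adjust them for your system.
 * Returns 1 on success, 0 if neither locale is available. */
int init_utf8_locale(void)
{
    if (setlocale(LC_CTYPE, "C.UTF-8") != NULL)
        return 1;
    if (setlocale(LC_CTYPE, "en_US.UTF-8") != NULL)
        return 1;
    return 0;
}

/* Hypothetical helper: convert a multibyte string (UTF-8 once
 * init_utf8_locale() has succeeded) into at most n wide characters.
 * Returns the number of wide characters written, not counting the
 * terminator, or (size_t)-1 if the input is not a valid multibyte
 * string in the current locale. */
size_t utf8_to_wide(const char *s, wchar_t *out, size_t n)
{
    return mbstowcs(out, s, n);
}
```

A caller would run init_utf8_locale() once at startup and then pass
each incoming char * string through utf8_to_wide() before drawing it.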

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
----- End forwarded message -----

-- 
-josh


