Re: UTF-8 with GTK

From: Owen Taylor <otaylor redhat com>
To: Raymond Wan <rwan cs mu oz au>
Cc: gtk-i18n-list gnome org
Subject: Re: UTF-8 with GTK
Date: 21 Jun 2001 09:36:52 -0400

Raymond Wan <rwan cs mu oz au> writes:

> Hi all,
> 
> 	I was playing around with testtext, which comes with GTK+ 1.3.X
> and noticed that it can support non-Latin languages (e.g., Japanese) as
> long as the text is encoded in UTF-8.  I also looked at the UTF-8 man page,
> which says that it is "the way to go for using the Unicode character set
> under Unix-style operating systems."
> 
> 	[Apologies in advance, but my knowledge of Unicode is somewhat
> limited...].  I was wondering why this is so.  Sure, I read the rest of
> the man page and it mentioned some of the benefits...however, from the
> Japanese point of view (or Chinese, Korean, whatever), where every text
> file is most likely in an Asian language, isn't it a waste of space that
> some characters will take up 2, 3, or even more bytes (where 1 byte = 8
> bits)?  If I used an encoding such as Shift-JIS for Japanese or ummmm,
> BIG5, I think, for Chinese, won't most characters be two bytes in size?
> 
> 	I guess what I'm asking is: isn't UTF-8 more for accommodating
> Latin-based OSes reading Asian (and Middle Eastern) languages than for
> Asian OSes reading Asian languages?  Presuming I'm right so far, is there a
> way to make GTK support alternative encodings like UTF-16 or S-JIS, BIG5,
> etc.?
> 
> 	[Sorry, I realize there is a lot that I've said which indicates
> I'm a newbie...which I am.  I guess what I was really getting to is this
> last question about GTK support for other encodings.]

The simple answer is no: we aren't supporting other character sets or
encodings, except for external file I/O, where we provide conversion
functions. Supporting multiple character encodings makes everything much
harder.
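
To illustrate the "convert at the boundary" idea, here is a minimal sketch
in Python (chosen only for brevity; it is not GTK+ code). The helper names
to_internal_utf8 and from_internal_utf8 are made up for this example, and
the codec names are Python's, not GLib's. In C, GLib's g_convert() fills
roughly the same role.

```python
def to_internal_utf8(raw, source_encoding):
    """Decode bytes read from a file in a legacy encoding and
    re-encode them as UTF-8 for internal use (hypothetical helper)."""
    return raw.decode(source_encoding).encode("utf-8")


def from_internal_utf8(utf8_bytes, target_encoding):
    """Convert internal UTF-8 bytes back to a legacy encoding
    just before writing them out (hypothetical helper)."""
    return utf8_bytes.decode("utf-8").encode(target_encoding)


# Pretend these bytes came from a Shift-JIS text file.
sjis_bytes = "日本語".encode("shift_jis")

# Convert once on input; everything inside the program sees only UTF-8.
utf8_bytes = to_internal_utf8(sjis_bytes, "shift_jis")

# Convert once more on output if the file must stay in Shift-JIS.
round_trip = from_internal_utf8(utf8_bytes, "shift_jis")
```

The point of the design is that only the two boundary functions need to
know about Shift-JIS or Big5; the rest of the code handles one encoding.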

For the more commonly used Han characters, UTF-8 takes 3 bytes where
UTF-16 takes 2, an expansion of 50%. But most users of Asian text will
also be using a fair bit of ASCII, which is 1 byte in UTF-8 (and 2 bytes
in UTF-16), so the real expansion is somewhat less than 50%.
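
The arithmetic can be checked with a short sketch in Python (used here
only because it makes byte counting easy). The function names and the
sample strings are invented for illustration, and UTF-16 is measured
without a byte-order mark.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no BOM on UTF-16."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


def expansion_percent(text):
    """Size of UTF-8 relative to UTF-16, as a percentage increase.
    A negative value means UTF-8 is the smaller of the two."""
    utf8, utf16 = encoded_sizes(text)
    return 100.0 * (utf8 - utf16) / utf16


han_only = "日本語"      # three Han characters, no ASCII
mixed = "GTK+ 日本語"    # five ASCII characters plus three Han characters

for sample in (han_only, mixed):
    utf8, utf16 = encoded_sizes(sample)
    print(f"{sample!r}: UTF-8 {utf8} bytes, UTF-16 {utf16} bytes, "
          f"expansion {expansion_percent(sample):+.1f}%")
```

Pure Han text shows the full 50% expansion (9 bytes vs. 6), while the
mixed sample comes out smaller in UTF-8 (14 bytes vs. 16). Real documents
fall somewhere in between, depending on how much ASCII they contain.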

This overhead of less than 50% is the price of providing a much more
uniform system, which makes it more likely that software will support
Asian languages out of the box, without special patches.

Regards,
                                        Owen
