Re: UTF-8 with GTK



At 11:04 AM 06/21/2001 -0400, Owen Taylor wrote:
> But basically, except for text storage applications, the space used
> for text isn't a huge deal these days. And if text storage size is a
> concern - then you probably want to use some sort of compression -
> you can do a lot better than either UTF-8 or UTF-16, even with
> simple algorithms. (http://www.unicode.org/unicode/reports/tr6/)

SCSU is very language-sensitive. SCSU is designed to approach the
space that a character set designed for that language would give
you - so for Chinese you probably aren't going to beat UTF-16 by
much, unless you have a lot of non-ideographic characters. It's
also not going to beat UTF-8 for English. SCSU's only a big win
if you're encoding Greek or Arabic or Japanese, something that
uses a bunch of characters from the same block repeatedly. If
you're going to be storing a lot of data, gzip is probably the
way to go, though I've heard SCSU + gzip beats either of them
alone.
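
For anyone who wants to check the raw-size part of this on their own
data, here is a rough Python sketch. The sample strings and the sizes()
helper are made up for illustration, and zlib's DEFLATE stands in for
gzip (same algorithm, minus the gzip header). SCSU is not measured,
because Python's standard library has no SCSU codec. The samples are
short sentences repeated 20 times, so the compressed numbers will look
much better than they would on real text.

```python
import zlib

def sizes(text):
    """Return byte counts for a few encodings of text (illustrative helper)."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, no BOM
    return {
        "chars": len(text),
        "utf8": len(utf8),
        "utf16": len(utf16),
        "utf8+deflate": len(zlib.compress(utf8, 9)),
        "utf16+deflate": len(zlib.compress(utf16, 9)),
    }

# Hypothetical sample texts; substitute your own data for a fair comparison.
samples = {
    "english": "The quick brown fox jumps over the lazy dog. " * 20,
    "greek": "Η γρήγορη καφέ αλεπού πηδάει πάνω από το τεμπέλικο σκυλί. " * 20,
    "chinese": "敏捷的棕色狐狸跳过了懒狗。" * 20,
}

for name, text in samples.items():
    print(name, sizes(text))
```

The uncompressed columns line up with the points above: UTF-8 is one
byte per character for English, UTF-16 is two bytes per character for
the Chinese sample against three for UTF-8, and Greek falls in between.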




