Re: UTF-8 on stdout?



On Sat, 6 Jul 2002 22:37:37 +0100 (BST), "Andrew Ferrier"
<andrew junk new-destiny co uk> wrote:
In fact, the more I think about this, the more I get confused.
I can't seem to find any good introduction/references to all
this stuff on the web. Does anyone know where I can go to learn
more about character sets etc. these days?

Andrew, 

PMJI.  You might want to have a look at http://czyborra.com/  Mr. Czyborra
has a pretty good overview of what's what regarding encoding and character
sets, and does a good job of distinguishing between fonts, glyphs, and
characters.  You may in particular want to look at:

        http://czyborra.com/unicode/terminals.html

What you bumped into was, as Lars said, a problem with xterm.  If you push
UTF-8 to stdout, decoding it falls to the terminal: the application whose
job it is to convert encoded values into glyphs that your brain can
interpret as characters (I'm skipping a few steps).  The standard xterm is
*not* going to expect UTF-8; it will instead interpret the bytestream as
ASCII or Latin-1 or whatever your locale settings indicate.
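
To make the mismatch concrete, here is a small sketch in Python (my choice of
language for illustration, nothing to do with dia or xterm themselves).  It
only simulates the two interpretations of the same bytes; the sample name is
made up, and a real terminal does rather more than a single decode call.

```python
# What a program writes to stdout: the UTF-8 encoding of a name that
# contains one accented letter (U+00E9, e with acute).
name = "Andr\u00e9"
utf8_bytes = name.encode("utf-8")

# Roughly what a Latin-1 terminal does: every byte becomes one character,
# so the two-byte sequence C3 A9 shows up as two unrelated glyphs.
as_latin1 = utf8_bytes.decode("latin-1")

# Roughly what a UTF-8 aware terminal does: C3 A9 is read as one character.
as_utf8 = utf8_bytes.decode("utf-8")

print(utf8_bytes)                 # the raw bytes on the wire
print(len(as_latin1), as_latin1)  # one character too many, with mojibake
print(len(as_utf8), as_utf8)      # the text the author intended
```

The bytes are identical in both cases; only the reader's assumption differs.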

For your entertainment, I offer an Opinion.  :)

It's an interesting problem.  If you enter:

dia --credits |sort

how is sort(1) supposed to know what's incoming?  It doesn't guess; it
assumes, and unless the answer is 7-bit ASCII, it assumes wrong.  Its only
defense is that it's got a lot of good company.
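
A rough Python sketch of what that assumption costs.  This is not how sort(1)
is implemented: the names are invented, and base_letters is a hypothetical
helper that merely strips accents, a crude stand-in for real locale-aware
collation.

```python
import unicodedata

names = ["Zoe", "\u00c9mile", "Eve"]

# Byte-oriented view: compare the raw UTF-8 bytes, as a tool that knows
# nothing about the encoding would.  The lead byte 0xC3 of the accented
# capital E sorts after every ASCII letter.
by_bytes = sorted(names, key=lambda s: s.encode("utf-8"))


def base_letters(s):
    # Hypothetical helper: decompose accented letters (NFD) and drop the
    # combining marks, leaving only the base letters to compare.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Letter-oriented view: closer to the order a human reader would expect.
by_letters = sorted(names, key=base_letters)

print(by_bytes)
print(by_letters)
```

In the byte-wise result the accented name lands after "Zoe"; in the
letter-wise result it comes first, next to "Eve".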

ASCII's hegemony is of a kind that Microsoft can't even imagine.  It has
predominated for so long that we tend to think of 65 as having some
natural affinity for 'A'.  Like, "In your heart, you know it's ASCII." 
But it's more like, "One ring to rule them all, and in the darkness bind
them".  Maybe we're at last carrying ASCII to Mordor, and none too soon. 
I don't think it will bring world peace, but I do think it will invite the
world's complicated languages into the realm of free software.  

Regards, 

--jkl


