Re: [EWMH] _NET_WM_WINDOW_TYPE_AUXILIARY



Tuomo Valkonen wrote:
On 2007-10-18, Russell Shaw <rjshaw netspace net au> wrote:
What alternative is there to UTF-8? An advantage of monoculturalism is that
if the architecture is sufficient, everything can be consistent and easy.

There are problems with locale encodings and wchar_t, but fundamentally
their abstraction is better than specifying a Single Global Encoding.
Specifying "everything is UTF-8" is an evolutionary dead end. I think
it's better to say "here's wchar_t and functions to operate on it; we
don't specify what the actual encoding is, because then it's a black box
that can easily be changed." Much the same goes for the LC_CTYPE
multibyte encodings.

Unfortunately, the standards forgot to provide convenient functions for
encoding conversion when communicating with the external world (that
should mostly live in libraries, seldom in applications), and the libc
multibyte routines are a bit too limited, etc. That, however, is
something that could easily be solved if people weren't so intent on
creating another problem almost as big as the ASCII and Latin-1
assumptions we're still suffering from. Indeed, you need and want that
kind of library to conveniently use a Single Global Encoding too; the
difference is that by specifying a particular encoding, clean design is
not encouraged, and applications can and will expect that encoding
rather than work abstractly through a handful of libraries that could
easily be changed (or configured).
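As a minimal illustration of that abstraction, and nothing more than a
sketch: the standard-C program below decodes its argument through
mbstowcs() without ever naming an encoding -- whatever LC_CTYPE says
the locale uses is what gets decoded.

    #include <locale.h>
    #include <stdlib.h>
    #include <wchar.h>

    int main(int argc, char **argv)
    {
        /* Adopt the user's locale; the multibyte encoding behind it
           (UTF-8, Latin-1, EUC-JP, ...) remains a black box here. */
        setlocale(LC_ALL, "");

        if (argc < 2)
            return EXIT_FAILURE;

        wchar_t wide[256];
        /* mbstowcs() decodes whatever encoding LC_CTYPE names, so
           nothing in this program hard-wires a particular encoding. */
        size_t n = mbstowcs(wide, argv[1], 256);
        if (n == (size_t)-1)
            return EXIT_FAILURE;  /* invalid in the locale encoding */

        wprintf(L"%zu characters\n", n);
        return EXIT_SUCCESS;
    }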

Another major problem is the Unix and C "untyped" text file and stream
legacy: you have to assume every file is in some encoding -- ASCII, the
LC_CTYPE encoding, UTF-8, or so -- which it may not be. That could also
be solved by, e.g., creating a "typed" plain-text file format (the MIME
type could be stored on the filesystem) and stream format, assuming the
locale encoding for legacy files, and opening text files through some
library as text streams that does the conversion to the application's
abstract internal encoding (either a multibyte encoding -- not
necessarily LC_CTYPE, to allow wider character ranges inside programs
than in legacy files -- or wide characters). That's a rather big task,
but not really that much bigger than a transition to a global
monoculture.
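A sketch of what that one conversion boundary could look like, using
POSIX iconv() (the helper name convert_text and the choice of UTF-8 as
the internal encoding are made up for illustration; encoding names
accepted by iconv_open() are implementation-defined):

    #include <iconv.h>
    #include <stddef.h>

    /* Hypothetical "typed text stream" helper: the caller passes the
       file's declared encoding, and everything comes out in one
       internal encoding. UTF-8 is used internally here, but that
       choice is hidden behind this single boundary. */
    static int convert_text(const char *from_enc,
                            char *in, size_t inlen,
                            char *out, size_t outlen)
    {
        iconv_t cd = iconv_open("UTF-8", from_enc);
        if (cd == (iconv_t)-1)
            return -1;                /* unknown encoding name */

        size_t r = iconv(cd, &in, &inlen, &out, &outlen);
        iconv_close(cd);
        if (r == (size_t)-1 || outlen == 0)
            return -1;                /* bad input or output too small */
        *out = '\0';
        return 0;
    }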

I find it hard to see those problems because I rarely handle non-English
text.

In the general-purpose editing applications I've made (like a word
processor), any non-English text is passed out to a "black box" Unicode
layout-processor plugin for things like paragraph formatting, and I can
make it UTF-8 or UTF-32 or whatever data encoding is convenient. I see
"all UTF-8" as only applying between completely separate applications
on the PC.
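To give an idea of the kind of boundary I mean (this interface is
hypothetical, just a sketch of the shape of it): the application core
hands opaque text runs across one narrow contract and never interprets
the bytes itself, so the encoding at that boundary could be changed in
one place.

    #include <stddef.h>

    struct glyph_pos {
        unsigned glyph_id;  /* font-specific glyph index     */
        int x, y;           /* placement, in 1/64-pixel units */
    };

    /* Hypothetical layout-plugin interface: shape one paragraph of
       text into positioned glyphs. The encoding of `text` (say,
       UTF-8) is fixed by contract, but only this boundary knows it. */
    struct layout_plugin {
        /* Returns the number of glyphs produced, or -1 on error. */
        int (*shape)(const char *text, size_t len,
                     struct glyph_pos *out, size_t max_glyphs);
    };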

I've done hardly any non-English processing, but IIRC, UTF-8 files can
start with a magic number (the byte-order mark, which is optional). If
all text files were UTF-8, the magic number wouldn't be needed. I'm
probably missing something you mean.
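For reference, that magic number is the three bytes EF BB BF, and
checking for it is trivial -- the catch is that plenty of valid UTF-8
files omit it, so it's a hint, not a reliable type tag:

    #include <stdio.h>

    /* Test whether a file begins with the optional UTF-8 byte-order
       mark EF BB BF. Consumes up to three bytes; the caller should
       rewind() if they turn out not to be a BOM. */
    int has_utf8_bom(FILE *f)
    {
        unsigned char b[3];
        if (fread(b, 1, 3, f) != 3)
            return 0;
        return b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF;
    }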

I find it hard to see how all kinds of config files in /etc could be
made non-7-bit-ASCII without major parsing pain. To me, config-file
tokens should be 7-bit ASCII, because the content is more like program
code that only programmers should see; any non-English configuration
should be done through an i18n-ized GUI, IMO (not having thought of
anything better).
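Something like this hypothetical /etc-style fragment is what I have in
mind -- the tokens the parser matches stay 7-bit ASCII, and only quoted
values could ever carry anything else:

    # keys, "=", and keywords are plain 7-bit ASCII tokens
    greeting_banner = "Grüß dich"    # only the quoted value is non-ASCII
    keyboard_layout = de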

