Re: [EWMH] _NET_WM_WINDOW_TYPE_AUXILIARY



On Thursday 18 October 2007 22:17, Tuomo Valkonen wrote:
> On 2007-10-18 21:43 +0200, Nicolas Mailhot wrote:
>> FOSS needs to exchange data with other systems (internet remember?).
>> That means sharing encoding conventions.
>
> The internet (HTML) actually does let one specify encoding used,

Data exchange on the internet is more than HTML, and besides, at least
20% of the HTML pages I see every day have a broken encoding
declaration. (The W3C spec plainly states that the encoding must not
be assumed to be ISO-8859-1; change your browser's fallback encoding
to anything else and watch the breakage.)
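
To make the breakage concrete, here is a quick Python sketch (the page
text is made up for illustration; the failure mode is the usual one):
the bytes on the wire are UTF-8, the page declares nothing, and the
browser's fallback encoding decides what the reader sees.

    # UTF-8 bytes as a server would send them for a page with no charset declaration
    page_bytes = "naïve café".encode("utf-8")

    # Decoded with an ISO-8859-1 fallback: the usual mojibake
    print(page_bytes.decode("iso-8859-1"))   # naÃ¯ve cafÃ©

    # Decoded as UTF-8, as intended
    print(page_bytes.decode("utf-8"))        # naïve café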

>> It's a pity the W3C didn't go the full way and allowed specifying
>> something else – non-UTF-8 XML files win you nothing and are a
>> constant source of bugs.
>
> Uhhh? It wants you to specify encoding in the header tag,

The XML spec does not "want" you to specify an encoding; specifying it
is optional, and in my experience any system or person that sets it to
something other than UTF-8 is making an encoding mistake somewhere in
the file.
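
A small sketch with Python's standard XML parser illustrates both
points (the documents are made up): leave the declaration out and the
parser falls back to UTF-8 as the spec requires, while declaring one
encoding and writing another produces silent mojibake rather than an
error.

    import xml.etree.ElementTree as ET

    # No encoding pseudo-attribute: the spec defaults to UTF-8 (or UTF-16 with a BOM)
    no_decl = '<?xml version="1.0"?><note>héllo</note>'.encode("utf-8")
    print(ET.fromstring(no_decl).text)        # héllo

    # An explicit UTF-8 declaration is redundant but harmless
    explicit = '<?xml version="1.0" encoding="UTF-8"?><note>héllo</note>'.encode("utf-8")
    print(ET.fromstring(explicit).text)       # héllo

    # Declares ISO-8859-1 while the bytes are UTF-8: no error, just garbage
    mislabeled = '<?xml version="1.0" encoding="ISO-8859-1"?><note>héllo</note>'.encode("utf-8")
    print(ET.fromstring(mislabeled).text)     # hÃ©llo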

>> That's a result of trying to cater to every known human script.
>> Which needs to be done to digitalise existing stuff. No one so far
>> has proved it could be done better.
>
> Unicode contains a lot of stuff that really doesn't belong in a
> low-level character mapping -- Unicode itself doesn't even really know
> what it is. Much of the maths stuff, for example, is pointless to
> have there,  and should be handled with different fonts in some
> contexts,

Spare me, I've already seen enough dead documents that relied on
special fonts to be read. Math symbols need to be clearly specified
like everything else.

> and at a much higher level in other contexts. Then
> there's the accent duplication fuckup, etc., as a consequence
> you have various normalisation form complications. (Yes, although
> the sound is the same, Finnish ä actually is not in some semantic
> sense the same as German ä. In the former it is a proper letter,
> whereas in the latter it is an umlauted a. Unicode perhaps
> unintentionally provides both

It is very intentional since people and apps create accented letters
both ways, and round-trip encoding conversions require remembering how
the letters were created.

> -- a separate codepoint, and
> a combined character -- which complicates many matters a lot,
> and the distinction seems to me to be relevant only at a much
> higher semantic level than Unicode probably should be.

It's not a semantic but a technical distinction.
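
A minimal Python illustration of that technical distinction, using the
standard unicodedata module: the precomposed ä and the
combining-sequence ä render identically but are different code point
sequences, so comparison and round-trip conversion have to go through
an explicit normalisation form.

    import unicodedata

    precomposed = "\u00e4"      # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
    combining   = "a\u0308"     # 'a' + U+0308 COMBINING DIAERESIS

    print(precomposed == combining)                                # False
    print(unicodedata.normalize("NFC", combining) == precomposed)  # True
    print(unicodedata.normalize("NFD", precomposed) == combining)  # True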

> Composition
> is the more general approach, so I think it should've been chosen.)

That would be fine if we were starting from a clean sheet, but then
we'd all be writing Esperanto or something like that.

> I think the Chinese have also expressed that the handling of their
> writing system in Unicode is totally fucked up,

The Chinese and Japanese complain they've been conflated, and they
just have to sort it out between themselves and suggest Unicode
changes, instead of waiting for Westerners to do it for them.

> and should be more based on composition,

That's your beef, not what the Chinese complain about.

> which would save code points. And so
> on. A lot could be improved,

And in case you haven't noticed, we're at Unicode 5 now and the
standard is being continually revised and fixed. And maybe, if the
points you don't like haven't been changed, there's more to them than
you think.

> but settling on a monoculture will
> make it very difficult -- practically impossible.

You can complain about monoculture, but there is zero alternative to
the Unicode.org consortium today, and people who've looked at the
problem seriously are more than happy to let it handle the mess that
human scripts are.

-- 
Nicolas Mailhot

