The Unicode support tables



Hi,

I'd like to elaborate on something I said in a previous post, about the
Unicode text direction table.

The text direction support table is one of the many Anglo-centric
aspects of Unicode. That isn't obvious at first sight, since the table
apparently codes every character as a peer of every other. However, the
table only codes the direction most appropriate for text which is
basically left-to-right. Inserts of, say, Arabic in English are handled
correctly. Inserts of English in Arabic are handled correctly. However,
what would the average person do when mixing Arabic and Chinese? I'm
pretty sure the whole thing would be written right-to-left (I actually
interviewed an Arab programmer once, who went to university in Beijing
and could read Chinese - if only I could find him now I could ask him!).
The Unicode tables ignore this kind of thing, and would lead to the
insertion of right-to-left Arabic in left-to-right Chinese.
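
As a rough illustration of the point, here is a small sketch that looks up
the direction class recorded for a few characters. It assumes Python's
standard unicodedata module as a convenient window onto the table; the
helper name bidi_classes is made up for this example. Each character gets
exactly one class, with no record of any other direction it could take.

```python
import unicodedata


def bidi_classes(text):
    """Return the single bidirectional class recorded for each character."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Latin "a", Arabic alef (U+0627), Chinese "middle" (U+4E2D).
sample = "a\u0627\u4e2d"
print(bidi_classes(sample))  # ['L', 'AL', 'L']
```

The Chinese character comes back as plain left-to-right, the same as the
Latin letter, so Arabic mixed with Chinese is treated just like Arabic mixed
with English.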

If the Unicode tables were truly language independent they would include
all meaningful directions for each character, with one of those marked
as the default. That way you could start off with the notion that "this
text will be basically left-to-right", or "this text will be basically
top-to-bottom", and render sensibly. You could even start off with no
notion of direction, and if there is a direction which is suitable for
every character in the text, use it automatically (it might be
inefficient, but it sounds like fun). Any character which does not
naturally fit the basic direction can be handled using the direction
marked as default, and the practices commonly used in mixed language
typesetting.
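
To make the idea concrete, here is a minimal sketch of such a table and the
lookup it would allow. Everything in it is an assumption for illustration:
the Direction enum, the per-script TABLE entries (including which directions
count as "meaningful" for Chinese), the crude script_of range checks, and the
function names. None of it comes from the actual Unicode data.

```python
from enum import Enum


class Direction(Enum):
    LTR = "left-to-right"
    RTL = "right-to-left"
    TTB = "top-to-bottom"


# Hypothetical entries: (all meaningful directions, default direction).
# Illustrative values only, not taken from any Unicode table.
TABLE = {
    "latin": ({Direction.LTR}, Direction.LTR),
    "arabic": ({Direction.RTL}, Direction.RTL),
    "han": ({Direction.LTR, Direction.RTL, Direction.TTB}, Direction.LTR),
}


def script_of(ch):
    """Very rough script detection by code point range (an assumption)."""
    cp = ord(ch)
    if 0x0600 <= cp <= 0x06FF:
        return "arabic"
    if 0x4E00 <= cp <= 0x9FFF:
        return "han"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return None  # treated as neutral: imposes no direction


def common_directions(text):
    """Directions that suit every non-neutral character in the text."""
    common = set(Direction)
    for ch in text:
        script = script_of(ch)
        if script is not None:
            common &= TABLE[script][0]
    return common


def base_direction(text, preferred=None):
    """Pick a base direction, honouring a preference when it fits."""
    common = common_directions(text)
    if preferred is not None and preferred in common:
        return preferred
    for direction in Direction:  # fixed order keeps the choice deterministic
        if direction in common:
            return direction
    # No direction suits every character: keep the caller's preference,
    # otherwise fall back to the default of the first non-neutral character.
    if preferred is not None:
        return preferred
    for ch in text:
        script = script_of(ch)
        if script is not None:
            return TABLE[script][1]
    return Direction.LTR


print(base_direction("\u0627\u4e2d"))  # Arabic + Chinese
print(base_direction("abc\u0627"))     # English + Arabic
```

With these made-up entries, Arabic mixed with Chinese comes out
right-to-left, because that is the only direction both scripts share, while
English mixed with Arabic has no shared direction and falls back to the
default of the first character. Characters that do not fit the chosen base
direction would still be laid out in their own default direction.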

So, my conclusion is that the current Unicode direction table is
deficient; that it should include all meaningful directions; and that
one of those needs to be marked as the default, for fallback processing.

All comments welcome.


BTW, does this list silently drop mail from non-members? I posted this
earlier from another email account, and it seemed to disappear into the
ether(net).

Steve



