Re: Concerning the hyphen confusion in BiDi (was: Re: Bidirectional Bugs in Hebrew)

Omer Zak <omerz actcom co il> writes:

> I would like to offer a fresh point of view on the problem of hyphen/minus
> sign between Hebrew letter and the number following it.
> Conceptually, there are three text orderings:
> 1. The sequence of characters, as they are typed by a user in a text
>    editor.
> 2. The sequence of characters, as stored in a file in "Logical Hebrew
>    order".
> 3. The sequence of characters, as displayed visually - in "Visual Hebrew
>    order".
> In all discussions so far, we made the implicit assumption that (1)==(2).
> Then we used the Unicode BiDi algorithm to convert from (1),(2) into (3)
> and cursed why the user has to suffer the agony of typing first a Hebrew
> letter, then a number and only then the separating hyphen if he wants to
> see a letter-hyphen-number sequence.
> I believe that it is OK to store such a sequence as letter-number-hyphen
> in a Logical Hebrew ordered text file.  Such a sequence would be displayed
> (by Pango and other applications using the BiDi algorithms) as
> letter-hyphen-number (from right to left).

But this is almost certainly the _wrong_ sequence of characters to
store in the file. Most likely the right sequence is:


(Other possibilities exist as well.)

Your sequence is most likely going to confuse automatic text
processing algorithms. The standard for Unicode is logical order -
characters appear in the order they are read.

Now, I don't know how feasible it is to ask users to add RLM in the
appropriate place, but if the input software is adding anything
automatically, it should be that, not reordering the text into
something that just happens to appear in the right order.
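
For anyone who wants to see what a BiDi implementation is given to work
with, here is a minimal Python sketch that prints the bidirectional class
of each character in a stored (logical-order) string. The helper names
bidi_classes and insert_rlm are hypothetical, the sample string is made
up, and the index passed to insert_rlm is an arbitrary choice for
illustration only, not a claim about where the mark belongs. The classes
come from whatever Unicode database ships with your Python.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def bidi_classes(text):
    """Return (character name, bidi class) pairs in stored (logical) order."""
    return [(unicodedata.name(ch), unicodedata.bidirectional(ch)) for ch in text]


def insert_rlm(text, index):
    """Hypothetical helper: insert an RLM at a caller-chosen position."""
    return text[:index] + RLM + text[index:]


# Made-up sample in logical order: Hebrew letter, hyphen-minus, digit.
logical = "\u05d0-5"

# Index 2 is picked only to show the mechanics of adding the mark.
marked = insert_rlm(logical, 2)

for name, cls in bidi_classes(marked):
    print(f"{cls:3} {name}")
```

The point is only that the mark is one more character in the stored
sequence, with a strong right-to-left class of its own, while the visible
characters stay in reading order.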

But in any case, the problem is really that if the user's keyboard
doesn't distinguish minus and hyphen, then there is an ambiguous input
sequence, and the user is going to have to provide disambiguation, by
selecting RLM from a right-click menu, selecting the text and entering
a "Force Right to Left" command, or whatever.
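
To make the ambiguity concrete: a typical keyboard key produces a single
character, U+002D HYPHEN-MINUS, although Unicode also encodes a separate
hyphen and a separate minus sign. The sketch below simply looks up the
names and bidirectional classes of those three code points. The choice of
candidates and the helper name describe are mine, for illustration, and
the classes printed depend on the Unicode version your Python was built
with.

```python
import unicodedata

# Candidate characters a "-" keystroke might be intended to mean
# (this list is an illustrative assumption, not an exhaustive one).
CANDIDATES = ["\u002d", "\u2010", "\u2212"]


def describe(chars):
    """Return (code point, name, bidi class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))
        for ch in chars
    ]


for code, name, cls in describe(CANDIDATES):
    print(f"{code}  {cls:3} {name}")
```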

(And these disambiguation techniques are going to be needed
occasionally for other sorts of mixed text anyway.)

