Re: Editing and formatting characters



Hello!

On Tue, Nov 14, 2000 at 05:35:45PM -0500, Owen Taylor wrote:
 
> We recently ran into some bugs with GTK+ and Pango with the handling
> of formatting characters. And it occurred to me that I never had
...

>  ZWNBS - zero width non-break space
(just out of curiosity, what is the purpose of that one?)

> In the following, I suggest one way of handling formatting characters
> during cursor positioning and deletion. Implementing this in

> As an example, lets use the example that has been coming up on this
> list over the last few weeks:
> 
>    HEBREW TEXT [HYPHEN] [RLM] 1234
>                        ^     ^
>                        A     B
> 
> There should not be cursor positions at A and B because that would

I don't agree.

If we want to be able to manipulate those chars then they *must* be visible,
and both A and B cursor positions *must* be possible.

However, I agree that it is not the normal behaviour.

So both display and editing (cursor handling and so on) need two modes:
1. normal
2. "show control and formatting characters"

And all text editors should be able to provide a way to tell pango to
switch between those two modes.

Pango would also need a set of glyphs to display those special chars.
The choice of glyphs needs some thought, as users must be able to recognize
them easily and understand what each one does. Maybe widespread fonts
already provide such glyphs, or Unicode recommends a given glyph in some
cases?

> result in having to hit the arrow key twice to at that position, which
> would be very non-intuitive. (Not to say that invisible formatting
> marks is intuitive to begin with...)

So, in normal mode, only one arrow key press would be necessary, and the
formatting mark would only be deleted when its "associated char" is deleted
(there must be a way to decide which char that is; the concept of
"associated char" would be somewhat similar to the one used in Thai
editing, I think).

And in "show special chars" mode, the RLM mark would be shown just like any
other char, so it can be copied, deleted, etc.
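
To make the two modes concrete, here is a rough Python sketch of which
cursor stops each mode would allow. This is not Pango API; the names
FORMAT_CHARS, cursor_positions and attach_next are made up for the
illustration, and edge cases (a mark at the very start or end of the text,
several marks in a row) are ignored.

```python
# Hypothetical sketch, not Pango API.
# LRM, RLM, ZWNJ, ZWJ, ZWNBS
FORMAT_CHARS = {"\u200e", "\u200f", "\u200c", "\u200d", "\ufeff"}


def cursor_positions(text, show_formatting, attach_next=frozenset()):
    """Return the string offsets where the cursor may stop.

    attach_next holds the indices of formatting chars that are associated
    with the following char; all others are associated with the previous one.
    """
    if show_formatting:
        # "Show special chars" mode: every char is an ordinary char.
        return list(range(len(text) + 1))
    skipped = set()
    for i, ch in enumerate(text):
        if ch in FORMAT_CHARS:
            # Offset i is just before the mark, offset i + 1 just after it.
            # Drop the stop that would separate the mark from its char.
            skipped.add(i + 1 if i in attach_next else i)
    return [p for p in range(len(text) + 1) if p not in skipped]


# Two Hebrew letters, hyphen, RLM, digits (the RLM is at index 3).
text = "\u05d0\u05d1-\u200f1234"
print(cursor_positions(text, show_formatting=True))   # all offsets 0..8
print(cursor_positions(text, show_formatting=False))  # offset 3 (A) is skipped
```

With the RLM attached to the hyphen, position A (offset 3) disappears in
normal mode and position B (offset 4) stays; passing attach_next={3} would
drop B and keep A instead.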

> I think the correct cursor position is at B - that is, the mark should
> be associated with the logically-preceding character.

Not always.
Consider this example: <space> <ZWJ> <arabic letter>
Here the ZWJ should be associated with the arabic letter, not with the
space, as it is evident that the purpose of ZWJ in such a case
is to modify the behaviour of the next char, not the previous one.

In your case I agree that RLM should be associated with the hyphen, but
not for the reason you give.

I think the algorithm should roughly be: associate the formatting char with
the nearest char whose behaviour it modifies. If both the previous and the
next char are modified in the same way, associate it with the next one. If
it is a change of direction, associate it with the neutral char; if there
is none, with the char of opposite directionality; if there is none, with
the next char.

In your example the global direction was RTL, then the hyphen changed it to
neutral (I think; I would need confirmation here), then the RLM changes
it back to RTL; then come the digits, which are LTR.
So both the hyphen and the first digit are modified --> attach to
the previous char (the hyphen, as it is the neutral one).
But now consider the same *without* the hyphen:

HEBREW TEXT <RLM> 1 2 3 4

Here, RLM doesn't change the behaviour of the last Hebrew letter, so it
must be associated with "1" IMHO.


I think formatting chars should be associated with the next char, because
they influence the behaviour of what comes after them. However, some of them
influence both sides (ZWJ, ZWNJ, ZWNBS, ...); those should be associated
with the side they really modify, and if they modify both sides, with the
next char (but I'm not 100% sure about that).
Some other chars modify the global direction (LRM, RLM); they must be
associated with the neutral char, as it is the one that is modified.
If both sides are neutral, associate with the next one. If neither is
neutral, associate with the char that has the opposite directionality; if
both are opposite, associate with the next one; if both sides have the same
directionality, associate with the next char.
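
The rules for LRM and RLM can be written down as a small Python sketch.
Everything here is an assumption made for illustration, not existing Pango
code: the names direction and associate are made up, digits (bidi classes
EN and AN) are counted as LTR as in the example above, and every char that
is not strongly LTR or RTL is lumped together as "neutral".

```python
import unicodedata

# Direction imposed by each mark (LRM, RLM).
MARK_DIRECTION = {"\u200e": "L", "\u200f": "R"}


def direction(ch):
    """Simplified direction of a char: 'L', 'R' or 'N' (neutral)."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("L", "EN", "AN"):  # assumption: digits count as LTR
        return "L"
    if bidi in ("R", "AL"):
        return "R"
    return "N"


def associate(prev_ch, mark, next_ch):
    """Return 'prev' or 'next': the neighbor an LRM/RLM is attached to."""
    prev_dir, next_dir = direction(prev_ch), direction(next_ch)
    # A neutral neighbor wins; if both are neutral, take the next one.
    if next_dir == "N":
        return "next"
    if prev_dir == "N":
        return "prev"
    # No neutral neighbor: take the one of opposite directionality.
    # Both opposite or both same falls through to the next char.
    mark_dir = MARK_DIRECTION[mark]
    if prev_dir != mark_dir and next_dir == mark_dir:
        return "prev"
    return "next"


RLM = "\u200f"
print(associate("-", RLM, "1"))       # hyphen case: prev
print(associate("\u05d0", RLM, "1"))  # no hyphen: next
```

The two calls reproduce the two examples above: with the hyphen the RLM
goes to the hyphen, without it the RLM goes to the "1".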

Does that make sense?

> The question is what should happen when you hit Delete when the cursor
> position is at B.

You mean the BackSpace key, I presume?
The RLM being associated with the hyphen, not with the numbers, deleting
the numbers should have no effect on the RLM mark.

> The two possibilities are that only the RLM is
> deleted, or both the RLM and the HYPHEN are deleted. I believe that
> the better behavior is to delete both the HYPHEN and the RLM.

That depends.
First, that is valid only in "normal" mode; in "display control codes" mode
the mark is shown as a char of its own and the user can edit it as usual.
So, in "normal" mode, the behaviour should be, imho:
For modifiers that only act on the adjacent char, yes, delete both the
associated char and the formatter.
LRM and RLM, however, also affect chars other than just the adjacent one,
so the formatter char should be kept as long as its effect remains; e.g.:

HEBREW TEXT [HYPHEN1] [HYPHEN2] [HYPHEN3] [RLM] x 1234
('x' being the cursor position).

Hitting backspace should delete hyphen3 but *not* the RLM, which will
then be associated with hyphen2, and so on; only when hyphen1 is deleted
will the RLM be deleted too.
That is, if the associated char is deleted, the two new neighbors should
be evaluated to see if they stand at the same or a higher priority
(same directionality>opposite directionality>neutral); if yes, keep the mark
(and attach it to the correct new neighbor); if not, delete it too.
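
Here is a rough Python sketch of that BackSpace behaviour in normal mode.
It rests on my own reading of the priority order (a neutral neighbor ranks
highest, then one of opposite directionality, then one of the same
directionality), and it only handles a mark associated with the previous
char, as in the example. The names priority and backspace are hypothetical,
not Pango API.

```python
import unicodedata

# Direction imposed by each mark (LRM, RLM).
MARK_DIRECTION = {"\u200e": "L", "\u200f": "R"}


def direction(ch):
    """Simplified direction: 'L', 'R' or 'N'; digits count as LTR."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("L", "EN", "AN"):
        return "L"
    if bidi in ("R", "AL"):
        return "R"
    return "N"


def priority(ch, mark):
    """Assumed ranking: neutral (2) > opposite (1) > same direction (0)."""
    d = direction(ch)
    if d == "N":
        return 2
    return 1 if d != MARK_DIRECTION[mark] else 0


def backspace(text, cursor):
    """Return (new_text, new_cursor) after one BackSpace in normal mode."""
    if cursor == 0:
        return text, cursor
    before = text[cursor - 1]
    if before not in MARK_DIRECTION or cursor < 2:
        # Ordinary char (or a mark with nothing before it): delete it.
        return text[:cursor - 1] + text[cursor:], cursor - 1
    mark, victim = before, text[cursor - 2]
    head, tail = text[:cursor - 2], text[cursor:]
    # The mark survives if a new neighbor ranks at least as high as the
    # char that was just deleted.
    neighbors = head[-1:] + tail[:1]
    keep = any(priority(n, mark) >= priority(victim, mark) for n in neighbors)
    new_head = head + mark if keep else head
    return new_head + tail, len(new_head)


# Two Hebrew letters, three hyphens, RLM, cursor, digits.
state = ("\u05d0\u05d1---\u200f1234", 6)
for _ in range(3):
    state = backspace(*state)
    print(repr(state[0]), state[1])
```

The first two BackSpaces remove hyphen3 and hyphen2 and keep the RLM; the
third removes hyphen1 and the RLM together.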

> is, the marks should be as invisible to editing as possible. 
> If greater control is needed, then a special "view invisible characters"
> mode should be available.

I fully agree.

> The difference with the range-affecting characters is that the
> paired nature must be maintained. For this example, we'll an alternate
> encoding of the above as:
> 
>    [RLE] HEBREW TEXT [HYPHEN] [PDF] 1234
>                                    ^
>                                    B
> 
> Again the cursor position should be at B, but when we delete at
> B, we don't want to simply delete the PDF, we instead want to delete
> the HYPHEN and leave the [PDF]. Then when we get to:

Mmmh, I have a doubt here... do the RLM and LRM marks change the global
direction of all following chars or not?
In other words, are RLM and LRM "range formatters without explicit ending"?

>    [RLE] H [PDF] 1234
>                 ^
>                 B
> 
> Hitting delete would then delete PDF, H and RLE.

I fully agree with you.

But I think RLM and LRM also need a somewhat similar treatment (unless
I'm completely wrong about the meaning of RLM and LRM)
some formatters of the first group need also a similar

> Does this sound like the right behavior?
> 
>                                         Owen

I think one-char formatters need more complex handling, as they
do not always act on only one char but on two (ZWJ and such),
or on one but, by induction, on all others of the same type behind it (case
of RLM and LRM).
 
-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.srtxg.easynet.be/		PGP Key available, key ID: 0x8F0E4975



