Re: Why does gtk_text_buffer() normalize the cluster on backspace?
- From: Behdad Esfahbod <behdad behdad org>
- To: Dov Grobgeld <dov grobgeld gmail com>
- Cc: gtk-i18n-list gnome org
- Subject: Re: Why does gtk_text_buffer() normalize the cluster on backspace?
- Date: Thu, 03 May 2007 20:04:47 -0400
On Sun, 2007-04-29 at 00:23 +0300, Dov Grobgeld wrote:
> Continuing on my latest posting, I investigated the definition of the
> "unicode canonical ordering" that is used by g_utf8_normalize() and I
> found the following quote from
> http://unicode.org/faq/normalization.html#8:
>
> <quote>
> Q: Isn't the canonical ordering for Arabic characters wrong?
>
> A: The Unicode Standard does not guarantee that the canonical ordering
> of a combining character sequence for any particular script is the
> 'correct' order from a linguistic point of view; the guarantee is that
> any two canonically equivalent strings will have the same canonical
> order.
>
> In retrospect, it would have been possible to have assigned combining
> classes for certain Arabic and Hebrew non-spacing marks (plus
> characters for a few other scripts) that would have done a better job
> of making a canonically ordered sequence reflect linguistic order or
> traditional spelling orders for such sequences. However, retinkerings
> at this point would conflict with stability guarantees made by the
> Unicode Standard when normalization was specified, and cannot be done
> now. [KW]
>
> </end quote>
>
> Basically it says that the ordering of the accent characters for
> Arabic and Hebrew may not be relied upon for any linguistic
> interpretation. I consider what character to erase when pressing
> backspace to be such an interpretation.
>
> As I see it there are two ways of fixing this:
>
> 1. Add a calc_char_to_erase() routine in the pango language modules
> that receives a cluster and determines what character should be
> erased. If the language module does not define such a routine, then
> either canonical ordering or no reordering is done.
>
> 2. Drop the canonical ordering altogether.
Thanks Dov for raising this. This is discussed in bugs 155948 and
350132. My plan to fix the backspacing issue is very similar to what
you suggest. In short:
- Add a new Pango API, pango_backspace() or something similar. It will
take the text of a single grapheme in UTF-8 and return new text for it.
Apps need to call it only if the backspace_deletes_character attribute
is set to TRUE, but it does no harm to call it unconditionally.
- pango_backspace() will then normalize the text to NFD, convert it to a
GArray of gunichar items, call the backend API to delete one char (or,
if there is no backend API, delete the last char), convert back to
UTF-8, and finally, if the initial normalization expanded the text,
normalize it to NFC and return it.
- A new method in Pango lang engine API will be added, something like
backspace() even, that takes the GArray and performs the right thing for
the script and language involved.
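
To make the steps above concrete, here is a rough sketch in Python
rather than C, using only the standard unicodedata module. Nothing here
is existing Pango API: backspace_grapheme() and the engine_backspace
hook are hypothetical names standing in for the proposed
pango_backspace() and the lang engine method, and "expanded" is taken
to mean that NFD produced more code points than the input had.

```python
import unicodedata


def backspace_grapheme(text, engine_backspace=None):
    """Return the text of one grapheme after a single backspace.

    text             -- the grapheme cluster as a str
    engine_backspace -- hypothetical per-script hook; takes a list of
                        code points and returns the list after deletion
    """
    # Step 1: decompose, so precomposed characters become base + marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Assumption: "expanded" means NFD yielded more code points.
    expanded = len(decomposed) > len(text)

    # Step 2: work on a list of code points (the GArray of gunichar).
    chars = list(decomposed)

    # Step 3: let the script-specific hook decide, else drop the last one.
    if engine_backspace is not None:
        chars = list(engine_backspace(chars))
    elif chars:
        chars = chars[:-1]

    # Step 4: convert back, recomposing only if step 1 expanded the text.
    result = "".join(chars)
    if expanded:
        result = unicodedata.normalize("NFC", result)
    return result
```

With no hook, a precomposed U+00E9 comes back as a plain "e". For the
Hebrew sequence bet, dagesh, qamats (U+05D1 U+05BC U+05B8), NFD moves
the qamats before the dagesh, so the default branch removes the dagesh
even though the qamats was typed last. That is exactly the case the
per-script hook is meant to handle.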
I may go ahead and code this tonight.
A further enhancement is for Gtk+ to be smart in "input mode". That
is, when backspace is hit, it will check the undo buffer. If the last
performed action was adding one or more characters to the cluster at
the current cursor position, it will remove that instead of calling
into Pango. Supporting this for more than one backspace is a bit more
involved but doable. This makes for a much better user experience where
backspace undoes what you typed last.
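
A toy model of that "input mode" idea, again in Python and with made-up
names (InputModeBuffer, type_chars, forget_history are illustrative,
not Gtk+ API). It assumes the cursor always sits at the end of the
text, and the fallback that drops the last code point is only a
stand-in for calling into Pango.

```python
class InputModeBuffer:
    """Minimal text buffer whose backspace undoes recent typing first."""

    def __init__(self):
        self.text = ""
        self.undo = []  # strings typed at the cursor, most recent last

    def type_chars(self, chars):
        # Record each insertion so backspace can take it back verbatim.
        self.text += chars
        self.undo.append(chars)

    def forget_history(self):
        # E.g. after a cursor move: earlier typing is no longer "last".
        self.undo.clear()

    def backspace(self):
        if self.undo:
            # Remove exactly what was typed last, in typing order.
            typed = self.undo.pop()
            self.text = self.text[: len(self.text) - len(typed)]
            return "undo"
        # No usable history: stand-in for the Pango-based deletion.
        self.text = self.text[:-1]
        return "fallback"
```

Typing bet, dagesh, qamats and pressing backspace once leaves bet plus
dagesh, regardless of how the cluster would be ordered canonically.
Keeping a stack of insertions is one simple way to support more than
one backspace in a row.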
> Regards,
> Dov
Cheers,
--
behdad
http://behdad.org/
"Those who would give up Essential Liberty to purchase a little
Temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin, 1759