Re: Updating gtkimcontextsimple.c (bug #321896)

From: "Simos Xenitellis" <simos lists googlemail com>
To: "Samuel Thibault" <samuel thibault ens-lyon org>
Cc: gtk-i18n-list gnome org
Subject: Re: Updating gtkimcontextsimple.c (bug #321896)
Date: Wed, 20 Feb 2008 10:04:56 +0000

Hi Samuel,

I had some discussions on this and I think the problem can be resolved
in the following way.

To add combining diacritics there is no need for extra support in
GTK+; this is something that is handled by the keyboard layouts (which
are not handled by GTK+).
What that means is that you need a keyboard layout that produces all
those combining diacritics.

The project for the keyboard layouts is xkeyboard-config,
http://freedesktop.org/wiki/Software/XKeyboardConfig

For your case of Tagbanwa, you would create a new keyboard layout.
For the generic case of adding combining diacritics to arbitrary
characters, a catch-all keyboard layout could be used.

Currently, there is no GUI tool to create such keyboard layouts.
On your Linux system, keyboard layouts live in /etc/X11/xkb/symbols/
You can get an idea of how to modify an existing layout by looking at
the files there.

If you would like to pursue this further, I would be happy to give you
instructions.

Simos

On Feb 7, 2008 11:19 AM, Samuel Thibault <samuel thibault ens-lyon org> wrote:
> Hello,
>
> Simos wrote:
> > In bug #341341, Danilo talks about support for compose sequences that
> > produce more than one Unicode character, as in
> > COMBINING ACUTE + CYRILLIC LATIN A where no precomposed form exists.
> > At the moment, the Xorg Compose file does not have such compose
> > sequences. If we were to implement this in GTK+, I would suggest
> > building up a new table of the form
> >
> > dead_acute, A, E, H, I, O, U, ...  (assume all these cyrillic)
> > dead_diaeresis, A, E, H, I, O, U, ...  (assume all these cyrillic)
>
> The problem is that this is very tedious for people who already have a
> hard time making Linux suit their language (fonts, messages, locales,
> ...), and the table can potentially be very big. For instance, in
> Vietnamese you may need to put two accents on a vowel, and so you'd
> need to enumerate all such possible combinations.
>
> > In check_algorithmic, we currently check if the compose sequence can be
> > normalised to a single Unicode character.
>
> Which is necessary for proper string unicity/comparison etc, yes.
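
As a rough illustration of the check discussed above (this is not the
actual check_algorithmic code, which is C inside gtkimcontextsimple.c),
here is a minimal Python sketch. It assumes the test amounts to
NFC-normalising "base + combining mark" and looking at the length of the
result. The function name composes_to_single is made up for this example.

```python
import unicodedata

def composes_to_single(base: str, combining: str) -> bool:
    """Return True if base + combining mark normalises (NFC) to one code point."""
    composed = unicodedata.normalize("NFC", base + combining)
    return len(composed) == 1

COMBINING_ACUTE = "\u0301"

# Latin small a + acute has a precomposed form (U+00E1).
latin_ok = composes_to_single("a", COMBINING_ACUTE)
# Cyrillic small a (U+0430) + acute has no precomposed form,
# so two code points remain after normalisation.
cyrillic_ok = composes_to_single("\u0430", COMBINING_ACUTE)
```

With a check of this kind, the Cyrillic case is rejected, which is the
limitation being discussed in this thread.
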
>
> > So, here we can also check if the compose sequence matches the "valid"
> > compose sequence (a cyrillic small 'a' with a combining acute is ok)
>
> There is no such thing as a "valid" compose sequence. As Unicode says,
>
> "All combining characters can be applied to any base character and can,
> in principle, be used with any script. As with other characters, the
> allocation of a combining character to one block or another identifies
> only its primary usage; it is not intended to define or limit the range
> of characters to which it may be applied.  In the Unicode Standard, all
> sequences of character codes are permitted.
>
> This does not create an obligation on implementations to support all
> possible combinations equally well. Thus, while application of an
> Arabic annotation mark to a Han character or a Devanagari consonant is
> permitted, it is unlikely to be supported well in rendering or to make
> much sense."
>
> So there are indeed combinations that don't make much sense, but
> enumerating those that do looks like unnecessary work to me:
>
> - It may potentially be very big, just see all the possible Vietnamese
>   combinations.
> - It will probably never be complete; there will always be a language
>   (say, for instance, Tagbanwa) which nobody takes care of.
> - Why limit ourselves like this? It has been objected that generic
>   support potentially leads to "odd" things like n̈̈̈, which is an n
>   with three diaeresis on it.  I don't think this is odd: if the user
>   pressed the dead_diaeresis key several times, I guess he indeed wanted
>   to have three diaeresis, and if they don't show up, then the text
>   rendering engine is probably broken and may not for instance properly
>   show ẫ, which is needed for vietnamese (actually, on my system,
>   pango shows both fine).  Actually I think some mathematicians may even
>   have a use for n with several diaeresis :)
>
> > How would we know which compose sequences are "valid"? We can parse
> > parts of ftp.unicode.org/Public/UNIDATA/NormalizationTest.txt
>
> It is _not_ a table of "valid" characters; it is only a partial test
> to check that the algorithm which transforms character + combining
> character into normalized precomposed form works correctly. Actually,
> a table that would hold _all_ the valid combinations would be very
> big. Just for the Vietnamese language, there would be 10*6 entries.
>
> Instead, it could be solved once and for all by systematically turning
> <dead_foo> <bar>, <combining_foo> <bar> and <Multi_key> <foo> <bar> into
> "Ubar Ucombining_foo". The only limitation is the font rendering engine,
> which already seems to do a pretty good job in all cases: if I try
> to put a Tagbanwa accent on a Latin accent, it just works. If I try to
> put a combining Kannada vocalic on a Kannada character to which it isn't
> supposed to apply, it just shows the character and then the combining
> vocalic with a dotted circle.
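
The generic rule proposed above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not GTK+ code: the
DEAD_TO_COMBINING table is a small hypothetical subset, the name
compose_generic is invented here, and applying the marks in the order the
dead keys were pressed is an assumption. The result is NFC-normalised so
that a precomposed form is used whenever one exists.

```python
import unicodedata

# Hypothetical subset: dead key names mapped to Unicode combining marks.
DEAD_TO_COMBINING = {
    "dead_grave": "\u0300",
    "dead_acute": "\u0301",
    "dead_circumflex": "\u0302",
    "dead_tilde": "\u0303",
    "dead_diaeresis": "\u0308",
}

def compose_generic(dead_keys, base):
    """Emit the base character followed by one combining mark per dead key.

    Marks are appended in the order pressed (an assumption of this sketch),
    then the string is NFC-normalised so precomposed forms are preferred.
    """
    marks = "".join(DEAD_TO_COMBINING[key] for key in dead_keys)
    return unicodedata.normalize("NFC", base + marks)

# No precomposed form exists: n followed by three combining diaereses.
triple = compose_generic(["dead_diaeresis"] * 3, "n")
# A precomposed form exists: a + circumflex + tilde becomes U+1EAB.
viet = compose_generic(["dead_circumflex", "dead_tilde"], "a")
# Cyrillic a + acute stays as two code points.
cyr = compose_generic(["dead_acute"], "\u0430")
```

No per-language table of "valid" combinations is needed in this approach;
whether the result displays well is left to the rendering engine.
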
>
> If the implementation can be generic enough that it works ASAP for
> every language in the world without more work, then why not do it?
>
> Samuel

