Re: One key stroke --> two code-points

From: Simos Xenitellis <simos lists googlemail com>
To: Javier SOLA <javier khmeros info>
Cc: gnome-i18n gnome org, gtk-i18n-list gnome org, Jens Herden <jens khmeros info>, Bart Geesink <bart geesink org>
Subject: Re: One key stroke --> two code-points
Date: Mon, 09 Jun 2008 19:58:30 +0100

Javier SOLA wrote:

Thanks Simos !!

Actually, we have had these additions for a while in X11.

Hi Javier,

Checking at
http://gitweb.freedesktop.org/?p=xorg/lib/libX11.git;a=tree;f=nls/en_US.UTF-8

does not show these lines at the end. It is possible that these compose sequences were added as a patch to the distribution package.


We will file an issue for GTK+, and use the variable meanwhile.

What file is it in GTK+? I have not been able to find it.

In GTK+ (HEAD), the relevant file is
http://svn.gnome.org/viewvc/gtk%2B/trunk/gtk/gtkimcontextsimple.c?view=markup

However, your case of compose sequences is different from the existing compose sequences, which result in a single codepoint (you need to produce two codepoints).

Therefore, the type of support you are looking for is similar to compose sequences that result in a letter plus a diacritic mark. Several languages have characters for which no pre-composed letters exist, so the compose sequence produces a letter plus diacritic marks (more than one codepoint). Such support is missing, and there are already bug reports for it.

Bug 341341 – Compose mechanism in simple input method doesn't support decomposed forms

http://bugzilla.gnome.org/show_bug.cgi?id=341341

Bug 345254 – dead accents should at least produce combining characters
http://bugzilla.gnome.org/show_bug.cgi?id=345254

There is a shortcut for solving the above cases of compose sequences, so I expect that solution to differ from the one needed for the Khmer compose sequences. Specifically, for the Latin compose sequences, such as (it's a made-up example)


<dead_acute> <t> : "t́" # LETTER T WITH ACUTE

one could convert to something like    [ dead_acute, 't', 0].

We would put 0 for the resulting codepoint because we can deduce, for this category of compose sequences, that the actual codepoints are 't' and 'acute' (the resulting codepoints match the body of the compose sequence).
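
As a rough illustration of the idea, here is a minimal Python sketch (not the GTK+ C code). The table name DEAD_KEY_TO_COMBINING and the function resolve() are hypothetical; the assumption is that each dead key maps to one Unicode combining mark, so an entry whose result is 0 can be expanded to the base letter followed by that mark.

```python
# Hypothetical mapping from X11 dead-key names to Unicode combining marks.
DEAD_KEY_TO_COMBINING = {
    "dead_grave": "\u0300",       # COMBINING GRAVE ACCENT
    "dead_acute": "\u0301",       # COMBINING ACUTE ACCENT
    "dead_circumflex": "\u0302",  # COMBINING CIRCUMFLEX ACCENT
    "dead_tilde": "\u0303",       # COMBINING TILDE
    "dead_diaeresis": "\u0308",   # COMBINING DIAERESIS
}


def resolve(entry):
    """Return the text produced by a (dead_key, base_char, result) entry.

    A non-zero result is a single pre-composed codepoint. A result of 0
    means "derive the output from the sequence itself": the base letter
    followed by the combining mark that corresponds to the dead key.
    """
    dead_key, base_char, result = entry
    if result != 0:
        return chr(result)
    return base_char + DEAD_KEY_TO_COMBINING[dead_key]


# The made-up example from above: no pre-composed "t with acute" exists.
t_acute = resolve(("dead_acute", "t", 0))
# An entry that does have a pre-composed form (U+00E9, e with acute).
e_acute = resolve(("dead_acute", "e", 0x00E9))
```

Here t_acute holds two codepoints ('t' followed by U+0301), while e_acute holds the single codepoint U+00E9.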

However, in the case of Khmer, the compose sequences look independent of the resulting code points. Therefore, a new table would be required.


To cut a long story short, I have filed a bug report for this,
Bug 537457 – Support compose sequences that produce two+ codepoints
http://bugzilla.gnome.org/show_bug.cgi?id=537457

Simos

Thanks,

Javier

Simos Xenitellis wrote
Javier SOLA wrote:
Hi,

I am working on Khmer localization (KhmerOS project).
In Khmer, some of the basic vowels (which we include in the keyboard) require two code-points, so one keystroke must generate two code points.
It used to be that we could do the conversion in XKB by generating a fictitious code-point (Pablo Saratxaga explained this to us a few years ago), which was later translated to two real code-points by putting the conversion in the en-US locale file. It did work at the time.
But now this seems to have stopped working. Does anybody know how we can fix this?
These additions (pressing a single key and producing two codepoints) are located at
/usr/share/X11/locale/en_US.UTF-8/Compose
The specific lines appear to be

# Khmer digraphs
# A keystroke has to generate several characters, so they are defined
# in this file

<U17fb>    :   "ុះ"
<U17fc>    :   "ុំ"
<U17fd>    :   "េះ"
<U17fe>    :   "ោះ"
<U17ff>    :   "ាំ"
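
To make the structure of these entries concrete, here is a small Python sketch that reads the lines above into a table. The function name parse_single_key_sequences and the regular expression are assumptions made for illustration; they cover only this simple single-key form, not the full Compose file syntax.

```python
import re

# The Khmer entries, copied from the Compose excerpt above.
COMPOSE_EXCERPT = """
# Khmer digraphs
# A keystroke has to generate several characters, so they are defined
# in this file

<U17fb>    :   "ុះ"
<U17fc>    :   "ុំ"
<U17fd>    :   "េះ"
<U17fe>    :   "ោះ"
<U17ff>    :   "ាំ"
"""

# Matches only the simple form: <Uxxxx> : "result"
LINE_RE = re.compile(r'^<U([0-9A-Fa-f]{4,6})>\s*:\s*"([^"]*)"')


def parse_single_key_sequences(text):
    """Map each placeholder codepoint to the string it should produce."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = LINE_RE.match(line)
        if match:
            table[int(match.group(1), 16)] = match.group(2)
    return table


TABLE = parse_single_key_sequences(COMPOSE_EXCERPT)

# Show each placeholder with the codepoints of its result.
for key, result in sorted(TABLE.items()):
    print(f"U+{key:04X} ->", " ".join(f"U+{ord(ch):04X}" for ch in result))
```

Each of the five placeholder codepoints maps to a string of two codepoints, which is the case a single-codepoint result table cannot represent.
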
GTK+ based applications duplicate the Compose file in the gtk+ library, and currently the version of the Compose file that exists in gtk+ does not include those specific compose sequences.
I think these are a recent addition.
Technically, it is possible for gtk+ to include compose sequences that produce more than one code point (this requires a small change in the code); however, these recent Khmer digraphs are the only compose sequences using the facility now.
To cut a long story short, you can bypass the GTK+ version of the Compose file for now and use the Compose file that comes with X.Org (shown above) by setting the environment variable GTK_IM_MODULE to "xim".
This should not have any adverse effect on the OLPC software.
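
A minimal sketch of that workaround, written in Python for a launcher script. The helper name env_with_xim is hypothetical, and the application name in the usage comment is only a placeholder; the only thing taken from the text above is the variable GTK_IM_MODULE and its value "xim".

```python
import os


def env_with_xim(base_env=None):
    """Return a copy of the environment with GTK_IM_MODULE set to "xim".

    The caller's environment is left untouched; the copy is meant to be
    passed to a child process.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GTK_IM_MODULE"] = "xim"
    return env


# Usage (placeholder application name, not executed here):
#   import subprocess
#   subprocess.Popen(["some-gtk-application"], env=env_with_xim())
```

Setting the variable per process, rather than globally, keeps the change limited to the applications that need the Khmer sequences.
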
It is important that, if other keyboard layouts (such as Serbian) also require compose sequences that produce two or more codepoints, these are added to the X.Org Compose file. In the next update of GTK+, all these compose sequences can make it in.
Simos

