Updating gtkimcontextsimple.c (bug #321896)



Hi All,

This is a follow-up to the latest patch submitted for bug #321896,
http://bugzilla.gnome.org/show_bug.cgi?id=321896

My understanding is that it is too late to consider applying the patch
to this version of GTK+, so this will happen later. What I am not sure
about is whether it is OK to apply only Tor's patch for now.

An overall description of the work is at 
http://blogs.gnome.org/simos/2008/01/30/improving-input-method-support-in-gtk-based-apps/

gtkimcontextsimple.c comes with a table of compose sequences intended
to replicate the content of
http://gitweb.freedesktop.org/?p=xorg/lib/libX11.git;a=blob;h=02561d4dc4a6df4f3abcfe17a35e9bb8fd3e627e;hb=16a76091cd632e5a3708e235ff864b58f3e4613e;f=nls/en_US.UTF-8/Compose.pre

About 20% of the current upstream compose sequences can be produced
algorithmically with Tor's patch, so we exclude those from the GTK+
table.
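To illustrate what "produced algorithmically" means, here is a rough C
sketch (not Tor's actual code): a dead key stands for a Unicode combining
mark, and if the base letter plus that mark normalises to a single
character, no table entry is needed. The enum names and the tiny
composition table are hypothetical; real code would use GDK keysyms and
full NFC normalisation (for instance GLib's g_utf8_normalize() with
G_NORMALIZE_NFC).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dead-key identifiers; GTK+ itself uses GDK keysym values. */
typedef enum {
    DEAD_GRAVE,
    DEAD_ACUTE,
    DEAD_CIRCUMFLEX,
    DEAD_TILDE,
    DEAD_DIAERESIS
} DeadKey;

/* Each dead key stands for a Unicode combining mark. */
static uint32_t dead_key_to_combining(DeadKey k)
{
    switch (k) {
    case DEAD_GRAVE:      return 0x0300; /* COMBINING GRAVE ACCENT */
    case DEAD_ACUTE:      return 0x0301; /* COMBINING ACUTE ACCENT */
    case DEAD_CIRCUMFLEX: return 0x0302; /* COMBINING CIRCUMFLEX ACCENT */
    case DEAD_TILDE:      return 0x0303; /* COMBINING TILDE */
    case DEAD_DIAERESIS:  return 0x0308; /* COMBINING DIAERESIS */
    }
    return 0; /* not a known dead key */
}

/* Toy stand-in for NFC composition, covering only a few pairs.
 * A real check would run full Unicode normalisation instead. */
static const struct {
    uint32_t base, mark, composed;
} toy_nfc[] = {
    { 'a', 0x0301, 0x00E1 }, /* a + acute      -> U+00E1 */
    { 'e', 0x0301, 0x00E9 }, /* e + acute      -> U+00E9 */
    { 'a', 0x0308, 0x00E4 }, /* a + diaeresis  -> U+00E4 */
    { 'o', 0x0302, 0x00F4 }, /* o + circumflex -> U+00F4 */
    { 'n', 0x0303, 0x00F1 }, /* n + tilde      -> U+00F1 */
};

/* Returns the single precomposed character for dead key + base letter,
 * or 0 when the pair does not normalise to one character (in which case
 * the sequence has to come from the table instead). */
static uint32_t compose_algorithmically(DeadKey k, uint32_t base)
{
    uint32_t mark = dead_key_to_combining(k);
    assert(mark != 0);
    for (size_t i = 0; i < sizeof toy_nfc / sizeof toy_nfc[0]; i++)
        if (toy_nfc[i].base == base && toy_nfc[i].mark == mark)
            return toy_nfc[i].composed;
    return 0;
}
```

For example, compose_algorithmically(DEAD_ACUTE, 'a') gives U+00E1,
while a Cyrillic base letter with an acute gives 0 here.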

The remaining sequences can be autogenerated with the script provided in
the bug report. The compose sequence table now lives in a separate
file, gtkimcontextsimpleseqs.h.
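The actual generator is the script in the bug report. As a rough
illustration of what it has to extract from each rule in the Xorg Compose
file, here is a minimal C sketch that pulls the keysym names and the
quoted result out of a line such as
<dead_acute> <a> : "á" aacute # LATIN SMALL LETTER A WITH ACUTE
The names ComposeEntry and parse_compose_line() are made up for this
example, and it deliberately rejects escaped quotes and ignores include
directives.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEYS 8   /* longest key sequence this sketch accepts */
#define MAX_NAME 32  /* longest keysym name, including the NUL */

typedef struct {
    char   keys[MAX_KEYS][MAX_NAME]; /* keysym names, e.g. "dead_acute" */
    size_t n_keys;
    char   result[16];               /* UTF-8 text between the quotes */
} ComposeEntry;

/* Parses one rule of the form
 *     <key1> <key2> ... : "result" [keysym] [# comment]
 * Returns 1 on success and 0 for comments, blank lines and anything
 * this sketch does not understand (e.g. escaped quotes). */
static int parse_compose_line(const char *line, ComposeEntry *e)
{
    assert(line != NULL && e != NULL);
    const char *colon = strchr(line, ':');
    if (colon == NULL || line[0] == '#')
        return 0;

    /* Collect every <name> before the colon. */
    e->n_keys = 0;
    for (const char *p = line; p < colon; p++) {
        if (*p != '<')
            continue;
        const char *end = memchr(p, '>', (size_t)(colon - p));
        if (end == NULL || e->n_keys == MAX_KEYS)
            return 0;
        size_t len = (size_t)(end - p - 1);
        if (len == 0 || len >= MAX_NAME)
            return 0;
        memcpy(e->keys[e->n_keys], p + 1, len);
        e->keys[e->n_keys][len] = '\0';
        e->n_keys++;
        p = end;
    }

    /* Copy the quoted result that follows the colon. */
    const char *q1 = strchr(colon, '"');
    const char *q2 = q1 != NULL ? strchr(q1 + 1, '"') : NULL;
    if (e->n_keys == 0 || q2 == NULL)
        return 0;
    size_t rlen = (size_t)(q2 - q1 - 1);
    if (rlen == 0 || rlen >= sizeof e->result || q2[-1] == '\\')
        return 0;
    memcpy(e->result, q1 + 1, rlen);
    e->result[rlen] = '\0';
    return 1;
}
```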

The Xorg Compose table contains characters from Plane 1. There are
currently only a handful of them, and no keyboard layout produces them.
For now we leave them alone, at least until the Xorg Compose file gets
a proper maintainer.

The Xorg Compose file lists some of the standard keysyms with 0x100000
added to their value, which has a special meaning for Xorg. Here we
subtract 0x100000, as the offset currently has no meaning in GTK+.

At the moment the table is guint16, and I think we should keep it that
way for the time being.
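A consequence of keeping the table guint16 is that every value stored in
it must fit in 16 bits, which rules out Plane 1 code points (U+10000 and
above). A minimal sketch of the check a generator would need; the
function names are hypothetical, and uint16_t stands in for guint16.

```c
#include <assert.h>
#include <stdint.h>

/* A result fits in a 16-bit table entry only if it lies in the Basic
 * Multilingual Plane (U+0000..U+FFFF); Plane 1 code points such as
 * U+1D15E do not. */
static int fits_in_guint16_table(uint32_t codepoint)
{
    return codepoint <= 0xFFFF;
}

/* Narrows a result for storage; callers must check the fit first. */
static uint16_t to_table_value(uint32_t codepoint)
{
    assert(fits_in_guint16_table(codepoint));
    return (uint16_t)codepoint;
}
```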

In bug #341341, Danilo talks about support for compose sequences that
produce more than one Unicode character, such as COMBINING ACUTE +
CYRILLIC SMALL LETTER A, where no precomposed form exists. At the
moment, the Xorg Compose file has no such compose sequences. If we were
to implement this in GTK+, I would suggest building a new table of the
form

dead_acute, A, E, H, I, O, U, ...  (assume all of these are Cyrillic)
dead_diaeresis, A, E, H, I, O, U, ...  (assume all of these are Cyrillic)
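A rough sketch of such a table in C, under assumptions of my own: the
names MultiCharRow and lookup_multi_char() are hypothetical, only the
dead_acute row is filled in, and it uses Cyrillic small vowels, which
have no precomposed forms with an acute. A match yields two code points,
the base letter followed by U+0301 COMBINING ACUTE ACCENT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dead-key identifier; real code would use GDK keysyms. */
typedef enum { SKETCH_DEAD_ACUTE } SketchDeadKey;

/* One row per dead key: the combining mark it adds and the base
 * letters that may carry it even though no precomposed form exists. */
typedef struct {
    SketchDeadKey   dead_key;
    uint32_t        combining;  /* mark committed after the base letter */
    const uint32_t *bases;
    size_t          n_bases;
} MultiCharRow;

static const uint32_t cyrillic_acute_bases[] = {
    0x0430, /* CYRILLIC SMALL LETTER A */
    0x0435, /* CYRILLIC SMALL LETTER IE */
    0x0438, /* CYRILLIC SMALL LETTER I */
    0x043E, /* CYRILLIC SMALL LETTER O */
    0x0443, /* CYRILLIC SMALL LETTER U */
};

static const MultiCharRow multi_char_table[] = {
    { SKETCH_DEAD_ACUTE, 0x0301, cyrillic_acute_bases,
      sizeof cyrillic_acute_bases / sizeof cyrillic_acute_bases[0] },
};

/* On a match, writes base + combining mark to out[0..1] and returns 2;
 * returns 0 when the sequence is not in the table. */
static size_t lookup_multi_char(SketchDeadKey k, uint32_t base, uint32_t out[2])
{
    assert(out != NULL);
    for (size_t r = 0; r < sizeof multi_char_table / sizeof multi_char_table[0]; r++) {
        const MultiCharRow *row = &multi_char_table[r];
        if (row->dead_key != k)
            continue;
        for (size_t i = 0; i < row->n_bases; i++) {
            if (row->bases[i] == base) {
                out[0] = base;
                out[1] = row->combining;
                return 2;
            }
        }
    }
    return 0;
}
```

Storing one row per dead key plus a list of base letters keeps the
table much smaller than listing every two-character result explicitly.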

In check_algorithmic(), we currently check whether the compose sequence
can be normalised to a single Unicode character. Here we could also
check whether the compose sequence matches a "valid" multi-character
sequence (a Cyrillic small 'a' with a combining acute is fine) and, if
so, commit those characters with gtk_im_context_simple_commit_char().
How would we know which compose sequences are "valid"? We can parse
parts of ftp.unicode.org/Public/UNIDATA/NormalizationTest.txt.
Putting this functionality in check_algorithmic() saves space in the
table, as we already know the resulting Unicode characters.
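Data lines in NormalizationTest.txt have five semicolon-terminated
columns (source; NFC; NFD; NFKC; NFKD) of space-separated hexadecimal
code points, followed by a comment. One possible way to use the file,
assuming that a sequence whose NFC form has more than one code point is
one with no precomposed form, is to read the NFC column and count its
code points. The function parse_nfc_column() is a hypothetical sketch
of that step only.

```c
#include <assert.h>
#include <stdlib.h>

/* Parses the NFC column (second ';'-separated field) of one data line
 * from NormalizationTest.txt into out[], storing at most max_out code
 * points. Returns the number of code points in the column, or -1 for
 * comment lines, "@Part" headers, and malformed lines. */
static int parse_nfc_column(const char *line, unsigned long *out, int max_out)
{
    assert(line != NULL && out != NULL && max_out > 0);
    if (line[0] == '#' || line[0] == '@' || line[0] == '\0')
        return -1;

    /* Skip the source column (c1). */
    const char *p = line;
    while (*p != '\0' && *p != ';')
        p++;
    if (*p != ';')
        return -1;
    p++;

    /* Read space-separated hex code points up to the next ';'. */
    int count = 0;
    while (*p != '\0' && *p != ';') {
        char *end;
        unsigned long cp = strtoul(p, &end, 16);
        if (end == p)
            return -1; /* not a hex number */
        if (count < max_out)
            out[count] = cp;
        count++;
        p = end;
        while (*p == ' ')
            p++;
    }
    return *p == ';' ? count : -1;
}
```

With an illustrative line in the file's format such as
"0430 0301;0430 0301;0430 0301;0430 0301;0430 0301;" the function
returns 2, meaning the NFC form of that sequence is two characters long.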
In any case, such work should be considered after the existing patch
goes in.

Simos


