Re: Unicode character entry

From: Adam Rigg <adamrigg uwm edu>
To: gnome-list <gnome-list gnome org>
Subject: Re: Unicode character entry
Date: Tue, 29 Jun 2010 05:42:27 -0500 (CDT)
----- Joe Smith <jes martnet com> wrote:
> Joe Smith <jes <at> martnet.com> writes:
> 
> > 
> > For the past few years using Gnome on Fedora, I have been able to enter 
> > arbitrary Unicode characters in any Gnome/Gtk application using 
> > Ctrl+Shift+U followed by the character's code point as hex digits.
> > 
> > I just upgraded to Fedora 13 which includes Gnome 2.30, and this handy 
> > feature seems to have disappeared!
> > ...
> > Is there any way to get the old behavior?
> 
> I was given a method that restores the old behavior for Fedora 13; I
> expect it will work in F12 also. See
> https://bugzilla.redhat.com/show_bug.cgi?id=598289
> 
> Remove the xim package:
> 
> $ sudo yum remove gtk2-immodule-xim
> 
> The script, /etc/X11/xinit/xinput.d/none.conf, checks whether the xim
> package is installed and sets GTK_IM_MODULE=gtk-im-context-simple only
> if xim is not installed.
> 
> Xim was installed with F13, in an en_US locale, and even though no input
> method was configured and xim was not active, its presence on the system
> prevented the gtk-im-context-simple module from being loaded.
> 
> Removing the xim package will change the default for all users on the
> system. A particular user should still be able to configure the ibus
> input methods, but xim will not be available.
> 
> I can still get characters using the compose key as well.
> 
> Creating/modifying ~/.{gnomerc,xinputrc,gtkrc,xinit} did not work for
> me, but I can't say for sure that I correctly spake the necessary
> incantations ;-)
> 
> I still don't know if there is any general policy regarding the default
> input method for Gnome/gtk+. In my experience, many users can benefit
> from a single, standard, documented method for entering characters by
> code point; I hope this will be clarified soon.

It sounds like a Fedora policy or process is interfering with the GTK+ input method modules. GTK+ input methods are defined by GtkIMContext, and the default method of entering an arbitrary code point (hold down Ctrl and Shift, press "u", then type the hex digits) does not appear to have changed since it was first documented in the GTK+ Reference Manual (2.16 release), nor has it changed in the development branch:

http://library.gnome.org/devel/gtk/2.16/GtkIMContext.html
http://library.gnome.org/devel/gtk/stable/GtkIMContext.html#GtkIMContext.description
http://library.gnome.org/devel/gtk/unstable/GtkIMContext.html#GtkIMContext.description
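
For anyone unfamiliar with the notation, the hex digits typed after Ctrl+Shift+u are simply the Unicode code point of the character. A minimal Python sketch of that mapping follows. The helper name char_from_hex is made up for illustration and is not part of GTK+; it only mimics the arithmetic, not the toolkit's key handling:

```python
def char_from_hex(hex_digits):
    """Return the character whose Unicode code point is given as hex digits.

    Hypothetical helper for illustration only; this is not GTK+ code.
    """
    code_point = int(hex_digits, 16)  # e.g. "f1" -> 241
    return chr(code_point)            # raises ValueError above U+10FFFF


if __name__ == "__main__":
    for digits in ("f1", "ab", "bb", "20ac"):
        ch = char_from_hex(digits)
        # Show the code point, the character, and its UTF-8 encoded bytes.
        print("U+%04X %s %r" % (ord(ch), ch, ch.encode("utf-8")))
```

The encode("utf-8") call shows the bytes an application would actually store or pass along once the character has been entered.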

Obviously, that method will not work with Qt or FLTK applications (among others).

I definitely agree with your comments about having a standardized method of input for arbitrary Unicode code points, and your examples on the Red Hat Bugzilla illustrate this need quite well (punctuation, technical symbols, etc.). On the other hand, application developers are still free to deviate from the defaults by remapping input to output (e.g., games, vim keybindings, keyboard shortcuts).

I would add that any standardized method of compose key handling (like Ctrl+Shift+u followed by hex digits) should be respected throughout the entire operating system. It isn't difficult for an end user to adapt to a new method, such as switching from Windows-style Alt codes to dead keys and U.S. International keyboard layouts or OS X escape sequences. Having to switch constantly between methods for the command line, FLTK, GTK+, Qt and whatever else on the same platform, all with similar goals in mind, can be a bit of a nuisance. This is especially problematic with natural language input and will continue to be so until we all have Star Trek-style touch screen keyboards.

In your Red Hat posting, you also raise the question of what the policy will be in the future with respect to Latin alphabet localizations and input handling. Maybe the policy question could also be considered in relation to the broader scope of character encoding in general. My understanding is that GTK+ input methods focus on the relationship between C code and UTF-8, which is a reasonable approach: an underlying process can readily pass UTF-8 data to another process to be handled according to the functional needs of any given user, and regional keybindings can be handled by a localization team. Input handling at the toolkit level that seeks to work around missing input methods at other levels of programming has led to redundant code, and it would be better handled by tools which do not depend on X11 but which can be controlled or developed within it. But that's the future.

As a background/primer for anyone else reading this: writing systems derived from the same alphabet still have different input requirements at the application level, due to geographical differences between physical input sources (i.e., the hardware). As such, methods are necessary to input the underlying data to be expressed. For example, the ñ is not part of the English language and is not included on keyboards distributed in the United States. Latin American keyboards tend to exclude «Spaniard quotation marks». Yet at least 10% of the U.S. population needs some way of entering the ñ, whether phonetically or with the actual diacritical mark; the latter is preferred within academic circles. Standards like Unicode are a step in the right direction, but subtle regional differences in the hardware, in proprietary encoding schemes or at the administrative level unnecessarily complicate the issue of getting an "A" on your Spanish homework. In other words, input handling requirements are significantly different even when you're using the same alphabet, regardless of what's at stake.
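
To make the ñ and «» examples concrete, here is a small Python sketch that goes the other way: given a character, it reports the hex digits one would type after Ctrl+Shift+u in a GTK+ application, along with the official Unicode character name. The function name hex_digits_for is hypothetical; the lookups themselves come from Python's standard unicodedata module:

```python
import unicodedata


def hex_digits_for(char):
    """Return the lowercase hex code point of a single character.

    Hypothetical helper for illustration; not part of any toolkit.
    """
    return "%x" % ord(char)


for ch in "ñ«»":
    # Print the character, its hex code point, and its Unicode name.
    print(ch, hex_digits_for(ch), unicodedata.name(ch))
```

For the three characters above, the hex digits are f1, ab and bb respectively.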

see also: http://www.joelonsoftware.com/articles/Unicode.html

The key to a future standard input handling policy for escape sequences and compose keys, for documents written on a system with a Latin-based localization, probably has more to do with progress on low-level code than with the X11 protocol or GUI toolkits. Still, one can and should rely on work developed for Qt, GTK+ or XIM for system-wide or cross-platform compliance with any proposed set of standards. As the Debian console-setup package clearly illustrates, "there is no need to duplicate or change the keyboard files just to make simple customi[z]ations such as the use of dead keys, the key functioning as AltGr or Compose key, the key(s) to switch between Latin and non-Latin mode, etc." Of course, dynamically modifying those maps could have some advantages (agnostic macros?).