Re: GTK internationalization, right-to-left languages




Nimrod Zimerman <zimerman@earthling.net> writes:

[...]

> I've been doing some thinking of my own regarding the issue of multi-lingual
> applications involving both LTR and RTL. It includes a bit more than output
> routines and fonts, but generally shouldn't be too complicated. I'll try to
> list some of the points. Note that the following is purely my own - I've not
> read any documentation regarding these issues, though I assume some exist,
> hence I do not know if there are better ways to make these things work.
> 
> 1. Fonts rendering.
> 
> This should be, indeed, rather straight forward. 
> I feel that adding support to X itself is a complicated task not quite
> worth the time, especially now, when The Open Group is no longer open.

I think this is a dangerous way to think of things. The only
way X is going to be saved from the Open Group is if people continue
working on it and contribute the results to the XFree86 project.

> Making gtk draw the fonts on its own looks like a better idea, though I do
> not quite understand the consequences as far as X is involved - does it
> matter? A possible way to integrate RTL might be using special font names
> that would be captured by gdk and handled in this higher level.

Well, if X is bypassed entirely, it is going to make text rendering
fairly slow and complicated. (Though there have been some people
who have wanted to add things like anti-aliased TrueType fonts
and colored pixmap fonts into GTK.) The other alternative is
to have GTK do the conversion from the logical ordering to the
on-screen ordering, then pass the string on to X for the 
final rendering.
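
As a rough illustration of that second alternative, here is a minimal
sketch in C of a logical-to-visual conversion for a line whose base
direction is RTL. Everything in it is an assumption made for the
example: the function names are hypothetical, uppercase ASCII stands
in for Hebrew/Arabic letters, lowercase letters and digits count as
LTR, and neutral characters simply follow the base direction. It is
far simpler than the real Unicode bidirectional algorithm.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy classification (an assumption for this sketch): lowercase ASCII
 * letters and digits are LTR; everything else, with uppercase standing
 * in for Hebrew/Arabic letters, follows the RTL base direction. */
static int is_ltr(char c)
{
    return islower((unsigned char)c) || isdigit((unsigned char)c);
}

/* Reverse the characters of s in the half-open range [start, end). */
static void reverse_range(char *s, size_t start, size_t end)
{
    while (start + 1 < end) {
        char tmp = s[start];
        s[start] = s[end - 1];
        s[end - 1] = tmp;
        start++;
        end--;
    }
}

/* Convert s in place from logical (typing) order to visual
 * (left-to-right on screen) order, assuming an RTL base direction. */
void logical_to_visual_rtl(char *s)
{
    size_t len = strlen(s);
    size_t i = 0;

    /* Step 1: mirror the whole line, since the base direction is RTL. */
    reverse_range(s, 0, len);

    /* Step 2: each maximal LTR run (numbers, Latin words) was mirrored
     * too, so reverse it again to restore its reading order. */
    while (i < len) {
        if (is_ltr(s[i])) {
            size_t j = i;
            while (j < len && is_ltr(s[j]))
                j++;
            reverse_range(s, i, j);
            i = j;
        } else {
            i++;
        }
    }
}
```

The resulting buffer is what would then be handed to X for drawing
with an ordinary left-to-right text call.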
 
> Arabic's ligatures which are context dependent shouldn't pose too much of
> a problem. As far as I remember, they are rather straight forward. A letter
> can have different form if it appears as the first letter of a word, as a
> letter in the middle of a word, or as the last letter of a word. I believe
> that, with a few exceptions, should be enough.
> Considering I was able to 'master' this myself when I learned Arabic quite
> a few years ago, it should be obvious it couldn't be too complicated (as of
> this moment, I hardly remember how to draw the letters, not to talk about
> whole words...).
> 
> Are there any other difficulties which are specific to a language? What
> other languages are written right to left, aside of Arabic and Hebrew?

Here are two references:

  http://www.opengroup.org/public/pubs/catalog/c616.htm

From the Open Group... but... it has a fairly good overview of "Complex
Text Languages" - i.e., RTL languages and languages with compound
glyphs like Thai and Korean.

  ftp://ds.internic.net/rfc/rfc2070.txt

Describes some of the issues related to BIDI text, and how
they are addressed for HTML documents.
 
> 2. RTL cursor movement.
> 
> Even if we only require RTL writing, there has to be a way to insert
> numerics as usual, LTR. Hence there is no good reason to prevent arbitrary
> switching from RTL to LTR and vice versa (not to mention that entering
> proper LTR English text is a common requirement regardless of the main
> language).
> 
> 3. Multi-linguality.
> 
> I feel that an application shouldn't be RTL *or* LTR. Rather, an application
> should have a main language, probably English, as well as secondary
> languages. The user interface should be totally switchable from one
> interface to the other with a menu choice, without breaking a thing.
> The main language shouldn't dictate how text is to be entered (direction and
> language). It should only choose defaults and control general alignments
> of text messages.

Hmmm. The benefits of being able to switch on the fly seem
to be mostly the gee-whiz factor. ("Gee, I picked this menu
item and everything turned into Arabic and all the dialogs
flipped around. Now, what does "English" look like written
in Arabic...")

The ability to enter multi-lingual text is a different matter.  From
my experience, most localized input methods allow a small amount of
multi-lingualism. Text can be entered in the localized script or in
English/ASCII. My current thought is to stick to this degree of
multilingualism for Entries, but to allow, in the Text widget,
language as an extra attribute, as Color and Font are now.

This does mean that where different languages have different glyph
variants for the same Unicode character, the display of GTK widgets
without specific language markup may be incorrect.  For instance,
Chinese Labels in a program being viewed by someone with LANG=ja_JP
will be slightly misdisplayed. But this doesn't seem to be a major
issue.

[...]
 
> As I see it, every component should have a field identifying its behavior.
> If it is 'automatic', it should be automatically flipped if the user
> interface is flipped. If it is 'constant', it shouldn't. There should be
> some parent-child relationship here, because things can get complicated.
> Comments on this would be appreciated, because I consider this the most
> important part of multi-linguality.

As long as flipping on the fly isn't important, one possibility
is simply to make everything flip by default, and have applications
that are more sophisticated check (gtk_global_direction()) and
pack the rare physical UI elements differently in that case.

But a per-widget flag would be easy enough to do. I don't
actually think that parent/child relationships are necessary.
The application designer can figure out whether each element
needs to be flipped or not when they create the interface.
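
To make the per-widget flag concrete, here is a minimal sketch in C.
None of these names are GTK API; the enums and visual_slot() are
hypothetical, and the sketch assumes the flag lives on a container
that lays its children out in a row.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global UI direction, as gtk_global_direction() might
 * report it. */
typedef enum { DIR_LTR, DIR_RTL } Direction;

/* Hypothetical per-widget flag: follow the global direction, or keep
 * the physical layout regardless of it. */
typedef enum { FLIP_AUTOMATIC, FLIP_CONSTANT } FlipMode;

/* Return the on-screen slot (0 = leftmost) for the child at logical
 * position `index` in a row of `n` children. Only a container marked
 * FLIP_AUTOMATIC mirrors its children when the UI is RTL. */
size_t visual_slot(Direction global, FlipMode mode, size_t index, size_t n)
{
    if (global == DIR_RTL && mode == FLIP_AUTOMATIC)
        return n - 1 - index;
    return index;
}
```

No parent/child inheritance is modelled here: each container carries
its own flag, set by the application designer when the interface is
built.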

A slightly tricky part is user interfaces that include pixmaps
that should be flipped. (For instance the down and to-the-left
arrow for "OK" that GNOME uses). I think, there, it has to
be up to the application (or up to gnome-stock) to make the
determination. GTK+ shouldn't be in the business of mirroring
pixmaps.

> 
> 4. Multi-lingual strings.
> 
> If multi-linguality is to be considered, all viewable strings should be able
> to represent arbitrary text, including direction changes and special
> characters. Unicode looks like a good candidate, but I do not know too much
> about it.
> How much of gtk would this break?

It depends what you mean by "break". Unicode is going into GTK+ one
way or the other - and yes it is a potentially backwards incompatible
change. But it is one for the better, so I wouldn't call it breakage.

The backwards incompatibility arises mostly for, e.g., European
users who expect to be able to put the high half of
iso-8859-1 into labels and have it displayed correctly.

One possibility is to make the old names for text-dependent
functions convert automatically from the locale-dependent encoding
to UTF-8, and provide new names for UTF-8 functions.

 gtk_button_new_with_label_u ("...");

It would be better, in the long term, to go the other way:

 gtk_button_new_with_label_l ("...");

Or, to avoid the proliferation of names entirely and require:

 gtk_button_new_with_label (g_l2u ("..."));
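
For a sense of what such a conversion involves, here is a minimal
sketch in C that handles only the iso-8859-1 case mentioned above.
The name latin1_to_utf8() is hypothetical; a real g_l2u() would have
to consult the current locale's encoding rather than assume one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Convert a NUL-terminated ISO-8859-1 string to a newly allocated
 * UTF-8 string. Returns NULL if allocation fails; the caller frees
 * the result. Assumes the input really is ISO-8859-1. */
char *latin1_to_utf8(const char *src)
{
    size_t len = strlen(src);
    /* Each input byte expands to at most two UTF-8 bytes. */
    char *out = malloc(2 * len + 1);
    char *p = out;
    size_t i;

    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c < 0x80) {
            /* ASCII is unchanged in UTF-8. */
            *p++ = (char)c;
        } else {
            /* The high half maps to U+0080..U+00FF: two-byte form. */
            *p++ = (char)(0xC0 | (c >> 6));
            *p++ = (char)(0x80 | (c & 0x3F));
        }
    }
    *p = '\0';
    return out;
}
```

With something like this behind the scenes, old-style callers could
keep passing locale-encoded strings while the widgets themselves work
only in UTF-8.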
 
> 5. Language files.

[ As Tom says, gettext is the right way to go here ] 
 
> All in all, I don't think it is an awfully complicated task. I'm
> just afraid that it could break gtk too much to be useful.  I don't
> know enough about gtk to decide whether this prediction is accurate,
> though.

If it is done with a little care, I think it shouldn't involve
much breakage at all. It won't matter for non-internationalized
programs, and should make things just a bit more difficult
for programs that were previously only internationalized for
roman languages.

I think it is an important issue. Except for Japanese, and to
some extent Chinese, internationalization to non-Roman scripts
really hasn't been addressed in the free-software community,
and I think such support would really be a valuable feature
for GTK+.

Regards,
                                        Owen



