Re: Valid UTF-8 text mangled up in GtkLabel

From: Jonathan Ben Avraham <yba tkos co il>
To: Behdad Esfahbod <behdad cs toronto edu>
Cc: gtk-app-devel-list gnome org, gtk-i18n-list gnome org
Subject: Re: Valid UTF-8 text mangled up in GtkLabel
Date: Sat, 6 Aug 2005 21:16:28 +0300 (IDT)

Hi Behdad,

I will indeed try to keep the negative energy to a minimum. I have great admiration for your contributions to FOSS in general and to bidi users in particular, and I appreciate your patience in considering my dissenting views.


My point about user selection is as follows.

I write text documents in Hebrew and in English, and in mixed Hebrew and English. In OOo, I can set the paragraph base direction and alignment according to the language community that I happen to be writing for, regardless of the character set that I am typing. In addition to this user selectability of base direction, the Hebrew version of OOo now comes with macro buttons that allow you to embed RTL text within LTR paragraphs and the converse.

I believe that these features are essential, because it is not possible to write application logic that will correctly guess my intention in every possible situation. If I start writing in English, how does the application know that I intend to write an English paragraph? It could be that I only intended to use a Hebrew word in an English sentence, a common practice here. The converse, an English word heading a Hebrew sentence, is even more common. These situations often arise in technical writing, in IT and medicine for example, where we use a word in Latin characters for which there is no accepted Hebrew translation.

Gedit does not store the base direction and alignment information in the file, unlike OOo. This is fine, because gedit is only intended to be a text editor, not a word processor. In gedit, base direction and alignment are only display issues. However, for data entry or word processor applications based on Gtk, we need a way to cause the data in a field to be displayed RTL and right-aligned, even when the text that is displayed in the field starts with "A320". Perhaps there is an easy way to do this already that I missed, but without pushing an RLO into the beginning of the field. Or do you think that inserting RLOs, LROs and PDFs at the application level is in fact the solution to this problem?
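
To make the "A320" case concrete, here is a minimal Python sketch of the first-strong heuristic. It assumes a simplified rule (take the direction of the first character whose bidi class is L, R or AL) and ignores the other details of the full Unicode algorithm. The function name first_strong_direction and the sample Hebrew word are made up for this illustration; this is not Gtk or Pango code.

```python
import unicodedata

def first_strong_direction(text):
    """Return 'ltr', 'rtl', or None using a simplified first-strong rule."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right (e.g. Latin letters)
            return "ltr"
        if bidi_class in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic)
            return "rtl"
    return None                          # digits, spaces and punctuation only

# Hypothetical field content: "A320" followed by a Hebrew word.
field = "A320 \u05de\u05d8\u05d5\u05e1"
print(first_strong_direction(field))
```

Because the field starts with the Latin letter "A", this sketch resolves the whole field as left-to-right, which is exactly the behaviour I do not want in a Hebrew data entry form.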

Regards,

 - yba


On Fri, 5 Aug 2005, Behdad Esfahbod wrote:

On Fri, 5 Aug 2005, Jonathan Ben Avraham wrote:

Hi Behdad, Gaurav,
IMHO this is *the* classic example of misapplied "bidi algorithm", that
is, the heuristic for determining base direction based on the first
strong directional. There is no reason ever to use this heuristic in any
normal GUI application. In almost all Arabic, Farsi and Hebrew apps you
know the base direction from the outset or else you want to be able to set
it specifically.


I use gedit on a daily basis to type text with Persian and
English paragraphs, and I want my Persian paragraphs to be set as
RTL and aligned to right, and English paragraphs LTR and aligned
to left.  If you have a patch that does this better than the
current code, I would be happy to review it.  I should not have
to do any configuration, no matter what.  It should Just Work.

For example, email addresses are always LTR,


No.  I want my email addresses with Persian name to be set RTL,
something like:

                                                              |
                    <something somewhere org> DOBHAFSE DADHEB |
                                                              |

text editing must always be user-selectable.


Yes, in a broken design, yes.  Fortunately GNOME has left that
philosophy behind since around 2000 when the HIG work started.
Users should be able to just use applications, not teach them how
to do tasks.  Teaching computers is what developers should do.

It would be good to implement the same
directional model in Gtk that Sun implemented in Java Swing TextComponent,
though I appreciate that this would require a lot of work.


I don't know what Swing does, though I've heard it turned off
automatic direction handling.  But as far as I'm concerned,
automatic direction is here to stay in GNOME.

The whole point you and other people advertising in the direction
that you do is that applications can be taught to be smart, to
handle _marked-up_ text.  The bidi algorithm currently only
handles plain text, and with my definition, email addresses are
not plain text.  I've already sketched a solution, you are more
than welcome to go and implement that, otherwise, please keep the
negative energy to a minimum:

 http://bugzilla.gnome.org/show_bug.cgi?id=168108

Thanks
behdad

My 2c,

  - yba



On Fri, 5 Aug 2005, Behdad Esfahbod wrote:

On Fri, 5 Aug 2005, Gaurav Jain wrote:

Hi,

I'm trying to set the text in a GtkLabel to a UTF-8 string, which
contains some Arabic characters first, followed by my email address in
angle brackets, followed by my name in round brackets.  For example, a
sample value is:

X <gaurav somewhere com> (Gaurav Jain)

In the above, 'X' represents a valid sequence of Arabic UTF-8
characters.  The problem that I see is that when I run this program
(appended to this mail), the output shown is something like this:

(gaurav somewhere com> (Gaurav Jain> X

Note that the angle and round brackets are all messed up, and that
the order of the Arabic and ASCII words is also wrong.


Apparently our mileages do vary ;).

Does anyone know WHY this is happening?


Yes, because Arabic is written from right to left, unlike Latin.
And this behavior is part of the Unicode standard.

Just for information, I'm using GTK 2.4.14.  Also,
I was surprised to discover that this works fine with an older version
of GTK (2.0.9).


Right.  Because /bidi/ was not implemented completely in that
version.  This specific part of bidi that is causing problems for
you is called automatic paragraph direction.

Does something special need to be done so I can get
it to work with GTK >= 2.4?


It /is/ working.  If you would like to get behavior similar to the
old one, you need to insert a U+200E LEFT-TO-RIGHT MARK character
at the beginning of the buffer.
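
A minimal sketch of that workaround in Python, for illustration only: the helper name force_ltr_paragraph is hypothetical, and the Arabic word stands in for the 'X' placeholder in the original report. The point is that U+200E is itself a strong left-to-right character, so it becomes the first strong directional in the string.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

def force_ltr_paragraph(text):
    """Prepend LRM unless the text already starts with it."""
    return text if text.startswith(LRM) else LRM + text

# Placeholder Arabic word followed by the address and name from the report.
raw = "\u0633\u0644\u0627\u0645 <gaurav somewhere com> (Gaurav Jain)"
label_text = force_ltr_paragraph(raw)

# The first character now has bidi class "L", so first-strong
# detection resolves the paragraph as left-to-right.
assert unicodedata.bidirectional(label_text[0]) == "L"
```

The resulting string would then be set on the label as usual; the mark itself is invisible when rendered.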

Thanks,
Gaurav


--behdad
http://behdad.org/
_______________________________________________
gtk-i18n-list mailing list
gtk-i18n-list gnome org
http://mail.gnome.org/mailman/listinfo/gtk-i18n-list


--behdad
http://behdad.org/


--
 EE 77 7F 30 4A 64 2E C5  83 5F E7 49 A6 82 29 BA    ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
     - yba tkos co il - tel: +972.2.679.5364, http://www.tkos.co.il -
