Re: Valid UTF-8 text mangled up in GtkLabel



On Fri, 5 Aug 2005, Gaurav Jain wrote:

> Hello Behdad,

Hey,

> Thanks for your quick response.  I did understand what you stated
> w.r.t. the arabic text coming on the right, but I didn't quite
> understand how the brackets that are coming are "correct" even for
> right-to-left direction.

Ok, I was not clear about that.  The output is correct as far as
following the standard is concerned.  As for getting the desired
behavior, the answer is that your plain text does not convey
enough information to the bidi algorithm to reorder it correctly.
What is happening is:

   ARABIC <email host com> (english name)

The algorithm decides that the paragraph direction is RTL,
because the paragraph starts with RTL text.  This is correct.
However, the algorithm (roughly) looks for /maximal/ substrings
of LTR text, where neutral characters (spaces, brackets,
punctuation) surrounded by LTR text on both sides are also
considered LTR.  So the text would be marked like this:

   ARABIC <email host com> (english name)
   rrrrrrrrlllllllllllllllllllllllllllllr

Then the whole text is inverted, and the maximal LTR substring is
inverted back.  The brackets that ended up RTL are also drawn
mirrored, so you get what you got: one bracket of each pair
marked RTL, the other LTR.
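
A rough Python sketch of the steps above, in case it helps to see
them run.  This is only my simplified model, not the real Unicode
bidi algorithm: it assumes an RTL paragraph with no explicit
markup, treats everything that is not a strong character as
neutral, and only mirrors the four brackets discussed here.  The
names mark_directions and visual_order are made up for
illustration.

```python
import unicodedata

# Brackets that land in an RTL run are displayed as their mirror image.
MIRROR = {"<": ">", ">": "<", "(": ")", ")": "("}


def mark_directions(text):
    """Return one 'r' or 'l' per character for an RTL paragraph (simplified)."""
    marks = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            marks.append("l")
        elif cls in ("R", "AL"):
            marks.append("r")
        else:
            marks.append(None)  # neutral (simplification: digits land here too)

    # A run of neutrals becomes LTR only if it has LTR text on both sides;
    # otherwise it follows the (RTL) paragraph direction.
    i, n = 0, len(marks)
    while i < n:
        if marks[i] is not None:
            i += 1
            continue
        j = i
        while j < n and marks[j] is None:
            j += 1
        before = marks[i - 1] if i > 0 else "r"
        after = marks[j] if j < n else "r"
        fill = "l" if before == "l" and after == "l" else "r"
        marks[i:j] = [fill] * (j - i)
        i = j
    return "".join(marks)


def visual_order(text):
    """Return the characters in left-to-right display order (simplified)."""
    marks = mark_directions(text)
    out = []
    i = len(text)
    # Walking the runs from the end inverts the whole text.
    while i > 0:
        j = i
        while j > 0 and marks[j - 1] == marks[i - 1]:
            j -= 1
        run = text[j:i]
        if marks[i - 1] == "l":
            out.append(run)  # LTR run: inverted back, so it reads normally
        else:
            out.append("".join(MIRROR.get(ch, ch) for ch in reversed(run)))
        i = j
    return "".join(out)
```

Feeding it some Arabic letters followed by
" <email host com> (english name)" gives a display string that
starts with "(" and has ">" after the name, which is the mix-up
described in the original report.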

> I mean, it's not the "direction" of the brackets, it's the fact that
> the wrong brackets are surrounding the text.  Why is it that, for
> example, the round bracket, which was surrounding my name earlier, is
> now at the beginning of the text, at one end of my email address, and
> instead the angle bracket has taken its original position?
>
> Are you sure that the order of brackets is indeed correct for
> right-to-left languages?  Kindly confirm (would be useful if you could
> point me to some documentation on this bracket behavior for
> right-to-left languages).

To solve the problem, you need to add some markup to the text,
such that the algorithm deduces the following directions:

   ARABIC <email host com> (english name)
   rrrrrrrrllllllllllllllrrrllllllllllllr

A smart algorithm could do this automatically, and if you are
writing an email client, for example, you have enough
information to produce the markup in your code.  Otherwise, if
it's plain text, you need to add the hints by hand.
Something like:

   ARABIC <[LRE]email host com[PDF]> ([LRE]english name[PDF])

where [LRE] and [PDF] are the Unicode characters U+202A
LEFT-TO-RIGHT EMBEDDING and U+202C POP DIRECTIONAL FORMATTING.
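
If you build the string in code, adding the hints could look
something like the Python sketch below.  The helper names
embed_ltr and format_sender are hypothetical, and I am assuming
you already have the three pieces (Arabic text, address, name) as
separate strings.

```python
LRE = "\u202a"  # U+202A LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # U+202C POP DIRECTIONAL FORMATTING


def embed_ltr(s):
    """Wrap s so the bidi algorithm treats it as one embedded LTR run."""
    return LRE + s + PDF


def format_sender(rtl_text, email, name):
    """Build 'ARABIC <email> (name)' with LTR hints inside each bracket pair."""
    return "{} <{}> ({})".format(rtl_text, embed_ltr(email), embed_ltr(name))
```

The brackets themselves stay outside the embeddings, so they keep
the paragraph direction and end up paired around the right text.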

> Thanks a lot!
> Gaurav


For more information, feel free to subscribe to the bidi mailing
list at http://bidi.info/

Hope it helps
behdad



> On 8/5/05, Behdad Esfahbod <behdad cs toronto edu> wrote:
> > On Fri, 5 Aug 2005, Gaurav Jain wrote:
> >
> > > Hi,
> > >
> > > I'm trying to set the text in a GtkLabel to a UTF-8 string, which
> > > contains some arabic characters first, followed by my email address in
> > > angle brackets, followed by my name in round brackets.  For e.g., a
> > > sample value is:
> > >
> > > X <gaurav somewhere com> (Gaurav Jain)
> > >
> > > In the above, 'X' represents a valid sequence of arabic UTF-8
> > > characters.  The problem that I see is that when I run this program
> > > (appended to this mail), the output shown is something like this:
> > >
> > > (gaurav somewhere com> (Gaurav Jain> X
> > >
> > > Note that the angle and round brackets are all messed up, and that the
> > > order of arabic and ascii words is also wrong.
> >
> > Apparently our mileages do vary ;).
> >
> >
> > > Does anyone know WHY this is happening?
> >
> > Yes, because Arabic is written from right to left, unlike Latin.
> > And this behavior is part of the Unicode standard.
> >
> > > Just for information, I'm using GTK 2.4.14.  Also,
> > > I was surprised to discover that this works fine with an older version
> > > of GTK (2.0.9).
> >
> > Right.  Because /bidi/ was not implemented completely in that
> > version.  This specific part of bidi that is causing problems for
> > you is called automatic paragraph direction.
> >
> > > Does something special need to be done so I can get
> > > it to work with GTK >= 2.4?
> >
> > It /is/ working.  If you would like to get behavior similar to the
> > old one, you need to insert a U+200E LEFT-TO-RIGHT MARK character
> > at the beginning of the buffer.
> >
> >
> > > Thanks,
> > > Gaurav
> >
> > --behdad
> > http://behdad.org/
> >
> _______________________________________________
> gtk-i18n-list mailing list
> gtk-i18n-list gnome org
> http://mail.gnome.org/mailman/listinfo/gtk-i18n-list
>
>
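
About the U+200E suggestion quoted above: here is a small Python
sketch of why it works.  It is a simplified stand-in for
automatic paragraph direction (take the first strong character),
not the full rule from the standard, and the function names are
made up for illustration.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK, an invisible strong LTR character


def auto_paragraph_direction(text):
    """Guess the paragraph direction from the first strong character."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return "ltr"  # assumption: fall back to LTR if nothing strong is found


def force_ltr(text):
    """Prefix the text with LRM so the paragraph is detected as LTR."""
    return LRM + text
```

With the mark in front, the first strong character is LTR, so the
paragraph is laid out left to right and only the Arabic run is
reversed.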

--behdad
http://behdad.org/


