Re: Font priorities

From: Eric Mader <mader jtcsv com>
To: Owen Taylor <otaylor redhat com>
Cc: Keith Packard <keithp keithp com>, Stefan Baums <baums u washington edu>, GTK-I18N Mailing List <gtk-i18n-list gnome org>
Subject: Re: Font priorities
Date: Wed, 05 Feb 2003 11:21:45 -0800

At 10:57 AM 2/5/2003, Owen Taylor wrote:

On Wed, 2003-02-05 at 13:50, Eric Mader wrote:
> At 09:13 PM 2/4/2003, Owen Taylor wrote:
> >One possible thing we'll be able to do in future versions of Pango, is
> >that I plan to do script detection, so I'll be able to tell that
> >a run of text is in Devanagari ... if you then had a table mapping
> >from language => script, you could tell that the 'en' language
> >tag _couldn't_ be appropriate for the Hindi text, though you would
> >have no idea what language tag _was_ appropriate.
>
> ICU has code which maps a code point to a script. There is also code to
> identify runs of text all in the same script, taking neutral characters
> into account. For example, Arabic words separated by spaces will be
> returned as a single run.
>
> (The ICU code which maps code points to script uses the general Unicode
> properties mechanism, which pulls in quite a bit of ICU; I have a version
> of this code which uses a table built by an ICU application so you can
> avoid the direct ICU dependencies...)

Yeah - I already have a port of this code to use in Pango :-)

See attachments to:

  http://bugzilla.gnome.org/show_bug.cgi?id=91542

What remains to be done is hooking it up to shaper selection.

You rock! A few months ago I was using this code as part of a process of splitting a paragraph of text into runs of text in the same script, direction and font, and found that the script run code, as currently written, interacts strangely with the bidi code. Here's a short summary of what I found:

I was thinking about whether to compute the script runs over the whole paragraph or for each directional (and font?) run. Whichever way I do it, I can imagine a case where it will do the wrong thing. First assume that I find the script runs within each directional run. Given the input text "english (ARABIC) hindi." the directional runs will be, of course, "english (", "ARABIC" and ") hindi." I'll assign the whole first run to the Latin script, the whole second run to the Arabic script, and the whole third run to the Devanagari script. If the text were all left-to-right, I'd assign the closing paren to the Latin script 'cause the open paren got assigned to Latin. It seems like a mistake to let the change in direction change which script characters get assigned to.
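
The difference can be reproduced with a small sketch. This is not the ICU or Pango code: `script_of` is a toy range lookup standing in for the real Unicode script property, the directional runs are written out by hand as listed above rather than computed by a bidi pass, and real Arabic and Devanagari letters stand in for "ARABIC" and "hindi". All function and variable names are hypothetical.

```python
LATIN, ARABIC, DEVANAGARI, COMMON = "Latn", "Arab", "Deva", "Zyyy"
PAIRS = {")": "(", "]": "[", "}": "{"}   # closing -> opening punctuation
OPENERS = set(PAIRS.values())

def script_of(ch):
    """Toy script lookup (assumption): three scripts, everything else neutral."""
    cp = ord(ch)
    if ch.isascii() and ch.isalpha():
        return LATIN
    if 0x0600 <= cp <= 0x06FF:
        return ARABIC
    if 0x0900 <= cp <= 0x097F:
        return DEVANAGARI
    return COMMON

def script_per_char(text):
    """Give every character a script; neutrals inherit from their context."""
    scripts = [script_of(ch) for ch in text]
    current = None   # script of the run we are currently in
    pending = []     # leading neutrals still waiting for a strong character
    openers = []     # indices of unmatched opening punctuation
    for i, ch in enumerate(text):
        if scripts[i] != COMMON:
            current = scripts[i]
            for j in pending:
                scripts[j] = current
            pending.clear()
            continue
        if ch in PAIRS and openers and text[openers[-1]] == PAIRS[ch]:
            # A closing paren goes back to the script of its opening paren.
            opener_script = scripts[openers.pop()]
            if opener_script != COMMON:
                current = opener_script
        elif ch in OPENERS:
            openers.append(i)
        if current is None:
            pending.append(i)
        else:
            scripts[i] = current
    return scripts

def script_runs(text):
    """Group consecutive characters with the same resolved script."""
    scripts = script_per_char(text)
    runs, start = [], 0
    for i in range(1, len(text) + 1):
        if i == len(text) or scripts[i] != scripts[start]:
            runs.append((text[start:i], scripts[start]))
            start = i
    return runs

AR = "\u0639\u0631\u0628\u064a"          # stands in for "ARABIC"
HI = "\u0939\u093f\u0902\u0926\u0940"    # stands in for "hindi"
text = "english (" + AR + ") " + HI + "."

# Directional runs as listed above, written by hand (no bidi pass here).
dir_runs = ["english (", AR, ") " + HI + "."]

whole_paragraph = script_runs(text)
per_directional_run = [script_runs(run) for run in dir_runs]
```

Over the whole paragraph the closing paren lands in a Latin run, because its opening paren is still on the stack. Computed per directional run, the third run starts with an unmatched paren, so the paren and the space after it are treated as leading neutrals and fall to Devanagari.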

So, what happens if I compute the script runs up front, and then intersect them with the directional runs? The case above would work out like I want. But what about this simple case: "english ARABIC more english." The script runs would be "english ", "ARABIC " and "more english." The directional runs will be "english ", "ARABIC" and " more english." This means that the space after the word "ARABIC" will be assigned to the Arabic script, even though Bidi processing said it's a left-to-right space. This doesn't seem right either...
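
A minimal sketch of the intersection step, assuming both kinds of runs are already available as `(start, end, tag)` tuples covering the same text. The run boundaries below are copied by hand from the example above, not produced by real script or bidi analysis, and `intersect_runs` is a hypothetical name.

```python
def intersect_runs(script_runs, dir_runs):
    """Split the text into pieces that are uniform in both script and direction.

    Each argument is a sorted list of (start, end, tag) tuples with
    half-open [start, end) ranges covering the same text.
    """
    pieces = []
    i = j = 0
    while i < len(script_runs) and j < len(dir_runs):
        s_start, s_end, script = script_runs[i]
        d_start, d_end, direction = dir_runs[j]
        start, end = max(s_start, d_start), min(s_end, d_end)
        if start < end:
            pieces.append((start, end, script, direction))
        # Advance whichever run ends first (both, if they end together).
        if s_end <= d_end:
            i += 1
        if d_end <= s_end:
            j += 1
    return pieces

AR = "\u0639\u0631\u0628\u064a"   # stands in for "ARABIC"
text = "english " + AR + " more english."
a = len("english ")               # start of the Arabic word
b = a + len(AR)                   # index of the space after it
n = len(text)

# Script runs: the trailing space is swallowed by the Arabic run.
script_runs = [(0, a, "Latn"), (a, b + 1, "Arab"), (b + 1, n, "Latn")]
# Directional runs: the same space belongs to the following LTR run.
dir_runs = [(0, a, "LTR"), (a, b, "RTL"), (b, n, "LTR")]

pieces = intersect_runs(script_runs, dir_runs)
labelled = [(text[s:e], script, direction) for s, e, script, direction in pieces]
```

The intersection yields four pieces rather than three. The extra one is the single space after the Arabic word, tagged as Arabic script but left-to-right, which is exactly the odd assignment described above.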

What this seems to mean is that the simple way that the script run code assigns scripts to neutral characters isn't good enough. It seems like it needs to take more than just the raw script ID's into account... maybe it needs to do Bidi analysis too? Maybe I need a function that does Bidi and script runs at the same time?
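
One possible shape for such a combined function, offered only as an untested heuristic and not as anything ICU or Pango does: let a neutral character take the script of the nearest strong character at the same bidi embedding level. The sketch assumes the levels come from a separate bidi pass. Here they are faked using this thread's convention that uppercase stands for right-to-left Arabic text, and all names are hypothetical.

```python
COMMON = "Zyyy"

def resolve_neutrals(scripts, levels):
    """Resolve neutral characters using script and bidi level together.

    scripts: raw per-character script tags, COMMON for neutrals.
    levels:  per-character bidi embedding levels (assumed to be given).
    """
    resolved = list(scripts)
    # Forward pass: inherit from the last strong character at the same level.
    last_strong = {}
    for i, (script, level) in enumerate(zip(scripts, levels)):
        if script != COMMON:
            last_strong[level] = script
        elif level in last_strong:
            resolved[i] = last_strong[level]
    # Backward pass: leading neutrals take the next strong character instead.
    next_strong = {}
    for i in range(len(scripts) - 1, -1, -1):
        if scripts[i] != COMMON:
            next_strong[levels[i]] = scripts[i]
        elif resolved[i] == COMMON and levels[i] in next_strong:
            resolved[i] = next_strong[levels[i]]
    return resolved

def toy_script(ch):
    """Thread convention (assumption): uppercase = Arabic, lowercase = Latin."""
    if ch.isupper():
        return "Arab"
    if ch.islower():
        return "Latn"
    return COMMON

text = "english ARABIC more english."
scripts = [toy_script(ch) for ch in text]
# Toy levels: only the uppercase (Arabic) letters are at level 1.
levels = [1 if ch.isupper() else 0 for ch in text]

resolved = resolve_neutrals(scripts, levels)
space_after_arabic = text.index("ARABIC") + len("ARABIC")
```

With these inputs the space after "ARABIC" sits at level 0, so it skips the Arabic letters at level 1 and resolves to Latin. Whether this rule holds up for nested embeddings or right-to-left paragraphs is an open question.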

Regards,
                                            Owen

