Re: Font priorities
- From: Eric Mader <mader jtcsv com>
- To: Owen Taylor <otaylor redhat com>
- Cc: Keith Packard <keithp keithp com>, Stefan Baums <baums u washington edu>, GTK-I18N Mailing List <gtk-i18n-list gnome org>
- Subject: Re: Font priorities
- Date: Wed, 05 Feb 2003 11:21:45 -0800
At 10:57 AM 2/5/2003, Owen Taylor wrote:
> On Wed, 2003-02-05 at 13:50, Eric Mader wrote:
> > At 09:13 PM 2/4/2003, Owen Taylor wrote:
> > >One possible thing we'll be able to do in future versions of Pango is
> > >that I plan to do script detection, so I'll be able to tell that
> > >a run of text is in Devanagari ... if you then had a table mapping
> > >from language => script, you could tell that the 'en' language
> > >tag _couldn't_ be appropriate for the Hindi text, though you would
> > >have no idea what language tag _was_ appropriate.
> >
> > ICU has code which maps a code point to a script. There is also code to
> > identify runs of text all in the same script, taking neutral characters
> > into account. For example, Arabic words separated by spaces will be
> > returned as a single run.
> >
> > (The ICU code which maps code points to script uses the general Unicode
> > properties mechanism, which pulls in quite a bit of ICU; I have a version
> > of this code which uses a table built by an ICU application so you can
> > avoid the direct ICU dependencies...)
> Yeah - I already have a port of this code to use in Pango :-)
> See attachments to:
> http://bugzilla.gnome.org/show_bug.cgi?id=91542
> What remains to be done is hooking it up to shaper selection.
You rock! A few months ago I was using this code as part of a process of
splitting a paragraph of text into runs of text in the same script,
direction and font, and found that the script run code, as currently
written, interacts strangely with the bidi code. Here's a short summary of
what I found:
I was thinking about whether to compute the script runs over the whole
paragraph or for each directional (and font?) run. Whichever way I do it, I
can imagine a case where it will do the wrong thing. First assume that I
find the script runs within each directional run. Given the input text
"english (ARABIC) hindi.", the directional runs will be, of course,
"english (", "ARABIC" and ") hindi." I'll assign the whole first run to the
Latin script, the whole second run to the Arabic script, and the whole third
run to the Devanagari script. If the text were all left-to-right, I'd assign
the close paren to the Latin script 'cause the open paren got assigned to
Latin. It seems like a mistake to let a change in direction change which
script a character gets assigned to.
So, what happens if I compute the script runs up front, and then intersect
them with the directional runs? The case above would work out like I want.
But what about this simple case: "english ARABIC more english." The script
runs would be "english ", "ARABIC " and "more english." The directional
runs would be "english ", "ARABIC" and " more english." This means that the
space after the word "ARABIC" will be assigned to the Arabic script, even
though Bidi processing said it's a left-to-right space. This doesn't seem
right either...
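The second scenario can be sketched the same way, again with hypothetical helpers and the same toy classifier. `bidi_levels` is a drastic simplification of the Unicode Bidirectional Algorithm (UAX #9) for a left-to-right paragraph: only strong L/R characters and neutrals, no numbers or explicit embeddings, with neutrals resolved by rules N1 and N2. Script runs are computed over the whole paragraph and then cut at the boundaries between directional runs.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
OPENER_FOR = {close: opener for opener, close in PAIRS.items()}
DIRECTION = {"Latin": "L", "Devanagari": "L", "Arabic": "R"}  # Common -> neutral (None)


def script_of(ch):
    # Toy classifier: lowercase = Latin, UPPERCASE = Arabic, U+0900..U+097F = Devanagari.
    if "a" <= ch <= "z":
        return "Latin"
    if "A" <= ch <= "Z":
        return "Arabic"
    if "\u0900" <= ch <= "\u097f":
        return "Devanagari"
    return "Common"


def script_runs(text):
    """(start, end, script) runs; Common characters join the current run and a
    closing bracket takes the script of its matching opener."""
    runs, stack = [], []
    start, run_script = 0, "Common"
    for i, ch in enumerate(text):
        sc, matched = script_of(ch), False
        if ch in PAIRS:
            stack.append([ch, run_script])
        elif ch in OPENER_FOR:
            while stack and stack[-1][0] != OPENER_FOR[ch]:
                stack.pop()
            if stack:
                sc, matched = stack[-1][1], True
        if run_script == "Common" or sc in ("Common", run_script):
            if run_script == "Common" and sc != "Common":
                run_script = sc
                for entry in stack:
                    if entry[1] == "Common":
                        entry[1] = sc
        else:
            runs.append((start, i, run_script))
            start, run_script = i, sc
        if matched:
            stack.pop()
    if text:
        runs.append((start, len(text), run_script))
    return runs


def bidi_levels(text):
    """Simplified UAX #9 for an LTR paragraph. N1: a neutral sequence between
    strong characters of the same direction takes that direction. N2: otherwise
    it takes the embedding direction (L). Returns one embedding level per char."""
    dirs = [DIRECTION.get(script_of(ch)) for ch in text]
    resolved, i = list(dirs), 0
    while i < len(text):
        if dirs[i] is not None:
            i += 1
            continue
        j = i
        while j < len(text) and dirs[j] is None:
            j += 1
        before = dirs[i - 1] if i > 0 else "L"     # sos = paragraph direction
        after = dirs[j] if j < len(text) else "L"  # eos = paragraph direction
        resolved[i:j] = [before if before == after else "L"] * (j - i)
        i = j
    return [0 if d == "L" else 1 for d in resolved]


def level_runs(levels):
    """Maximal (start, end) runs of equal embedding level."""
    runs, start = [], 0
    for i in range(1, len(levels) + 1):
        if i == len(levels) or levels[i] != levels[start]:
            runs.append((start, i))
            start = i
    return runs


def intersect(text):
    """Whole-paragraph script runs cut at directional-run boundaries."""
    levels = bidi_levels(text)
    pieces = []
    for s0, s1, script in script_runs(text):
        for d0, d1 in level_runs(levels):
            lo, hi = max(s0, d0), min(s1, d1)
            if lo < hi:
                pieces.append((text[lo:hi], script, levels[lo]))
    return pieces


for piece in intersect("english ARABIC more english."):
    print(piece)
# ('english ', 'Latin', 0)
# ('ARABIC', 'Arabic', 1)
# (' ', 'Arabic', 0)
# ('more english.', 'Latin', 0)
```

The third piece is the space after "ARABIC": it belongs to an Arabic script run but sits at embedding level 0, i.e. Bidi resolved it as left-to-right. Applied to the first example, the same `intersect` call yields "english (", "ARABIC", ") " (Latin) and the Devanagari run, so the close paren stays Latin as desired in that case.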
What this seems to mean is that the simple way that the script run code
assigns scripts to neutral characters isn't good enough. It seems like it
needs to take more than just the raw script IDs into account... maybe it
needs to do Bidi analysis too? Maybe I need a function that does Bidi and
script runs at the same time?
> Regards,
> Owen