Pango and ICU
- From: Eric Mader <mader jtcsv com>
- To: gtk-i18n-list gnome org
- Subject: Pango and ICU
- Date: Wed, 19 Dec 2001 17:04:55 -0800
Hello,
I work at IBM on the ICU project (http://oss.software.ibm.com/icu). ICU is
an open-source library designed to assist in i18n tasks; it is compliant
with Unicode 3.1.1, supports hundreds of code pages, and has extensive
Unicode-based string support, locale-sensitive collation, and a library
which supports OpenType Layout. ICU is released under the X open-source
license, which is compatible with the GNU GPL.
I've been looking at the Pango TODO list for tasks which could be
accomplished using ICU. Here's what I've found:
* "Improve handling of boundary resolution" - ICU includes the Rule Based
Break Iterator, which finds text boundaries using a finite state machine
compiled from regular expression-like rules. For languages like Thai, ICU
uses a word dictionary to find word and line breaks.
* "Improve shaper and font determination algorithms" - ICU has an interface
based on Unicode TR #24 which can map a Unicode code point to a script.
I've built a little C++ class on top of this that finds runs of characters
in the same script, taking neutral characters into account. It also has
some support for bracketing characters. For example in the text "english
(GREEK) more english..." it remembers that the "(" was in Latin text, and
so says that the ")" is also Latin. It would be easy to port this to C. (I
also have a stand-alone implementation of the code which maps from
character codes to scripts...)
* "Consider moving to UCS-4 internally" - ICU's Unicode strings are UTF-16
based, with support for iterating through the string one character at a
time, and finding a 32-bit character boundary given an arbitrary code unit
offset. In general, this works well, with little overhead. For example, in
the OpenType code, I handle the surrogate pairs during character-to-glyph
mapping and treat the resulting glyph as if it were a ligature formed by
the two surrogate code points.
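
To make the UTF-16 point above concrete, here is a minimal Python sketch of
the two operations mentioned: stepping through UTF-16 text one character at
a time, and backing up from an arbitrary offset to the start of the
character that contains it. This is not ICU code; the function names
(to_utf16, char_start, iter_code_points) are made up for this illustration,
and unpaired surrogates are simply passed through as-is.

```python
from typing import Iterator, List, Tuple


def is_lead(unit: int) -> bool:
    """True for a lead (high) surrogate, U+D800..U+DBFF."""
    return 0xD800 <= unit <= 0xDBFF


def is_trail(unit: int) -> bool:
    """True for a trail (low) surrogate, U+DC00..U+DFFF."""
    return 0xDC00 <= unit <= 0xDFFF


def to_utf16(text: str) -> List[int]:
    """Encode a Python string as a list of 16-bit UTF-16 code units."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def char_start(units: List[int], offset: int) -> int:
    """Return the offset of the first code unit of the character at `offset`.

    If `offset` lands on the trail half of a surrogate pair, back up by one.
    """
    if offset > 0 and is_trail(units[offset]) and is_lead(units[offset - 1]):
        return offset - 1
    return offset


def iter_code_points(units: List[int]) -> Iterator[Tuple[int, int]]:
    """Yield (offset, code point) pairs, combining surrogate pairs."""
    i = 0
    while i < len(units):
        unit = units[i]
        if is_lead(unit) and i + 1 < len(units) and is_trail(units[i + 1]):
            # Combine the pair into a single supplementary code point.
            yield i, 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # BMP character (or an unpaired surrogate, passed through as-is).
            yield i, unit
            i += 1


if __name__ == "__main__":
    units = to_utf16("a\U00010400b")
    print([hex(u) for u in units])
    print(char_start(units, 2))
    print([(off, hex(cp)) for off, cp in iter_code_points(units)])
```

The only extra work compared with a 32-bit representation is the
surrogate check at each step, which is a pair of range comparisons.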
It seems to me that ICU is a good fit for Pango in particular, and maybe
for Gnome in general. How should I proceed?
Thanks,
Eric Mader
IBM GCoC - San José
5600 Cottle Rd. M/S 50-2/B11
San Jose, CA 95193