Pango and ICU



Hello,

I work at IBM on the ICU project (http://oss.software.ibm.com/icu). ICU is an open-source library designed to assist with i18n tasks: it is compliant with Unicode 3.1.1, supports hundreds of code pages, and provides extensive Unicode-based string support, locale-sensitive collation, and a library that supports OpenType Layout. ICU is released under the X open-source license, which is compatible with the GNU GPL.

I've been looking at the Pango TODO list for tasks which could be accomplished using ICU. Here's what I've found:

* "Improve handling of boundary resolution" - ICU includes the Rule Based Break Iterator, which finds text boundaries using a finite state machine compiled from regular expression-like rules. For languages like Thai, ICU uses a word dictionary to find word and line breaks.

* "Improve shaper and font determination algorithms" - ICU has an interface based on Unicode TR #24 that maps a Unicode code point to a script. I've built a little C++ class on top of this that finds runs of characters in the same script, taking neutral characters into account. It also has some support for bracketing characters. For example, in the text "english (GREEK) more english..." it remembers that the "(" was in Latin text, and so reports the ")" as Latin also. It would be easy to port this to C. (I also have a stand-alone implementation of the code that maps from character codes to scripts...)

* "Consider moving to UCS-4 internally" - ICU's Unicode strings are UTF-16 based, with support for iterating through the string one character at a time, and for finding a 32-bit character boundary given an arbitrary code unit offset. In general, this works well, with little overhead. For example, in the OpenType code, I handle the surrogate pairs during character-to-glyph mapping and treat the resulting glyph as if it were a ligature formed by the two surrogate code points.
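
To make the UTF-16 point in the last item a bit more concrete, here is a rough sketch in Python of the two operations mentioned there: stepping through a UTF-16 string one character at a time, and moving an arbitrary offset back to a character boundary. This is not ICU code; the function names (to_utf16, snap_to_boundary, code_points) are made up for illustration, and the string is modeled as a plain list of 16-bit values.

```python
def to_utf16(text):
    """Encode a Python string as a list of 16-bit UTF-16 code units."""
    data = text.encode("utf-16-be")
    return [(data[i] << 8) | data[i + 1] for i in range(0, len(data), 2)]


def is_lead(unit):
    """True for the first (high) half of a surrogate pair."""
    return 0xD800 <= unit <= 0xDBFF


def is_trail(unit):
    """True for the second (low) half of a surrogate pair."""
    return 0xDC00 <= unit <= 0xDFFF


def snap_to_boundary(units, offset):
    """Move offset back to the start of the character that contains it.

    Hypothetical helper: only an offset that lands on the trail half of a
    well-formed surrogate pair needs to move, and only by one unit.
    """
    if 0 < offset < len(units) and is_trail(units[offset]) and is_lead(units[offset - 1]):
        return offset - 1
    return offset


def code_points(units):
    """Yield (offset, code_point) for each character in the unit list."""
    i = 0
    while i < len(units):
        unit = units[i]
        if is_lead(unit) and i + 1 < len(units) and is_trail(units[i + 1]):
            # Combine the pair into one supplementary code point.
            yield i, 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # BMP character (or an unpaired surrogate, passed through as-is).
            yield i, unit
            i += 1


units = to_utf16("a\U00010400b")
for offset, cp in code_points(units):
    print(offset, hex(cp))
```

The only extra work compared with a fixed-width encoding is the one-unit look-ahead in code_points and the one-unit look-behind in snap_to_boundary, which is why the overhead stays small.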

It seems to me that ICU is a good fit for Pango in particular, and maybe for Gnome in general. How should I proceed?

Thanks,
Eric Mader
IBM GCoC - San José
5600 Cottle Rd. M/S 50-2/B11
San Jose, CA 95193



