Hyphenation status
- From: Damon Chaplin <damon kendo fsnet co uk>
- To: gtk-i18n-list gnome org
- Subject: Hyphenation status
- Date: 25 Nov 2002 01:07:54 +0000
Hi,
I've been working on code to do hyphenation, hopefully to add to Pango.
My new code is faster than libhnj and groff and uses less memory.
Here's a rough comparison, using the US hyphenation patterns, and on
an 850MHz P3:
             Speed (words/sec)    Memory use
---------------------------------------------
groff                  310000          140K
libhnj                 360000          200K
my code                630000           43K
The TeX code may be a wee bit more efficient, but it is complicated and
I'm not sure about the license. (We may also have problems with the
various licenses in the hyphenation patterns files at some point.)
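For anyone not familiar with how TeX-style pattern files are applied, here is a minimal sketch of the usual pattern-matching approach (Liang's algorithm). It is not the code benchmarked above, and the five patterns are a tiny set chosen only for illustration; the function names are made up. Each digit in a pattern scores the gap it sits in, the highest score for each gap wins, and odd scores allow a break.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into ('hyph', [0, 0, 3, 0, 0]).

    levels[k] is the score of the gap just before letter k.
    """
    letters = []
    levels = [0]
    for ch in pattern:
        if ch.isdigit():
            levels[-1] = int(ch)
        else:
            letters.append(ch)
            levels.append(0)
    return "".join(letters), levels


def hyphenate(word, patterns):
    """Return the indices in `word` before which a hyphen may be inserted."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."      # '.' marks the word boundaries
    scores = [0] * (len(padded) + 1)       # scores[i] = gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            levels = table.get(padded[start:end])
            if levels is not None:
                for k, level in enumerate(levels):
                    scores[start + k] = max(scores[start + k], level)
    # padded[j + 1] is word[j]; an odd score permits a break at that gap.
    return [j for j in range(1, len(word)) if scores[j + 1] % 2 == 1]


# Toy pattern set, for illustration only.
TOY_PATTERNS = ["hy3ph", "he2n", "hen5at", "1na", "n2at"]
breaks = hyphenate("hyphenation", TOY_PATTERNS)
```

With these toy patterns the breaks fall before indices 2 and 6, i.e. hy-phen-ation. A real implementation would also enforce minimum prefix and suffix lengths and would use a trie rather than trying every substring.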
My code is almost ready for Unicode as well. The main remaining issue is
normalization. I need to:
  a) Normalize the words and the hyphenation patterns so that
     matching works correctly (i.e. different forms still match), and
  b) Convert the resulting hyphenation pattern back to the positions
     of the original characters, so we insert hyphens in the right
     place.
g_utf8_normalize() is a problem because it is very slow and, since it
returns only the normalized string, it gives me no way to do (b).
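To make (a) and (b) concrete, here is a rough sketch of one possible approach, written in Python for brevity rather than in C. It assumes that normalizing one cluster at a time (a base character plus any following combining marks) is good enough for the scripts in question, which is not true of Unicode text in general. The function names are hypothetical.

```python
import unicodedata


def normalize_with_map(word):
    """Normalize `word` cluster by cluster.

    Returns (normalized, starts), where starts[i] is the index in `word`
    of the cluster that produced normalized[i].
    """
    normalized = []
    starts = []
    i = 0
    n = len(word)
    while i < n:
        # A cluster is one character plus the combining marks after it.
        j = i + 1
        while j < n and unicodedata.combining(word[j]) != 0:
            j += 1
        for ch in unicodedata.normalize("NFC", word[i:j]):
            normalized.append(ch)
            starts.append(i)
        i = j
    return "".join(normalized), starts


def map_breaks(breaks, starts):
    """Map break positions in the normalized word back to the original.

    A break at k means "hyphen before normalized[k]". Breaks that would
    fall inside a single original cluster are dropped.
    """
    return [starts[k] for k in breaks
            if 0 < k < len(starts) and starts[k] != starts[k - 1]]
```

For example, "cafe" followed by U+0301 and "s" is six code points long; it normalizes to the five-character "cafés", and a break before the final "s" (index 4 in the normalized word) maps back to index 5 in the original.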
So I'm thinking of writing an optimized normalization function just for
the code ranges that use hyphenation. (We can just ignore other
characters as they won't make any difference.)
I think hyphenation is used for Latin, Greek and Cyrillic characters.
Are there any others?
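Restricting the work to those ranges could look something like the sketch below. The ranges follow Unicode block boundaries for Latin, Greek, Cyrillic and the combining diacritical marks; whether this is the right set is exactly the open question, and the table and function names are only for illustration.

```python
# (first, last) code point pairs, inclusive. An assumed starting set.
HYPHENATION_RANGES = [
    (0x0041, 0x005A),  # ASCII uppercase letters
    (0x0061, 0x007A),  # ASCII lowercase letters
    (0x00C0, 0x024F),  # Latin-1 letters through Latin Extended-B
                       # (also covers the two signs U+00D7 and U+00F7)
    (0x0300, 0x036F),  # Combining Diacritical Marks
    (0x0370, 0x03FF),  # Greek and Coptic
    (0x0400, 0x04FF),  # Cyrillic
    (0x1E00, 0x1EFF),  # Latin Extended Additional
    (0x1F00, 0x1FFF),  # Greek Extended
]


def needs_normalization(ch):
    """True if `ch` lies in a range where hyphenation is assumed to apply."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HYPHENATION_RANGES)
```

Characters outside the table can be passed through untouched, so the fast path for most text is a handful of integer comparisons.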
Anyone else have better ideas to handle normalization?
Damon