Re: Hyphenation status
- From: Arthit Suriyawongkul <Arthit Suriyawongkul Sun COM>
- To: Damon Chaplin <damon kendo fsnet co uk>
- Cc: gtk-i18n-list gnome org
- Subject: Re: Hyphenation status
- Date: Mon, 25 Nov 2002 12:20:30 +0700
Hi Damon,
Damon Chaplin wrote:
>
> Hi,
>
> I've been working on code to do hyphenation, hopefully to add to Pango.
> My new code is faster than libhnj and groff and uses less memory.
>
> Here's a rough comparison, using the US hyphenation patterns, and on
> an 850MHz P3:
>
>              Speed (words/sec)    Memory use
> ---------------------------------------------------------
> groff             310000             140K
> libhnj            360000             200K
> my code           630000              43K
>
> The TeX code may be a wee bit more efficient, but it is complicated and
> I'm not sure about the license. (We may also have problems with the
> various licenses in the hyphenation patterns files at some point.)
>
> My code is almost ready for Unicode as well. The main remaining issue is
> normalization. I need to:
>
> a) Normalize the words and the hyphenation patterns so that
> matching works correctly (i.e. different forms still match), and
> b) Convert the resulting hyphenation pattern back to the positions
> of the original characters, so we insert hyphens in the right
> place.
>
> g_utf8_normalize() is a problem because it is very slow and I have no
> way to do (b).
>
> So I'm thinking of writing an optimized normalization function just for
> the code ranges that use hyphenation. (We can just ignore other
> characters as they won't make any difference.)
>
> I think hyphenation is used for Latin, Greek and Cyrillic characters.
> Are there any others?
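
To make steps (a) and (b) in the quoted message concrete, here is a rough sketch in Python (chosen for brevity; it is not the code being discussed and not part of Pango or GLib). The function names normalize_with_offsets and map_break_to_original are hypothetical. It assumes canonical decomposition (NFD) applied one character at a time, which keeps an offset back to the original character but skips the canonical reordering of combining marks that a full normalizer performs.

```python
import unicodedata


def normalize_with_offsets(word):
    """Decompose each character (NFD) and remember where it came from.

    Returns (normalized, offsets), where offsets[i] is the index in
    `word` of the character that produced normalized[i].
    Simplification: combining marks are not reordered across characters.
    """
    pieces = []
    offsets = []
    for index, ch in enumerate(word):
        for piece in unicodedata.normalize("NFD", ch):
            pieces.append(piece)
            offsets.append(index)
    return "".join(pieces), offsets


def map_break_to_original(word, offsets, normalized_break):
    """Map a break before normalized[normalized_break] back to `word`.

    Returns the index in `word` before which the hyphen would go, or
    None if the break is at an edge or would split a base character
    from its combining marks.
    """
    if normalized_break <= 0 or normalized_break >= len(offsets):
        return None
    original = offsets[normalized_break]
    # Break falls inside the decomposition of a single original character.
    if original == offsets[normalized_break - 1]:
        return None
    # Break falls just before a combining mark in the original word.
    if unicodedata.combining(word[original]) != 0:
        return None
    return original
```

With this, a precomposed word and its decomposed spelling normalize to the same string, so the same patterns match both, and a break found in the normalized string can be translated back to an index in whichever spelling was supplied. Whether a per-character table like this is fast enough for the Latin, Greek and Cyrillic ranges would need measuring.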
FYI, Thai and many other Asian languages also use hyphenation.
(not sure about CJKV)
regards,
Art