Re: Hyphenation status

From: Arthit Suriyawongkul <Arthit Suriyawongkul Sun COM>
To: Damon Chaplin <damon kendo fsnet co uk>
Cc: gtk-i18n-list gnome org
Subject: Re: Hyphenation status
Date: Mon, 25 Nov 2002 12:20:30 +0700

Hi Damon,

Damon Chaplin wrote:
> 
> Hi,
> 
> I've been working on code to do hyphenation, hopefully to add to Pango.
> My new code is faster than libhnj and groff and uses less memory.
> 
> Here's a rough comparison, using the US hyphenation patterns, and on
> an 850MHz P3:
> 
>               Speed in Words/Sec          Memory Use
>   ---------------------------------------------------------
>   groff             310000                    140K
>   libhnj            360000                    200K
>   my code           630000                     43K
> 
> The TeX code may be a wee bit more efficient, but it is complicated and
> I'm not sure about the license. (We may also have problems with the
> various licenses in the hyphenation patterns files at some point.)
> 
> My code is almost ready for Unicode as well. The main remaining issue is
> normalization. I need to:
> 
>  a) Normalize the words and the hyphenation patterns so that
>     matching works correctly (i.e. different forms still match), and
>  b) Convert the resulting hyphenation pattern back to the positions
>     of the original characters, so we insert hyphens in the right
>     place.
> 
> g_utf8_normalize() is a problem because it is very slow and I have no
> way to do (b).
> 
> So I'm thinking of writing an optimized normalization function just for
> the code ranges that use hyphenation. (We can just ignore other
> characters as they won't make any difference.)
> 
> I think hyphenation is used for Latin, Greek and Cyrillic characters.
> Are there any others?

FYI, Thai and many other Asian languages also use hyphenation.
(I'm not sure about CJKV.)
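
On point (b) in the quoted message, here is a rough sketch of one way
to keep track of original positions while normalizing. It is only an
illustration in Python, not the code described above and not a Pango
or GLib API. The function names are made up for this example. It
assumes that decomposing each character on its own (NFD) is good
enough for pattern matching, which ignores any reordering of combining
marks across character boundaries.

```python
import unicodedata


def normalize_with_offsets(word):
    """Decompose `word` one character at a time.

    Returns (normalized, offsets), where offsets[i] is the index in
    `word` of the original character that produced normalized[i].
    Hypothetical helper, for illustration only.
    """
    normalized = []
    offsets = []
    for index, char in enumerate(word):
        # Per-character NFD: a simplifying assumption, see the note above.
        for piece in unicodedata.normalize("NFD", char):
            normalized.append(piece)
            offsets.append(index)
    return "".join(normalized), offsets


def map_breaks_to_original(breaks, offsets):
    """Map break positions in the normalized text back to `word`.

    A break at position p means "a hyphen may go before normalized[p]".
    Breaks that would split one original character (for example between
    a base letter and its combining accent) are dropped.
    """
    result = []
    for p in breaks:
        if 0 < p < len(offsets) and offsets[p] != offsets[p - 1]:
            result.append(offsets[p])
    return result
```

For example, "caf\u00e9s" decomposes to six code points with offsets
[0, 1, 2, 3, 3, 4], so a break found at normalized position 5 maps
back to position 4 in the original word, and a break at position 4
(between "e" and the combining accent) is discarded.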


regards,
Art

References:
- Hyphenation status
  - From: Damon Chaplin
