Re: Hyphenation status



Hi Damon,

Damon Chaplin wrote:
> 
> Hi,
> 
> I've been working on code to do hyphenation, hopefully to add to Pango.
> My new code is faster than libhnj and groff and uses less memory.
> 
> Here's a rough comparison, using the US hyphenation patterns, and on
> an 850MHz P3:
> 
>               Speed in Words/Sec          Memory Use
>   ---------------------------------------------------------
>   groff             310000                    140K
>   libhnj            360000                    200K
>   my code           630000                     43K
> 
> The TeX code may be a wee bit more efficient, but it is complicated and
> I'm not sure about the license. (We may also have problems with the
> various licenses in the hyphenation patterns files at some point.)
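
For context on the "hyphenation patterns" mentioned above: they are TeX-style Liang patterns such as hy3ph. Each digit gives a priority to the gap between two letters, and a hyphen is allowed where the highest matching priority is odd. Below is a minimal sketch of the standard matching step, in the TeX/Liang style. It is not Damon's code. parse_pattern(), hyphenate() and the left_min/right_min defaults (TeX's usual 2 and 3) are illustrative. It also does naive substring lookups, whereas real implementations, Liang's included, use a trie.

```python
def parse_pattern(pat):
    """Split a TeX-style pattern such as 'hy3ph' into its letters and the
    priority at each inter-letter gap: ('hyph', [0, 0, 3, 0, 0])."""
    letters, values = [], [0]
    for ch in pat:
        if ch.isdigit():
            values[-1] = int(ch)  # digit applies to the gap before the next letter
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return word split at the points the patterns allow (Liang's method).
    Assumes word.lower() keeps the word's length (true for ASCII)."""
    table = dict(parse_pattern(p) for p in patterns)
    text = "." + word.lower() + "."   # '.' marks the word boundaries
    points = [0] * (len(text) + 1)    # points[p]: priority of the gap before text[p]
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            values = table.get(text[i:j])
            if values is not None:
                for k, v in enumerate(values):
                    points[i + k] = max(points[i + k], v)
    pieces, start = [], 0
    # The gap after word[:m] is the gap before text[m + 1]; odd means "may hyphenate".
    for m in range(left_min, len(word) - right_min + 1):
        if points[m + 1] % 2 == 1:
            pieces.append(word[start:m])
            start = m
    pieces.append(word[start:])
    return pieces


LIANG_EXAMPLE = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
print("-".join(hyphenate("hyphenation", LIANG_EXAMPLE)))  # hy-phen-ation
```

These nine patterns come from the classic "hyphenation" example. With them, the word splits as hy-phen-ation. The digit array computed for a word is the "resulting hyphenation pattern" that point (b) below has to map back onto the original characters.
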
> 
> My code is almost ready for Unicode as well. The main remaining issue is
> normalization. I need to:
> 
>  a) Normalize the words and the hyphenation patterns so that
>     matching works correctly (i.e. different forms still match), and
>  b) Convert the resulting hyphenation pattern back to the positions
>     of the original characters, so we insert hyphens in the right
>     place.
> 
> g_utf8_normalize() is a problem because it is very slow and I have no
> way to do (b).
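
Here is a minimal sketch of (a) and (b) together. It uses NFC as the target form and assumes that NFC only composes a base letter with the combining marks that follow it. That holds for the Latin, Greek and Cyrillic composition pairs, but not for Hangul, for example. The approach normalizes one cluster at a time (a base character plus its trailing combining marks) and records where each cluster ended in the original word. Hyphen positions found in the normalized word can then be mapped straight back. The pattern files would go through the same NFC step when loaded, which covers (a). normalize_with_map() and map_breaks() are hypothetical names, and Python's unicodedata stands in for whatever Pango/GLib would actually use.

```python
import unicodedata


def clusters(word):
    """Yield (start, end) spans: one starter plus any combining marks after it."""
    start = 0
    for i in range(1, len(word) + 1):
        if i == len(word) or unicodedata.combining(word[i]) == 0:
            yield start, i
            start = i


def normalize_with_map(word):
    """NFC-normalize word one cluster at a time.

    Returns (normalized, ends): ends[j] is the index in the original word
    just past the cluster that produced normalized[j].
    """
    pieces, ends = [], []
    for start, end in clusters(word):
        piece = unicodedata.normalize("NFC", word[start:end])
        pieces.append(piece)
        ends.extend([end] * len(piece))
    return "".join(pieces), ends


def map_breaks(breaks, ends):
    """Map break positions in the normalized word (a break at k falls between
    normalized[k-1] and normalized[k]) back to positions in the original word.
    A break inside a cluster moves to the cluster's end, so a hyphen never
    separates a base letter from its accents."""
    return sorted({ends[k - 1] for k in breaks})
```

For "cafe" + U+0301 + "s", the normalized word is the five-character "cafés". A break after normalized position 4 (after "é") maps back to original position 5, which is past the combining accent.
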
> 
> So I'm thinking of writing an optimized normalization function just for
> the code ranges that use hyphenation. (We can just ignore other
> characters as they won't make any difference.)
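
One cheap shortcut in that direction is sketched below. It rests on the observation that a word made only of characters from U+0000 to U+02FF (Latin-1, Latin Extended-A/B, IPA and spacing modifiers) and the basic Cyrillic block (U+0400 to U+04FF) is already in NFC. None of those code points changes under NFC, and none of them composes with a neighbour. The general normalizer is therefore only needed when something else turns up, such as a separate combining accent or Greek text. SAFE_RANGES and nfc_fast() are hypothetical names. Python's normalizer is already written in C, so this only shows the shape of the check. Any speedup would come in a C implementation, where the test costs a couple of comparisons per character.

```python
import unicodedata

# Hypothetical fast-path ranges. Every code point here is NFC-stable and
# never composes with its neighbours. The only combining marks in them
# (U+0483-U+0489) never need reordering among themselves.
SAFE_RANGES = ((0x0000, 0x02FF), (0x0400, 0x04FF))


def is_safe(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in SAFE_RANGES)


def nfc_fast(word):
    """Return the NFC form of word, skipping the general normalizer when
    every character lies in a range that NFC leaves alone."""
    if all(is_safe(ch) for ch in word):
        return word  # fast path: nothing to compose or replace
    return unicodedata.normalize("NFC", word)
```
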
> 
> I think hyphenation is used for Latin, Greek and Cyrillic characters.
> Are there any others?

FYI, Thai and many other Asian languages also use hyphenation.
(I'm not sure about CJKV.)


regards,
Art
