Re: Hyphenation status



On Mon, 2002-11-25 at 14:59, Owen Taylor wrote:

> If you have ideas about how to write a fast normalization function,
> they should be applied to g_utf8_normalize() 

I've rewritten g_utf8_normalize_wc() so you pass in a buffer and a size,
which avoids the need to do the decomposition step twice. Also, by using
this function directly I can avoid the conversion back to UTF-8 in
g_utf8_normalize(). (I need gunichar values anyway.)
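
To illustrate the "pass in a buffer and a size" idea, here is a minimal sketch in plain C. It is not the actual GLib code: toy_decompose_wc(), ucs4_t (a stand-in for gunichar) and the two-entry table are all hypothetical, and canonical reordering and composition are left out. The assumption is that the old code decomposed once to measure the result and again to fill it, whereas a caller-supplied buffer lets the common case finish in a single pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ucs4_t;              /* stand-in for gunichar (assumption) */

/* Toy decomposition table with two entries, for illustration only. */
typedef struct {
    ucs4_t ch;
    ucs4_t decomp[2];
} decomp_entry;

static const decomp_entry toy_table[] = {
    { 0x00E9, { 0x0065, 0x0301 } },   /* e-acute     -> e + combining acute     */
    { 0x00F6, { 0x006F, 0x0308 } },   /* o-diaeresis -> o + combining diaeresis */
};

/*
 * Decompose `len` code points from `in` into the caller's buffer `out`,
 * which can hold `out_size` code points. Never writes past `out_size`.
 * Returns the number of code points the full result needs; if that is
 * larger than `out_size`, the output was truncated and the caller can
 * retry with a bigger buffer.
 */
size_t toy_decompose_wc(const ucs4_t *in, size_t len,
                        ucs4_t *out, size_t out_size)
{
    size_t needed = 0;

    assert(out != NULL || out_size == 0);

    for (size_t i = 0; i < len; i++) {
        const decomp_entry *hit = NULL;

        /* Linear search is fine for a toy table; real code would use
         * a proper lookup structure. */
        for (size_t j = 0; j < sizeof toy_table / sizeof toy_table[0]; j++) {
            if (toy_table[j].ch == in[i]) {
                hit = &toy_table[j];
                break;
            }
        }

        if (hit != NULL) {
            for (size_t k = 0; k < 2; k++) {
                if (needed < out_size)
                    out[needed] = hit->decomp[k];
                needed++;
            }
        } else {
            /* No decomposition known: copy the code point through. */
            if (needed < out_size)
                out[needed] = in[i];
            needed++;
        }
    }
    return needed;
}
```

A caller such as a hyphenator would typically keep a fixed-size array on the stack that is big enough for almost every word, and fall back to a larger heap buffer only when the return value exceeds the capacity it passed in. The result stays as code points, so no conversion back to UTF-8 is needed.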

Together, these two changes double the speed of normalization, so I'm
reasonably happy with the performance. Hyphenation now runs at about
340,000 words a second with normalization, compared with 650,000 without
it. (But this is when normalizing ASCII, which is a no-op. It will be
slower for other languages.)

I've also added code to do the reverse mappings, so I think it handles
normalization now. (I still need to test this, though.)

Damon



