Re: Distinct performance issues with Japanese only on win32 systems

From: "David E. Hollingsworth" <deh curl com>
To: gtk-i18n-list gnome org
Subject: Re: Distinct performance issues with Japanese only on win32 systems
Date: 22 Jul 2009 10:37:23 -0700

Hello,

I have investigated this more.  It does not appear to be a
configuration issue, nor specific to the particular code I was using.
The problem appears to be the way that Pango is using Uniscribe.


If you run gedit for win32 and load a moderate-length (120kB) Japanese
document, it takes several seconds, whereas an equivalent English
document loads essentially instantaneously.  For comparison, the
Japanese document loads in under 1s using WordPad or gedit on Linux.

gedit for win32 can be found here:

  http://ftp.gnome.org/pub/gnome/binaries/win32/gedit/

The documents I used were:

  http://www.fastanimals.com/tech/pango/udhr-ja-x10.txt
  http://www.fastanimals.com/tech/pango/udhr-en-x10.txt

If I'm missing something here -- in particular, if someone has an
application that uses Pango on Windows to display CJK texts that's not
suffering from performance problems -- I'd like to hear about it!  As
far as I can tell, such problems are unavoidable with the current code.


Looking into the basic-win32 module, what's happening is that Pango
appears to use -- when text_is_simple() returns false -- Uniscribe's
full itemize-and-shape algorithm for each Pango item.  I presume it
does this because there's some case where Pango considers something a
single item but Uniscribe breaks it into multiple items, but I don't
know what case that might be.  Anyway, Uniscribe, like Pango, wants to
operate on complete paragraphs, so it wouldn't be surprising if
calling ScriptItemize for each Pango item was slow.

Japanese in particular gets hit hard by this technique because it
routinely mixes multiple scripts (kana, kanji, and ASCII characters),
so it ends up with a lot of Pango items.  text_is_simple() returns
false for kana and kanji characters, because Uniscribe's
ScriptIsComplex() returns S_OK for SIC_COMPLEX for such characters.
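
To give a feel for the fragmentation, here is a rough Python sketch.  It
is not Pango's itemizer: the code point ranges are a crude approximation
of script detection, and the names rough_script and count_script_runs
are hypothetical helpers made up for this illustration.  It only counts
how often the script changes within a piece of text.

```python
def rough_script(ch):
    """Bucket a character by code point range (crude approximation, an assumption)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "han"
    if cp < 0x80 and ch.isalnum():
        return "latin"
    return "common"  # punctuation, spaces, and everything else


def count_script_runs(text):
    """Count maximal same-script runs; 'common' characters never start a new run."""
    runs = 0
    current = None
    for ch in text:
        script = rough_script(ch)
        if script == "common":
            continue  # stays attached to whatever run is in progress
        if script != current:
            runs += 1
            current = script
    return runs


english = "All human beings are born free and equal in dignity and rights."
japanese = "すべての人間は、生まれながらにして自由であり"

print(count_script_runs(english))
print(count_script_runs(japanese))
```

Under these assumptions the English sentence is a single run, while the
much shorter Japanese fragment already splits into seven.  If each run
becomes a separate Pango item, each one is a separate trip through
Uniscribe.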

But I haven't been able to figure out the benefit of using Uniscribe
for most CJK texts.  I can think of some cases where Uniscribe might
provide some benefit (vertical substitution, ambiguous-width
characters, combining marks), but it seems like there are many
identifiable situations where Uniscribe isn't adding any benefit.
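
One way such situations might be identified is sketched below in Python.
This is not Pango or Uniscribe code, and needs_shaping_engine is a
hypothetical name; it only flags two of the cases listed above
(combining marks and ambiguous-width characters) using the standard
unicodedata module.  Vertical substitution depends on layout direction
and is not modeled here.

```python
import unicodedata


def needs_shaping_engine(run):
    """Return True if the run contains a character from the flagged cases.

    Hypothetical predicate for illustration only: a real fast path would
    also have to account for vertical text and font-specific features.
    """
    for ch in run:
        # Non-zero canonical combining class: a combining mark.
        if unicodedata.combining(ch) != 0:
            return True
        # East Asian Width 'A': ambiguous-width character.
        if unicodedata.east_asian_width(ch) == "A":
            return True
    return False


print(needs_shaping_engine("自由であり"))   # plain kanji and kana
print(needs_shaping_engine("か\u3099"))     # kana followed by a combining voiced sound mark
print(needs_shaping_engine("\u00b1"))       # plus-minus sign, ambiguous width
```

A run for which this returns False would be a candidate for skipping
the full itemize-and-shape path, assuming the list of cases above is
complete, which I have not verified.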

Anyway, bypassing Uniscribe in those situations wouldn't help other
languages that do require Uniscribe.  I haven't done performance
comparisons to see whether they are affected; perhaps texts in other
languages don't result in quite as much item fragmentation as Japanese
texts.

  --deh!

-- 
"I've just found the silverware and I'm sticking a fork in that square!" - N.H.
