Unicode normalization and text display differences



Hi everybody,

I am building a simple line generation tool for training neural
networks for OCR. Everything works fine except for one oddity: the
rendered output differs depending on the Unicode normalization form of
the input, in particular in how diacritics are placed.

In [0] (text normalized to NFC) diacritics are placed correctly, while
in [1] (text normalized to NFD) diacritics are placed next to the
preceding code point.

If I understand Unicode correctly, there should be no difference in
display, since NFC and NFD forms of the same text are canonically
equivalent. There is also a presentation about Pango from 2004 claiming
that there shouldn't be one.
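
To make the equivalence concrete, here is a minimal sketch using
Python's standard unicodedata module (Python is only my choice for
illustration, and the sample character is an assumption, not taken from
my actual test data). It shows that the two forms differ in their code
points but normalize to each other:

```python
import unicodedata

# Illustrative sample only: "e with acute" in both normalization forms.
nfc = "\u00e9"    # one precomposed code point
nfd = "e\u0301"   # base letter followed by COMBINING ACUTE ACCENT

# The code point sequences differ...
different_sequences = nfc != nfd
lengths = (len(nfc), len(nfd))

# ...but each form converts to the other, i.e. they are canonically equivalent.
nfd_from_nfc = unicodedata.normalize("NFD", nfc)
nfc_from_nfd = unicodedata.normalize("NFC", nfd)

print(different_sequences, lengths)
print(nfd_from_nfc == nfd, nfc_from_nfd == nfc)
```

So a renderer receives different input in the two cases, even though
the text is meant to look the same.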

Is this a known issue or expected behavior? Is there some preprocessing
necessary before using pango_layout_set_text()?
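
The kind of preprocessing I have in mind would look roughly like the
sketch below: compose the text to NFC and encode it as UTF-8 (which is
what pango_layout_set_text() expects) before handing it over. The
helper names prepare_for_layout and remaining_combining_marks are
hypothetical, and I am not claiming this is the recommended approach.

```python
import unicodedata

def prepare_for_layout(text):
    """Hypothetical helper: compose to NFC, then encode as UTF-8 bytes."""
    return unicodedata.normalize("NFC", text).encode("utf-8")

def remaining_combining_marks(text):
    """Return the combining marks that survive NFC composition."""
    composed = unicodedata.normalize("NFC", text)
    return [ch for ch in composed if unicodedata.combining(ch)]

# "e" + COMBINING ACUTE ACCENT has a precomposed form, so nothing remains.
composed_bytes = prepare_for_layout("e\u0301")
left_over_e = remaining_combining_marks("e\u0301")

# "q" + COMBINING TILDE has no precomposed form, so the mark stays separate.
left_over_q = remaining_combining_marks("q\u0303")
```

Note that this can only be a partial workaround: sequences without a
precomposed code point keep their combining marks even after NFC, so
the renderer still has to position them correctly.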

All Best,
Ben

[0] http://l.unchti.me/dump/nfc.png
[1] http://l.unchti.me/dump/nfd.png

