Unicode normalization and text display differences

From: Benjamin Kiessling <mittagessen l unchti me>
To: gtk-i18n-list gnome org
Subject: Unicode normalization and text display differences
Date: Wed, 6 Jan 2016 15:00:52 +0100

Hi everybody,

I am building a simple line generation tool for training neural
networks for OCR. Everything works fine except for one oddity: the
rendered output differs depending on the Unicode normalization form of
the input text, in particular the placement of diacritics.

In [0] (text normalized to NFC) diacritics are placed correctly, while
in [1] (text normalized to NFD) diacritics are placed next to the
preceding code point instead of being attached to it.

If I understand Unicode correctly, NFC and NFD text are canonically
equivalent, so there should be no difference in display. There is also
a presentation about Pango from 2004 claiming that there shouldn't be
one.
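
To make the comparison concrete, here is a minimal sketch of the two
forms using Python's unicodedata module. The sample string is made up
for illustration and is not taken from my actual training text.

```python
import unicodedata

# Hypothetical sample string; U+00E9 is the precomposed "e with acute".
sample = "caf\u00e9"

nfc = unicodedata.normalize("NFC", sample)
nfd = unicodedata.normalize("NFD", sample)

# NFC keeps the precomposed letter; NFD splits it into a base letter
# followed by a combining mark (U+0301 COMBINING ACUTE ACCENT).
print([f"U+{ord(c):04X}" for c in nfc])
print([f"U+{ord(c):04X}" for c in nfd])

# The strings differ code point by code point, but they are canonically
# equivalent: normalizing both to the same form gives identical strings.
print(nfc == nfd)
print(unicodedata.normalize("NFC", nfd) == nfc)
```

Only the NFD variant contains separate combining marks, and those are
the marks that end up misplaced in [1].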

Is this a known issue or expected behavior? Is some preprocessing
necessary before calling pango_layout_set_text()?

All Best,
Ben

[0] http://l.unchti.me/dump/nfc.png
[1] http://l.unchti.me/dump/nfd.png

Follow-Ups:
- Re: Unicode normalization and text display differences
  - From: Behdad Esfahbod
