Status of support for Romanian

The text below is also part of the archive with code and images available at:

Here is a summary of the tests (as of July 18, 2008, on Fedora 9):

== Quickie 1: Test for Unicode code points relevant for Romanian ==

Issues (and suggested fixes):

* All PS1 (non-OTF) fonts need an additional mapping from U+021A/B to U+0162/3.

  Background: Most PS1 fonts (not OpenType) that ship with Linux distros
  lack the code points U+021A/B (T/t with comma below), which are the
  proper code points for the Romanian T/t with comma below. Thankfully,
  Adobe once decided that t with cedilla is not used in any language, so
  the proper comma-below glyphs are usually present at U+0162/3, which
  before Unicode 3.0 was the unified code point for both t with cedilla
  and t with comma below. The Uniscribe renderer handles this automatically
  by remapping U+021A/B to U+0162/3 when the former glyphs are missing.
  Unfortunately, Pango/fontconfig doesn't do this, so most new Romanian
  documents can only be displayed with a very narrow selection of fonts.

  Proposed solution: adopt the Uniscribe approach; editing the PS1 fonts is
  pointless and would violate the license of the commercial ones.

* Liberation fonts lack "s with comma below" completely; the glyph needs to
  be added to the fonts. Hopefully, they'll be made into OT fonts at the
  same time (see the discussion below of why this would be good). "T with
  comma below" can be copied from U+0162/3.
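The Uniscribe-style fallback proposed in the first item above can be sketched in a few lines of Python. This only illustrates the idea and is not Uniscribe's or Pango's actual code: the function name `remap_for_font` and the `covered` set (standing in for the code points listed in a font's cmap) are hypothetical, and a real renderer would do this per font during glyph lookup rather than by rewriting the string.

```python
# Hypothetical sketch of a Uniscribe-style fallback: if a font lacks the
# comma-below T/t (U+021A/B) but has U+0162/3, use the latter code point.

# Preferred code point -> legacy slot that PS1 fonts usually draw with a comma
FALLBACKS = {
    0x021A: 0x0162,  # LATIN CAPITAL LETTER T WITH COMMA BELOW -> U+0162
    0x021B: 0x0163,  # LATIN SMALL LETTER T WITH COMMA BELOW -> U+0163
}


def remap_for_font(text: str, covered: set[int]) -> str:
    """Swap U+021A/B for U+0162/3, but only when the font lacks the former
    and has the latter. `covered` is the set of code points in the font."""
    out = []
    for ch in text:
        cp = ord(ch)
        alt = FALLBACKS.get(cp)
        if cp not in covered and alt is not None and alt in covered:
            out.append(chr(alt))
        else:
            out.append(ch)  # either supported, or no usable fallback
    return "".join(out)


# A toy "PS1" cmap: printable ASCII plus U+0162/3, but no U+021A/B.
ps1_cmap = set(range(0x20, 0x7F)) | {0x0162, 0x0163}
print(hex(ord(remap_for_font("\u021Bara", ps1_cmap)[0])))  # prints 0x163
```

Text is left untouched when the font already has U+021A/B, so fonts that do the right thing are never degraded.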

== Quickie 2: Test for localized OT forms (GSUB/latn/ROM/{locl,ccmp}) ==

The Adobe/Linotype/Vista industry standard seems to be that activating ROM/locl should map "s with cedilla" to "s with comma below". Since Adobe (Pro) OT fonts already draw U+0162/3 as "t with comma below" by default, activating this optional mapping for s makes old, pre-Unicode 3.0 Romanian texts render with a comma below both s and t. Thankfully, Pango already handles this!
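Note that ROM/locl only changes which glyph is drawn; the stored code points remain U+015E/F and U+0162/3. A different option, not one of the tests above, is to convert legacy text once to the Unicode 3.0 comma-below code points. The sketch below assumes the text is known to be Romanian (the name `normalize_romanian` is hypothetical); it must not be applied blindly, because Turkish, for instance, genuinely uses s with cedilla.

```python
# Hypothetical text-level counterpart of ROM/locl: rewrite pre-Unicode 3.0
# cedilla code points as the comma-below code points added in Unicode 3.0.
LEGACY_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # S with cedilla -> S with comma below
    "\u015F": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021A",  # T with cedilla -> T with comma below
    "\u0163": "\u021B",  # t with cedilla -> t with comma below
})


def normalize_romanian(text: str) -> str:
    """Map legacy cedilla letters to comma-below letters.

    Only safe for text known to be Romanian; other languages may use the
    cedilla forms on purpose.
    """
    return text.translate(LEGACY_TO_COMMA)


print(ascii(normalize_romanian("\u015Fi \u0163ar\u0103")))
```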

Funnily enough, MS Uniscribe from XP SP3 doesn't turn on ROM/locl for the Romanian locale (or at least I don't know how to convince it to).

The SIL OT fonts take a slightly different approach, but work with Pango nonetheless. First, they have proper cedilla variants for both s and t. Second, they don't have a ROM/locl feature but a ROM/ccmp feature, which remaps both cedilla variants to their comma-below counterparts. I'm not sure this approach is entirely correct, because the OpenType spec says that ccmp should not be language-sensitive. YMMV; I'm no expert on this.

Issues (and suggested fixes):

* DejaVu fonts (which are already OT) do not have a ROM/locl feature.
  From what I've seen, this is a two-line fix in FontLab.

* Pango does not allow an application to set the OT feature language
  independently of the UI language. For instance, there's currently no way
  to run gedit with an English UI while forcing Romanian localized
  rendering of U+015E/F, which is silly because those code points mean
  nothing in English. Hopefully somebody on the Pango side will grok it.
  See my comments on the Pango bug at
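For reference, a ROM/locl feature of the kind mentioned for DejaVu above can be written in the Adobe feature-file syntax that tools such as FontLab and fontTools' feaLib accept. The helper `romanian_locl_fea` below is hypothetical, and the glyph names (`Scedilla`, `scommaaccent`, etc.) are Adobe Glyph List names assumed for illustration; DejaVu's actual glyph names may differ, so check the font before using the output.

```python
# Hypothetical helper: emit a feature-file snippet that adds a Romanian-only
# 'locl' substitution from s/S with cedilla to s/S with comma below.
# Glyph names are assumed AGL names; a given font may use others (e.g. uni0219).

DEFAULT_PAIRS = (
    ("Scedilla", "Scommaaccent"),
    ("scedilla", "scommaaccent"),
)


def romanian_locl_fea(pairs=DEFAULT_PAIRS) -> str:
    lines = [
        "languagesystem DFLT dflt;",
        "languagesystem latn dflt;",
        "languagesystem latn ROM;",
        "",
        "feature locl {",
        "    script latn;",
        "    language ROM;",  # rules below apply only to latn/ROM
    ]
    for src, dst in pairs:
        lines.append(f"    sub {src} by {dst};")
    lines.append("} locl;")
    return "\n".join(lines) + "\n"


print(romanian_locl_fea())
```

With the default pairs the feature body is exactly two substitution rules, one per case of s.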

Finally, if this doesn't make any sense to you, there's some further background info on the Wikipedia page for the Romanian alphabet.

Much ado for some commas ;)

P.S. The Linux console fonts also lack U+219-B code points, but few care about the console fonts these days...
