Arabic Line Spacing Problem



(This message is slightly modified from a reply I made to John Boncek off list:)

On Thursday 2005.04.28 12:09:36 -0500, Boncek, John wrote:
> I am willing to try that, but how do I tell what font is actually getting
> selected?

That's a good question. It would be nice to be able to find out programmatically,
but I don't know off the top of my head how to do that. I guess you could look at
the fontconfig, Pango, and Xft APIs. I'm sure someone else on this list can answer
that question better.

I myself just look at the
rasterized output and, since I am already fairly familiar with the glyph shapes
for the fonts of the target languages I'm interested in, I just eyeball which
font is actually being used for a given target script.
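
One way to check without eyeballing is to ask fontconfig directly. The sketch
below assumes that the "fc-match" utility that ships with fontconfig is on the
PATH and that it accepts the "-f" format option (older releases may not); the
helper names build_pattern and best_match are my own. It reports fontconfig's
first choice for the pattern, which is not necessarily the font Pango ends up
using for every character in a run.

```python
import shutil
import subprocess


def build_pattern(family, lang):
    """Build a fontconfig pattern string such as 'sans-serif:lang=ar'."""
    return f"{family}:lang={lang}"


def best_match(family, lang):
    """Ask fc-match which family fontconfig ranks first for this pattern.

    Returns the family name, or None when fc-match is not installed.
    Assumption: the installed fc-match understands the -f option.
    """
    if shutil.which("fc-match") is None:
        return None
    result = subprocess.run(
        ["fc-match", "-f", "%{family}\n", build_pattern(family, lang)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    # Which family would be tried first for Arabic sans-serif text?
    print(best_match("sans-serif", "ar"))
```

Running the same query with a different language tag (for example "hi" or "ko")
shows whether the answer changes per script on a given machine.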

> And isn't a main feature of Pango to select a font without the
> developer tying specific fonts to languages?
>

It would be more accurate to say that Pango uses the fontconfig library to
select a font and, yes, one of the services that fontconfig was designed to
provide is font selection and matching.

However, take a look at /etc/fonts/fonts.conf, which is the main configuration
file for fontconfig. You might find something like this:

<!--
  Sans-serif faces
-->
<alias>
 <family>Bitstream Vera Sans</family>
 <family>Helvetica</family>
 <family>Arial</family>
 <family>Verdana</family>
 <family>Albany AMT</family>
 <family>Nimbus Sans L</family>
 <family>Luxi Sans</family>
 <family>FreeSans</family>
 <family>Kochi Gothic</family>
 <family>AR PL KaitiM GB</family>
 <family>AR PL KaitiM Big5</family>
 <family>Baekmuk Gulim</family>
 <family>Baekmuk Dotum</family>
 <family>SimSun</family>
 <default>
  <family>sans-serif</family>
 </default>
</alias>

(The above was taken from a SuSE 9.0 box)
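
If you want to go through the families named in such a block one by one, they
can be pulled out with a few lines of Python. This is only a sketch: it works on
a shortened copy of the alias block pasted in as a string, and the function name
alias_families is my own. Parsing the real fonts.conf would need the same logic
applied to every <alias> element in the file.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the alias block quoted above.
ALIAS_XML = """
<alias>
 <family>Bitstream Vera Sans</family>
 <family>Helvetica</family>
 <family>Arial</family>
 <default>
  <family>sans-serif</family>
 </default>
</alias>
"""


def alias_families(alias_xml):
    """Return (listed families, default families) for one <alias> block.

    Only direct <family> children count as listed families; the ones
    nested inside <default> are returned separately.
    """
    root = ET.fromstring(alias_xml)
    listed = [el.text.strip() for el in root.findall("family")]
    default = [el.text.strip() for el in root.findall("default/family")]
    return listed, default


if __name__ == "__main__":
    listed, default = alias_families(ALIAS_XML)
    for name in listed:
        print(name)
    print("generic name:", ", ".join(default))
```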

None of the listed fonts above was specifically designed for Arabic.

The first choice for sans-serif is Bitstream Vera, which is a very clean-looking
font, but it definitely does not cover the Arabic blocks of Unicode. Next we have
Microsoft Arial and Verdana: those Microsoft fonts may well have glyphs for Arabic,
but they aren't on my machine. All of the fonts at the end of the list (Kochi,
AR PL*, Baekmuk*, and SimSun) are East Asian CJK fonts, and they too won't have
Arabic. So I am left with Albany, Nimbus, Luxi, and FreeSans. Of these, my naive
guess would be that FreeSans, a pan-Unicode font, has Arabic glyphs, which might be
OK, but FreeSans is definitely not of the same quality as something like Microsoft
Arial Unicode.

I could (and often do) look at the actual font files using FontForge:

  http://fontforge.sourceforge.net

A quicker approach is to use "fc-list" to list all the installed fonts that fontconfig reports as covering Arabic, like this:

   fc-list :lang=ar

As the man pages for fc-list are totally useless, please refer to Mike Fabian's page
at:

  http://www.suse.de/~mfabian/suse-cjk/fonts-xft-fontconfig.html

... to discover other options
regarding fc-list.  (Of course, fc-list won't tell me whether the glyphs are well-designed for
a complex-layout script like Arabic or Indic scripts.  For example, although FreeSans *might*
be usable for Arabic, it is almost certainly not usable for some of the Indic scripts because of incomplete
ligature and OpenType layout features).
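
The fc-list query can also be scripted to check a list of candidate families
against what fontconfig reports for a language. The following is a rough sketch,
assuming "fc-list" is on the PATH and that asking for the "family" element prints
one font per line with alternative names separated by commas; the function names
are my own. As noted above, a font passing this check says nothing about how well
it shapes the script.

```python
import shutil
import subprocess


def parse_families(fc_list_output):
    """Collect family names from `fc-list <pattern> family` output.

    Each line describes one font; alternative family names on a line are
    assumed to be comma-separated (escaped commas are not handled here).
    """
    families = set()
    for line in fc_list_output.splitlines():
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return families


def families_with_lang(lang):
    """Return families fontconfig lists for a language tag such as 'ar'."""
    if shutil.which("fc-list") is None:
        return set()
    result = subprocess.run(
        ["fc-list", f":lang={lang}", "family"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_families(result.stdout)


def usable_candidates(candidates, covered):
    """Keep the candidates that appear in the covered set, in original order."""
    return [name for name in candidates if name in covered]


if __name__ == "__main__":
    remaining = ["Albany AMT", "Nimbus Sans L", "Luxi Sans", "FreeSans"]
    print(usable_candidates(remaining, families_with_lang("ar")))
```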

So, the best solution is to download some good fonts for the target script and modify either
/etc/fonts/fonts.conf or your home directory's ~/.fonts.conf so that fontconfig can preferentially
use those for the target script.  Look at slide number 20 from my recent presentation to see
an example of how /etc/fonts/fonts.conf might be modified for better performance in an
internationalized environment:

 http://eyegene.ophthy.med.umich.edu/iuc27/
 http://eyegene.ophthy.med.umich.edu/iuc27/html/img19.html  (this is the web-ized version, slide 20)
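
As a rough illustration of the kind of change I mean (not a copy of the slide),
the sketch below prints a small ~/.fonts.conf that uses fontconfig's
<alias>/<prefer> syntax to put chosen families ahead of the rest for a generic
name. "Example Arabic Sans" is a placeholder, not a real font: substitute
whatever family name fc-list reports for the font you installed. The function
name prefer_snippet is my own.

```python
from xml.sax.saxutils import escape


def prefer_snippet(generic, families):
    """Return the text of a minimal fonts.conf preferring the given families.

    The families are listed in order inside <prefer>, so a Latin-only font
    placed first keeps handling Latin text while a later font picks up the
    scripts the first one lacks.
    """
    lines = [
        '<?xml version="1.0"?>',
        '<!DOCTYPE fontconfig SYSTEM "fonts.dtd">',
        "<fontconfig>",
        " <alias>",
        f"  <family>{escape(generic)}</family>",
        "  <prefer>",
    ]
    lines += [f"   <family>{escape(name)}</family>" for name in families]
    lines += ["  </prefer>", " </alias>", "</fontconfig>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "Example Arabic Sans" is a hypothetical family name.
    print(prefer_snippet("sans-serif",
                         ["Bitstream Vera Sans", "Example Arabic Sans"]))
```

Review the output before saving it as ~/.fonts.conf, and check the result
afterwards by rendering some text in the target script.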

In the case of Arabic, Arabeyes:

 http://arabeyes.org

... maintains the Khotot open source font repository project
and is thus the obvious place to get reasonably good non-commercial Arabic fonts.

See my "Unicode Font Guide for Free/libre Open source operating systems" page for
a fairly comprehensive treatment of fonts for various world scripts:

 http://eyegene.ophthy.med.umich.edu/unicode/fontguide/

In my opinion, the "/etc/fonts/fonts.conf" file that ships with fontconfig, and various modifications
of it that ship with various Linux distributions, should be updated to reflect the recent availability of
more and better Open Source fonts for non-Latin scripts (as documented on
http://eyegene.ophthy.med.umich.edu/unicode/fontguide/).  

I have just checked the latest stable version of
fontconfig (2.3.2) and noticed the following:

   1) "fonts.conf.in"  still lists the "AR PL*" GB and BIG5 CJK fonts instead of taking advantage of
      Arne Götje's new Unicode-ized versions of the legacy Arphic fonts 
      (http://tavi.debian.org.tw/index.php?page=Unifonts).  

   2) The Baekmuk fonts are also still being shown instead
      of taking advantage of the newer "Un" fonts (http://kldp.net/projects/unfonts/).  

   3) "fonts.conf.in" has still not been internationalized for many other scripts.  There are no
      good defaults for Arabic, for example (Don't forget that Arabic is the national language
      of something like 18 countries, and of course Arabic script is used for Farsi and other languages
      in other countries too). 

To the best of my
recollection, both Taiwan Debian's Unicode-ized Arphic fonts and the "Un" Korean fonts were clearly announced
on these Linux internationalization mailing lists.  I'm surprised that fontconfig hasn't been updated to
reflect the availability of newer, better fonts.

 - Edward H. Trager
   Bioinformatics
   Kellogg Eye Center
   University of Michigan
   Ann Arbor, USA




