Re: Designing a Better Font Selection Widget for use in Open Source Software



Hi, Gora,

On Tuesday 2005.10.04 05:24:23 -0000, Gora Mohanty wrote:
> Please note that I am replying only to gtk-i18n-list which is the
> only one of the list of addressees that I am currently subscribed
> to. Please feel free to forward the message if you should choose
> to.
> 
> On Tue, 27 Sep 2005 Edward H.Trager wrote :
> [...]
> >... regarding a proposal for an improved font selection drop-down
> >widget that would be ideal for use in professional-quality Open
> >Source word processing, desktop publishing, and graphic design 
> >programs such as OpenOffice.org, Gimp, Inkscape, and similar
> >programs.
> 
> Sorry for the delay in the follow-up: I have been travelling, and
> replying to email has not been easy. I like your proposal very much.
> There are a couple more additions for Indic scripts that I would like
> to propose:
>
> 1. Indic scripts should have a sub-classification below the top-level
>    one as you have proposed for Chinese, for example. There are two
>    ways in which this could be done; the first being by the script,
>    and the second by the encoding. Script-based classification
>    might be needed as there are a wide variety of scripts, and people
>    can often read only one or two.

I never intended to give the impression that all Indic fonts would be
lumped into a top-level category called "Indic".  On the contrary, I assumed
that the top-level categories would simply be:

  * Devanagari
  * Bengali
  * Gujarati
  * Gurmukhi
  * Kannada
  * Limbu
  * Malayalam
  * Oriya
  * Sinhala
  * Tamil
  * Telugu
  ... etc. ...
 
A user would only see the script categories for the fonts he or she actually
has installed.  So, if you only had Devanagari and Bengali fonts, you would only
see those two categories.  If you later added a font for another script, the
font selection library would notice it as soon as fontconfig had noticed it.
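
As a rough illustration of the idea, here is a minimal Python sketch.  It
assumes the list of installed fonts and the scripts each one covers has
already been obtained somehow (for example from fontconfig); the function
name, the font names, and the data layout are all hypothetical.

```python
from collections import defaultdict


def build_script_menu(installed_fonts):
    """Group fonts by script; a script appears only if some font covers it."""
    menu = defaultdict(list)
    for font_name, scripts in installed_fonts.items():
        for script in scripts:
            menu[script].append(font_name)
    # Sort categories and font names so the menu order is stable.
    return {script: sorted(names) for script, names in sorted(menu.items())}


# Hypothetical installed fonts and the scripts each one covers.
installed = {
    "Font A": ["Bengali"],
    "Font B": ["Devanagari"],
    "Font C": ["Devanagari"],
}
menu = build_script_menu(installed)

# Installing a font for another script adds a category on the next rebuild.
installed["Font D"] = ["Tamil"]
menu_after_install = build_script_menu(installed)
```

Rebuilding the whole menu on every change is the simplest possible choice;
a real library might update it incrementally instead.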

>     Moreover, it is possible for a
>    language, e.g. Punjabi, to have more than one script.

Well, the font categories are script-based, not language-based.  That's why there
is Devanagari, but not Hindi, Marathi, Sanskrit, Sindhi, or any of the other
languages that have at some point in history been written in Devanagari.

>    Classification by encoding is required for similar reasons as for
>    the Chinese scripts: a wide number of fonts are in use where the

No, this is not correct.  The Chinese sub-classification is not based on
encoding; see below.

>    Indian language characters are placed in the ASCII or basic Latin
>    areas. Of course, this practice is wrong and should not be
>    encouraged in this day and age, but the fact remains that these
>    legacy fonts dominate over Unicode fonts in number and quality.

While it is true as you say that legacy fonts with Indic glyphs placed
in the ASCII code space outnumber Unicode fonts, I was actually under the
impression that these fonts would be unusable under Linux -- unless they
were updated to include Unicode CMAPs. And anyway, such "ASCII" fonts are
the *wrong* answer.
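
To make the problem concrete, here is a minimal Python sketch of how a
script classifier that looks only at the code points a font maps would see
such a font.  The function name, the threshold, and the two example
code point sets are assumptions made up for illustration; only the Unicode
block ranges are real.

```python
# Unicode block ranges (inclusive) for a few scripts.
SCRIPT_BLOCKS = {
    "Latin": (0x0020, 0x007E),
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Thai": (0x0E00, 0x0E7F),
}


def scripts_from_cmap(code_points, min_count=20):
    """Return the scripts whose block has at least min_count mapped code points."""
    found = []
    for script, (lo, hi) in SCRIPT_BLOCKS.items():
        covered = sum(1 for cp in code_points if lo <= cp <= hi)
        if covered >= min_count:
            found.append(script)
    return found


# Hypothetical legacy font: Devanagari glyphs stored at ASCII code points.
legacy_cmap = set(range(0x0020, 0x007F))
# Hypothetical Unicode font: glyphs mapped at Devanagari letter code points.
unicode_cmap = set(range(0x0905, 0x093A))

legacy_scripts = scripts_from_cmap(legacy_cmap)
unicode_scripts = scripts_from_cmap(unicode_cmap)
```

Under these assumptions the legacy font is reported as a Latin font, so it
would never show up under "Devanagari" without some manual override.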
 
Recently I did some graphic work in Thai (an Indic-derived script) and
wanted to use some legacy commercial fonts.  It took me only about five
minutes per font to convert them to Unicode fonts using the latest version of
FontForge (http://fontforge.sourceforge.net/).  Of course I cannot
redistribute my "improved" Unicode versions of these legacy fonts.  But if it
took me only 50 minutes to convert a set of 10 commercial legacy fonts, then
surely the font vendors can do the same thing.  Converting Devanagari and
other Indic script fonts would take a little longer if one really wanted to do
a nice job of creating the proper OpenType ligature substitution tables, but
it would still be faster than creating new fonts from scratch.  This is a
problem for the commercial font vendors to solve.  It is not the job of the
Open Source community to support dying legacy standards.
 
Also, please do not misunderstand the reason for sub-classifying the Chinese
fonts: the sub-classification is *not* strictly based on encoding, but on
whether the fonts contain simplified, traditional, or both forms of the
characters.  That is why the "Gu Yin" ("Ancient Seal Stone Characters") font
produced by Han Ding is properly placed into the "Traditional" sub-category.
This font contains only *traditional* character glyph forms but actually uses
*simplified* code points to access those traditional glyphs (i.e., the font
has a GB-xxxx encoding).  This is technically "wrong", but it is a
quick-and-dirty (and, by the way, imperfect) way for people in mainland China
to produce texts and advertisements in traditional characters, which are
easier for overseas Chinese to read.

If the font classification algorithms were only to look at encodings, the "Gu Yin" font
would be automatically *misclassified* into the "Simplified Chinese" category.  That is
why I think the code to disambiguate the situation with "GB" fonts from the mainland
would have to look something like the following:

   if ( the fontconfig library says the font supports only "simplified" Chinese){
      if ( the Chinese TTF font name contains the character "繁" and does *not* contain the character "简" ){
         /* -- WEIRD CASE FOR MAINLAND FONTS WITH GB ENCODING BUT ACTUALLY HAVING TRADITIONAL GLYPH FORMS -- */
         classify the font into the "Traditional" category
      }else{
         /* -- NORMAL CASE FOR MAINLAND CHINESE FONTS -- */
         classify the font into the "Simplified" category
      }
   }

Some might object that this is a bit heuristic.  It might be.  I would like
to see how well it works.  Remember, the user will be allowed to customize the
categorization of his or her fonts.  The code only needs to provide a
reasonable default arrangement.  If font vendors are intentionally "breaking
the rules" in order to get around the problem that mainland Chinese computers
only understand the "GB" encoding, one can convincingly argue that it is not
the job of the Open Source community to solve this problem.
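
For anyone who wants to experiment with it, the rule above might be sketched
in Python roughly as follows.  The function names, the boolean argument
standing in for what fontconfig reports, the user-override dictionary, and
the example font names are all hypothetical; this is only a sketch of the
default heuristic, not a finished classifier.

```python
def classify_gb_font(font_name, simplified_only):
    """Default category for a font that fontconfig reports as simplified-only.

    Returns None when the font is not simplified-only, because the rule
    above says nothing about that case.
    """
    if not simplified_only:
        return None
    if "繁" in font_name and "简" not in font_name:
        # Weird case: GB encoding, but the glyphs are traditional forms.
        return "Traditional"
    # Normal case for mainland Chinese fonts.
    return "Simplified"


def categorize(font_name, simplified_only, user_overrides=None):
    """A user's own choice always wins over the heuristic default."""
    overrides = user_overrides or {}
    if font_name in overrides:
        return overrides[font_name]
    return classify_gb_font(font_name, simplified_only)


# Made-up font names, for illustration only.
traditional_guess = categorize("示例繁体", True)
simplified_guess = categorize("示例简体", True)
overridden = categorize("示例繁体", True, {"示例繁体": "Simplified"})
```

A name containing both "繁" and "简" falls through to "Simplified" here,
which matches the rule as written but may well deserve a "both" category.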

> 2. If the encoding is wrong as discussed above, the name of the font
>    becomes meaningless in both English and the Unicode codepoints in
>    the native language. This could probably be addressed by the font
>    alias scheme discussed in your XML schema for the font selection
>    menu.
> 
> Regards,
> Gora


