[Gnome-print] Re: [ynakai redhat com: Re: Gnumeric bug 15607]

From: Lauris Kaplinski <lauris helixcode com>
To: ynakai redhat com
Cc: gnome-print helixcode com, Miguel de Icaza <miguel helixcode com>, Jody Goldberg <jgoldberg home com>
Subject: [Gnome-print] Re: [ynakai@redhat.com: Re: Gnumeric bug 15607]
Date: 22 Nov 2000 21:35:49 -0200

Hello!

> > I am not at all clear what this is attempting to do.  This code
> > appears better suited to gnome-print.  The font -> fontset changes
> > are fine.  Supply a change log and I'll apply those.
> 
> That's not enough for gnome-print. Also, there is exactly no effect!!
> 
> PostScript has its original multibyte-aware system, and it only works
> with our traditional encoding such as EUC-JP, JIS, SJIS. It never works
> with UTF8! There is no info, docs, definition  about how to print
> Japanese UTF8 code currently. And in Japan I can't find any printers
> that can print UTF8 postscript.
> 
> Now we start to localize the next Red Hat Linux (7.1 called internally)
> but the first job is to remove whole UTF8 codes from gnumeric, gnome-print
> and other completely. Those become too optimized for UTF8, so we will
> delete all new features to Japanize them. 

No! No! No!

You are horribly mixing 2 totally different concepts here:
1) Text (characters)
2) Representation of text (glyphs)

The confusion comes from simple languages (European alphabetic ones, CJK etc.),
where you can typeset text of acceptable quality using a trivial mapping:

character <-> glyph

So every one of those languages has developed some encoding scheme to allow
such modest-quality typesetting.

The problem is that this system does not scale AT ALL, either to more
complex languages or to high-quality typesetting.
Gnome-print is designed with high-end typography in mind. So we are absolutely
not interested in keeping compatibility with existing trivial character ->
glyph mappings, because those just make using high-end typography non-trivial,
so people end up writing lazy programs that scale neither to different languages
nor to good-quality output. Instead, glyph mapping is strictly font-specific
and has to be extracted from the font.
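
To make the character/glyph distinction concrete, here is a minimal sketch. The two font tables and all glyph indices are invented for illustration and do not come from any real font or from gnome-print. The same two characters "fi" become one glyph in a font that has a ligature and two glyphs in a font that does not, so the mapping cannot be fixed outside the font.

```python
# Hypothetical glyph tables for two imaginary fonts. The keys are character
# sequences, the values are font-internal glyph indices (all invented).
FONT_A = {"f": 71, "i": 74, "fi": 192}   # this font has an "fi" ligature glyph
FONT_B = {"f": 12, "i": 15}              # this font has no ligature


def shape(text, font):
    """Greedy longest-match mapping of characters to font-specific glyphs."""
    glyphs = []
    pos = 0
    longest = max(len(key) for key in font)
    while pos < len(text):
        # Try the longest character sequence first, then shorter ones.
        for size in range(min(longest, len(text) - pos), 0, -1):
            chunk = text[pos:pos + size]
            if chunk in font:
                glyphs.append(font[chunk])
                pos += size
                break
        else:
            raise KeyError(f"font has no glyph for {text[pos]!r}")
    return glyphs


print(shape("fi", FONT_A))  # [192]     two characters, one glyph
print(shape("fi", FONT_B))  # [12, 15]  two characters, two glyphs
```

Real shaping (what Pango is meant to do) is far more involved; the point is only that the character count and the glyph count need not agree.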

OK. There is a problem: many of the features are only half implemented. There are
2 planned methods for outputting text:

1) Lazy one. Input is UTF-8 text, which is fed through Pango; Pango
generates the best possible glyph representation, which is then fed to the
printer using the font's internal encoding.
2) Advanced one. The program itself is responsible for analyzing the glyphs
available in the font and generating a positioned glyph array (glyphlist),
which is fed to gnome-print, which simply adapts it to the current output
resolution.
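
As a rough illustration of the second method, a positioned glyph array could look like the sketch below. This is not the gnome-print glyphlist interface (which is not public yet); the names `PositionedGlyph`, `build_glyphlist` and `to_device`, the glyph indices and the advance widths are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionedGlyph:
    glyph: int   # font-internal glyph index
    x: float     # position in points (1/72 inch)
    y: float


def build_glyphlist(glyph_ids, advances, origin=(0.0, 0.0)):
    """Place glyphs left to right using per-glyph advance widths in points."""
    x, y = origin
    out = []
    for gid in glyph_ids:
        out.append(PositionedGlyph(gid, x, y))
        x += advances[gid]
    return out


def to_device(glyphlist, dpi):
    """Adapt point coordinates to whole device pixels at a given resolution."""
    scale = dpi / 72.0
    return [(g.glyph, round(g.x * scale), round(g.y * scale)) for g in glyphlist]


advances = {192: 5.5, 71: 3.25}               # hypothetical advance widths
glyphs = build_glyphlist([192, 71, 71], advances)
print(to_device(glyphs, 300))  # [(192, 0, 0), (71, 23, 0), (71, 36, 0)]
```

The application decides which glyphs go where; the printing layer only has to scale the positions to the output device.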

Glyphlists are horribly immature. You can generate glyphlists, and they will
be printed, but the interface needs much more polish. There is also no public
interface to the font's internal encoding.

The UTF-8 path can do trivial Latin/Cyrillic/Greek mapping. CJK mappings are
missing, because I have had neither CJK Type 1 fonts nor much knowledge about
those languages.

So, I accept that it does not work acceptably for CJK. But it is not broken;
specific parts of the whole system simply have to be implemented.
So if you want to make the system work with CJK, I would suggest:

1) Find out how to extract glyph Unicode code points from CJK Type 1 fonts.
2) Write the corresponding parts of the trivial Unicode <-> glyph mapping for
those languages.
3) In programs using gnome-print, just transcode strings to UTF-8 before
printing.

If you need an interface to distinguish between Chinese / Japanese fonts, we
can implement this in gnome-print.
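
The "transcode to UTF-8 before printing" part is the cheap one. A minimal sketch in Python, using only the standard codecs; the helper name `to_utf8` is made up for the example, and a C program would do the same thing with iconv:

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Transcode bytes in a legacy encoding to UTF-8.

    Raises UnicodeDecodeError on input that is not valid in source_encoding.
    """
    return raw.decode(source_encoding).encode("utf-8")


# What an application working in EUC-JP would hold in memory.
legacy = "日本語".encode("euc_jp")
utf8 = to_utf8(legacy, "euc_jp")

print(len(legacy), len(utf8))  # 6 9
```

The text is the same before and after; only the byte representation changes, so nothing is lost for the application.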

It is little work and you get an EXTENSIBLE system. Sticking with existing
encodings does not add anything, but creates bigger confusion in multilanguage
environments.

The other way would be to go with glyphlists and write a specific font
interface that guarantees the font's internal mapping to be identical to some
existing de facto one. Then you can compose a glyphlist just using the code
values of that encoding.
That needs making glyphlists public now, i.e. we have to think thoroughly
about which features have to be present there, and how.
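
To show what "just using code values of that encoding" would mean, here is a sketch that splits an EUC-JP byte string into per-character code values. It assumes a font whose internal mapping is guaranteed to be EUC-JP, which is exactly the interface that does not exist yet; the function name `euc_jp_codes` is hypothetical.

```python
def euc_jp_codes(raw: bytes):
    """Split an EUC-JP byte string into one integer code value per character."""
    codes = []
    i = 0
    while i < len(raw):
        lead = raw[i]
        if lead < 0x80:
            size = 1    # ASCII, one byte
        elif lead == 0x8F:
            size = 3    # JIS X 0212, prefix byte plus two bytes
        else:
            size = 2    # JIS X 0208, or 0x8E prefix plus half-width katakana
        if i + size > len(raw):
            raise ValueError("truncated EUC-JP sequence")
        codes.append(int.from_bytes(raw[i:i + size], "big"))
        i += size
    return codes


codes = euc_jp_codes("A日本".encode("euc_jp"))
print([hex(c) for c in codes])  # ['0x41', '0xc6fc', '0xcbdc']
```

With such a font, each code value could be put into the glyphlist directly as the glyph index; positioning would still be the program's job.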

Just believe me - mixing characters and glyphs is shooting yourself in the foot.

About PS:
Of course, PS cannot do anything with UTF-8, because UTF-8 is a format designed
for transferring text. A character can be encoded in anywhere from 1 to 4 (6)
bytes, etc.
To print, you have to translate (not simply transcode) it to uniform-width,
positioned glyph indices. It can be as easy as simply doing UTF-8->UCS2 +
UCS2->font_native transcoding, but can be much harder.
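
The easy case can be sketched in a few lines. The `FONT_NATIVE` table and its glyph indices are invented for the example; in reality that table has to be extracted from the font. Positioning is left out entirely.

```python
def utf8_to_ucs2(data: bytes):
    """Decode variable-width UTF-8 into fixed-width 16-bit code values."""
    units = []
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError(f"U+{cp:04X} does not fit in UCS-2")
        units.append(cp)
    return units


# Hypothetical font-native table: Unicode code point -> glyph index.
FONT_NATIVE = {0x41: 34, 0x65E5: 2801, 0x672C: 3310}


def ucs2_to_font_native(units, table):
    """Map each code value to the font's own glyph index (KeyError if missing)."""
    return [table[u] for u in units]


data = "A日本".encode("utf-8")
units = utf8_to_ucs2(data)
print(len(data), len(units))                      # 7 3
print(ucs2_to_font_native(units, FONT_NATIVE))    # [34, 2801, 3310]
```

Seven bytes of UTF-8 become three uniform-width values, and only then three glyph indices. Anything needing ligatures, reordering or contextual forms is the "much harder" case and cannot be done with a per-character table.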

Lauris


> 
> I can't explain how big and deep the problem is with my poor English, but I'm
> sure we will hate GNOME and all authors when this UTF8 nightmare still continues...
> 
> I append the latest patch I sent to our Japanese GNOME mailing list about this
> problem. 
> 
> http://www.gnome.gr.jp/~nakai/html/postscript.html
> http://www.gnome.gr.jp/~nakai/html/postscript2.html
> 
> are the easiest samples to show Japanese characters in PostScript.
> \377\001 and \377\000 are the escape sequence to change between
> Japanese(strings for Ryumin-Light-EUC-H in this example) and English
> (strings for Times-Roman in this example) for when FMapType is 3 in
> the composite font.
> 
> I hope those patches to help all of you to understand what is really
> happening in your codes...
> 
> ---
> Yukihiro Nakai, Red Hat Japan, Development.

Follow-Ups:
- .jp Localization of apps/libraries. (was "Re: [Gnome-print] Re: [ynakai redhat com: Re: Gnumeric bug 15607]")
  - From: Chema Celorio
