Re: Comprehensive East-Asian support



Owen Taylor wrote:

> Steve Underwood <steveu@coppice.org> writes:
>
> > Hi all,
> >
> >[...] Last summer I exchanged several private e-mails with Owen Taylor about
> > gscript. There was nothing private about the discussion - I just failed
> > to join this list, despite sending several subscribe requests. At that
> > time Owen's goals for gscript were somewhat narrow - basically just to
> > display text within the GTK display environment. I said that I thought
> > it important for gscript to provide support for vertical and
> > right-to-left text. Owen didn't see that as important for dialogs, and
> > similar GTK related things, which may well have been the correct
> > attitude in the context of July 1999's gscript. January 2000's Pango is
> > intended as a system for the comprehensive handling of international
> > high quality text output, so it's time for me to restate my point of
> > view.
>
> Actually, at that point, I was quite willing to consider eventually
> dealing with vertical layout, but didn't consider it a top priority.
>
> That is still the case. For the work I'm doing, GTK+ is the first
> priority to get working for Pango-1.0 ...  simply because that's the
> project most important to me. If other people want to help out in
> getting Pango ready for DTP type applications sooner, well that
> will be appreciated.

Last summer I had little time to spare. Now I hope I might be able to contribute.

> But it should be noted that Pango is _not_ a desktop-publishing system;
> it is a system for handling the hard part of i18n layout. And that means,
> it primarily deals with 1-D layout. Whether those lines are horizontally
> or vertically oriented has some effect on Pango - it may need to
> make different decisions based on that. But the high-level arrangement
> of the lines is a matter for drivers that sit on top of Pango.

Yes, orientation has a number of subtle effects on layout, and Pango would need
to deal with those to achieve good quality results. Beyond that, Pango could
behave as if vertical text is actually horizontal, and leave some post-processing
to deal with the rotation. However, what happens if I want to label a graph line
at 45 degrees in a GTK+ app? What do you expect the text rendering to do? Do you
expect to render for horizontal text, and rotate the result? The quality would be
much better if the rotation were allowed for during the rendering process.

> > A system which can only handle the text direction features defined in
> > the Unicode support tables, bi-directional algorithm, etc. is too
> > limited for East Asian languages. Good Chinese and Japanese (I can't
> > speak for Korean, as I don't see that much of it) support demands
> > support for top-to-bottom, right-to-left text. Chinese also demands
> > options for left-to-right and right-to-left (I'm not sure if Japanese is
> > ever written right-to-left, now or in the past). The Unicode tables
> > assume all East Asian languages are only ever written left-to-right.
>
> This is just inaccurate and I'm not sure that you actually understand
> the purposes of the directional properties within Unicode. They are
> meant to enable the automatic handling of mixed right-to-left and
> left-to-right languages. There would be no point in putting properties
> on a character saying:
>
>  "this character can be written left-to-right, right-to-left or top-to-bottom"
>
> What advantages could such a marking give?

What I said is not inaccurate at all. What I expect of a comprehensive table of
character characteristics is a default behaviour (say left-to-right for CJK), and
one or more possible alternatives. This will be needed, whether it comes from the
Unicode group, or is added by others. With such tables you can lay out text in
"predominantly left-to-right" mode, or "predominantly right-to-left" mode, or
"predominantly top-to-bottom" mode. The tables will then guide the behaviour of
the rendering system sensibly. The existing Unicode tables are merely sufficient
to ensure that readable output is normally produced. Asian languages have
standard practices for inserting short bursts of other scripts. It's very common
to see Arabic numerals, or Latin script, inserted into Chinese. If I look around I
can probably find some examples of Cyrillic, Greek, and so on inserted into
vertical Chinese. A flexible system should be able to produce output that looks
sensible (and hopefully follows common practice) for any text produced with any
predominant orientation.
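
As a rough illustration of the kind of table described above, here is a minimal
Python sketch. Everything in it is hypothetical: the names Direction,
ScriptDirections, DIRECTION_TABLE and resolve_direction, and the table entries
themselves, are assumptions made for the example and are not Unicode data or
part of Pango. Each script gets a default direction plus a set of permitted
alternatives, and a run follows the predominant direction of the block only if
its script allows that.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    LTR = "left-to-right"
    RTL = "right-to-left"
    TTB = "top-to-bottom"


@dataclass(frozen=True)
class ScriptDirections:
    default: Direction        # direction used when nothing else is requested
    alternatives: frozenset   # other directions the script may be set in


# Hypothetical, illustrative entries only; not taken from any standard.
DIRECTION_TABLE = {
    "Han": ScriptDirections(Direction.LTR,
                            frozenset({Direction.RTL, Direction.TTB})),
    "Latin": ScriptDirections(Direction.LTR, frozenset()),
    "Arabic": ScriptDirections(Direction.RTL, frozenset()),
}


def resolve_direction(script, predominant):
    """Return the direction a run of `script` takes inside a block whose
    predominant direction is `predominant`."""
    entry = DIRECTION_TABLE[script]
    if predominant is entry.default or predominant in entry.alternatives:
        return predominant
    # The script cannot follow the block, so it keeps its own default.
    return entry.default
```

With these assumed entries, a Han run in a top-to-bottom block is set
top-to-bottom, while a Latin run in the same block keeps its left-to-right
default. How such a run is then placed (rotated, or set upright) is a separate
decision this sketch does not address.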

> > As well as the need to properly present blocks of text, the behaviour of
> > international systems when labelling the y-axis of a graph, and similar
> > rotated text situations is important. Rendering according to the Unicode
> > rules, and rotating the outcome gives the wrong result for East Asian
> > languages. Kanji/Hanzi are _never_ tilted beyond about 45 degrees when
> > labelling anything. Turning them on their side, along with some numbers
> > or English words they may be mixed with would be totally wrong.
> >
> > Even for left-to-right Chinese and Japanese, the Unicode material, and
> > the current gscript/pango, fail to implement things Asian people would
> > consider important in high quality output. For example, if a short burst
> > of Hanzi is dropped into a page of English it would be rendered just as
> > gscript does now. If a short burst of English (say a company name) were
> > dropped into a page of Chinese, space would be placed around the English
> > so the Chinese characters all sit on a monospaced grid. Pango doesn't
> > seem to provide for that, and it leads to very odd looking results.
>
> It should be noted that frequently in Japanese usage this is not done,
> though I believe Chinese usage may be different. If there are portions
> within a line that are in latin script, this simply disrupts the character
> grid. Again, there is wide variety of practice here that cannot be
> solved in an automated way.

It's not always done in Chinese, either, and the result can look messy. There are
some constraints caused by punctuation, even for pure Chinese text. For example,
some people don't like to start a column or line with a comma or full stop, so
they fudge the character spacing, and work the punctuation into the previous
column or row. People will almost always do this for a column or row which would
otherwise begin with an ending quotation mark. The result is messy, though. In
the example scans which Mitsuru Oka put up yesterday for us, the text does seem
to be strictly monospaced. From what you are saying, this is not always the case,
so I guess Japanese and Chinese typesetting practices are pretty similar. In fact
I presume the typesetting equipment used for both languages comes from the same
sources, and to some extent modern typesetting style has been dictated by these
suppliers.
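
The punctuation constraint described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the function
break_lines and the NO_LINE_START set are hypothetical, every character is
treated as one grid cell, and the "fudging" of spacing is modelled simply by
letting the line run one or more cells long. Real typesetting rules are
considerably more involved.

```python
# Characters that, per the practice described above, should not begin a
# line or column: comma, full stop, and closing quotation marks.
# The exact set is an assumption made for this example.
NO_LINE_START = set("，。、」』”’")


def break_lines(text, cells_per_line):
    """Split `text` into lines of `cells_per_line` characters, pulling any
    punctuation that would start the next line back onto the current one."""
    if cells_per_line < 1:
        raise ValueError("cells_per_line must be at least 1")
    lines = []
    start = 0
    while start < len(text):
        end = min(start + cells_per_line, len(text))
        # Extend the line while the next character may not start a line.
        while end < len(text) and text[end] in NO_LINE_START:
            end += 1
        lines.append(text[start:end])
        start = end
    return lines
```

For example, break_lines("你好世界，再见", 4) keeps the comma on the first line
instead of letting it open the second, which is why that line ends up one cell
longer than the grid allows.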

> > If facilities for rotated text for labelling purposes, and alternative
> > directions for text blocks are not put into the Pango API I think they
> > are going to get hacked in later by some folk in Asia - possibly with
> > me amongst them. It seems better to include the proper provision for
> > comprehensive text handling from day one. The code that needs to lie
> > behind the API could be missing from version 1 of Pango, but I strongly
> > feel the API needs to make provision for these things, or an
> > incompatible update will occur later.
> >
> > If folk are interested in providing this capability, I will be happy to
> > document (which basically means gather illustrative examples) the
> > detailed behaviour East Asian people have come to expect of mixed
> > language handling in vertical and rotated text. I might also have some
> > time to help in the implementation.
> >
> > Last summer Owen seemed convinced that people don't use vertical or
> > right-to-left text any more. That puzzled me, since I know he can read
> > Japanese. If people aren't convinced, I can post a few small scanned
> > images of Hong Kong and Japanese newspapers and magazines to prove that
> > vertical is still the order of the day in Asia. I can pick almost any
> > newspaper and any page, and show vertically written text.
>
> Just to confirm that I wasn't forgetting something, I looked back
> through the emails we exchanged on the subject. And I certainly never
> indicated that I believed that vertical writing was uncommon. I'm
> quite familiar with the usage of vertical writing and the various
> ways that vertical and horizontal writing are mixed.
>
> (If you paraphrase somebody's comments, there is a certain obligation
> to do it accurately.)

After re-reading our previous posts I see you are right. What you actually said
was that you understood right-to-left was completely obsolete in Asia, and that native
Asian language speakers you talked to thought the Unicode tables did a good job.
My apologies.

> I perhaps was not familiar at that time with the extent that
> right-to-left writing is used with Hong Kong and Taiwan, but
> in any case, I think the basic attitude that I expressed at that
> point is accurate - there are a lot of different ways of handling
> directionality with CJK text, and I don't think Pango can do
> more than enable higher-level DTP programs to handle these cases.

Actually, Hong Kong doesn't use right-to-left very much, and I'm not sure about
Taiwan. The main users of right-to-left are in the PRC. I agree that Pango should
not even attempt to do page layout. I think it should aim to do comprehensive
text block layout, though. If it doesn't, some other rather similar tool will be
needed for this role, duplicating effort. Pango is a toolkit, and I think it's one
that should be usable by any higher level text handling program, DTP oriented or
otherwise. There is no way Pango on its own could fulfill the DTP role. If you
look at a typical HK newspaper, within one page half the articles are
left-to-right and half are top-to-bottom. That certainly needs higher level
software to organise the text blocks.

I'm not trying to start a war. I just want to see computers provide better
Chinese (and any other language) facilities than the minimal support Microsoft
gives. There is no hope that MS will ever do more than the minimum they can get
away with (actually, with simplified Chinese Windows 98 they don't even give PRC people
the minimum they need), so our hope must lie in doing things for ourselves in
projects like Pango.

I think it's totally unimportant whether Pango V1.0 supports comprehensive
orientations. It may even be a bad thing, delaying the availability of a usable
package. At this stage I simply think careful consideration of the API is in
order, so it doesn't require modification later. That may be as simple as
including "predominant direction", and "display angle" parameters which are
either ignored, or return an error when non-zero, at this stage.
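
To make the idea of reserved parameters concrete, here is a minimal Python
sketch of an entry point that already accepts a predominant direction and a
display angle but rejects anything it cannot yet handle. The names
layout_text_block, Direction, SUPPORTED_DIRECTIONS and UnsupportedLayoutError
are hypothetical and do not correspond to any actual Pango function. The
assumption that only left-to-right is supported at first is also just for
illustration.

```python
from enum import Enum


class Direction(Enum):
    LTR = "left-to-right"
    RTL = "right-to-left"
    TTB = "top-to-bottom"


class UnsupportedLayoutError(NotImplementedError):
    """Raised for options the API accepts but does not implement yet."""


# Assumption for this sketch: the first release handles only this subset.
SUPPORTED_DIRECTIONS = frozenset({Direction.LTR})


def layout_text_block(text, predominant_direction=Direction.LTR,
                      display_angle=0.0):
    """Lay out `text`; the extra parameters exist from day one so that
    later versions can honour them without changing the signature."""
    if predominant_direction not in SUPPORTED_DIRECTIONS:
        raise UnsupportedLayoutError(
            f"direction {predominant_direction.value} is not implemented yet")
    if display_angle != 0.0:
        raise UnsupportedLayoutError("rotated text is not implemented yet")
    # Placeholder result standing in for a real layout object.
    return {
        "text": text,
        "predominant_direction": predominant_direction,
        "display_angle": display_angle,
    }
```

Callers written against this signature would keep working unchanged once the
rejected cases are implemented, which is the point of reserving the parameters
early. Silently ignoring the parameters is the other option mentioned above,
but an explicit error makes the missing support visible to the caller.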

Steve



