CJK Numbering in gnome-doc-utils



First, let's all thank Theppitak for giving me a much-needed
kick in the rear to get internationalized numbering systems
implemented in gnome-doc-utils.  I've now implemented Thai
decimal and alphabetic, Serbian alphabetic, and Ionic.

Now consider these three pages:

[JA] http://en.wikipedia.org/wiki/Japanese_numerals
[ZH] http://en.wikipedia.org/wiki/Chinese_numerals
[CSS] http://www.w3.org/TR/2002/WD-css3-lists-20021107/

These give conflicting accounts of the various ideographic
numbering systems.  [CSS] lists four different systems,
two for Japanese and two for Chinese.  None of those four
use the same characters as described in [JA].  On top of
that, [ZH] paints a complicated tapestry of numbering that
no mere mortal could possibly hope to fathom.

I now have what I believe to be a correct implementation
of what's described in [JA].  Parts of that article are
imprecise, so I'm not certain.  Furthermore, [CSS] says
that the ideographic systems are not defined for numbers
greater than or equal to 10^16, while [JA] has characters
for myriads up to 10^20, meaning it's defined for numbers
up to (but not including) 10^24.
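
To spell out the arithmetic behind that bound, here is a tiny
Python sketch.  The variable names are made up for illustration,
and it assumes each myriad character covers a group of four
decimal digits, which is how I read [JA].

```python
# Powers of ten that have their own myriad character in [JA]:
# 10^4, 10^8, 10^12, 10^16, 10^20.
MYRIAD_EXPONENTS = [4, 8, 12, 16, 20]

# Assumption: each myriad character is prefixed by a group of up
# to four digits (1 through 9999).
GROUP_DIGITS = 4

# The largest group is 9999 * 10^20, so the first number that
# cannot be written is 10^(20 + 4).
upper_bound_exponent = max(MYRIAD_EXPONENTS) + GROUP_DIGITS
largest_representable = 10 ** upper_bound_exponent - 1

print(upper_bound_exponent)
```

By the same reasoning, a limit of 10^16 as in [CSS] corresponds
to stopping at the 10^12 character.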

There's mention on [ZH] of using zero characters, while
[JA] indicates that they're not used.  My understanding
is that formal Chinese numbering uses them, even though
they're unnecessary.  However, [ZH] states that without
zero characters, ambiguity can arise:

"Interior zeroes before the unit position (as in 1002)
must be spelt explicitly. The reason for this is that
trailing zeroes (as in 1200) are often omitted as
shorthand, so ambiguity occurs. One zero is sufficient
to resolve the ambiguity."

I don't understand at all.  Using the characters in [JA],
without any zeroes, I get 1002 = 千二 and 1200 = 千二百.
No ambiguity.
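
For anybody who wants to check my reading of [JA], here is a
rough Python sketch of the zero-free system as I understand it.
It is not the gnome-doc-utils implementation, and the function
names are made up.  It assumes the one character is dropped
before 十, 百 and 千 but kept before the myriad characters; that
second point is a guess on my part.

```python
DIGITS = "〇一二三四五六七八九"      # index 0 is never emitted
SMALL_UNITS = ["", "十", "百", "千"]  # 10^0 .. 10^3 within a group
MYRIADS = ["", "万", "億", "兆", "京", "垓"]  # 10^0, 10^4, .. 10^20


def group_to_kanji(n):
    """Convert one four-digit group (1..9999), skipping zero digits."""
    out = []
    for power in (3, 2, 1, 0):
        digit = (n // 10 ** power) % 10
        if digit == 0:
            continue  # zeroes are simply omitted
        # Assumption: "one" is implicit before 十, 百 and 千.
        if digit > 1 or power == 0:
            out.append(DIGITS[digit])
        out.append(SMALL_UNITS[power])
    return "".join(out)


def to_kanji(n):
    """Convert 1 <= n < 10^24 using groups of four digits."""
    if not 0 < n < 10 ** 24:
        raise ValueError("only defined for 1 <= n < 10^24")
    parts = []
    group_index = 0
    while n > 0:
        n, group = divmod(n, 10000)
        if group:
            parts.append(group_to_kanji(group) + MYRIADS[group_index])
        group_index += 1
    return "".join(reversed(parts))


print(to_kanji(1002))  # 千二
print(to_kanji(1200))  # 千二百
```

With these rules the two numbers come out different, which is
why I don't see where the ambiguity would come from.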

As another point, [ZH] indicates that certain systems
demand that you use the one character all over the place,
while others don't.  For instance, 1000 is 千, but one
could also write it as 一千.
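
In code, that difference could be a single switch.  The sketch
below is hypothetical (the explicit_one flag and the function
name are mine, not from any of the three pages) and only handles
1 through 9999.

```python
DIGITS = "〇一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # 10^0 .. 10^3


def group_to_ideographs(n, explicit_one=False):
    """Convert 1..9999; explicit_one controls the leading 一."""
    if not 0 < n < 10000:
        raise ValueError("only defined for 1 <= n <= 9999")
    out = []
    for power in (3, 2, 1, 0):
        digit = (n // 10 ** power) % 10
        if digit == 0:
            continue
        # Always write the digit in the units position; elsewhere
        # write a 1 only when the variant demands it.
        if digit > 1 or power == 0 or explicit_one:
            out.append(DIGITS[digit])
        out.append(UNITS[power])
    return "".join(out)


print(group_to_ideographs(1000))                     # 千
print(group_to_ideographs(1000, explicit_one=True))  # 一千
```

Whether a given system wants the one before every unit, or only
before some of them, is exactly the sort of detail I need to be
told.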

Furthermore, [ZH] also provides characters for 20, 30,
and 40, rather than just prefixing 10 with the correct
digit marker.
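
A sketch of how those special characters could be layered on
top, for 1 through 99 only.  I'm assuming the characters in
question are 廿 (20), 卅 (30) and 卌 (40); the function name and
the use_shorthand flag are invented for this example.

```python
DIGITS = "〇一二三四五六七八九"
TENS_SHORTHAND = {2: "廿", 3: "卅", 4: "卌"}  # assumed from [ZH]


def two_digit(n, use_shorthand=False):
    """Convert 1..99, optionally using single characters for 20-40."""
    if not 0 < n < 100:
        raise ValueError("only defined for 1 <= n <= 99")
    tens, ones = divmod(n, 10)
    out = ""
    if tens:
        if use_shorthand and tens in TENS_SHORTHAND:
            out += TENS_SHORTHAND[tens]
        else:
            # Prefix 十 with the digit, except for a bare 10.
            out += ("" if tens == 1 else DIGITS[tens]) + "十"
    if ones:
        out += DIGITS[ones]
    return out


print(two_digit(21))                      # 二十一
print(two_digit(21, use_shorthand=True))  # 廿一
```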

Clearly, there are quite a few variations on this system,
and I have no problems supporting every variation that
we need.  What I need to know, however, is exactly which
ideographic systems our translators want to use.  I need
exact definitions of those systems.

I've put together a simple command line utility called
test-numbers.  I'll be putting it into gnome-doc-utils
CVS under the i18n directory.  Translators (and anybody
else) can use it to verify my implementations.  Please,
everybody send me details of the numbering systems you
need, and then double-check my results with the script.

--
Shaun





