Re: Row-cell <-> 'character codes'



Jan wrote:

> I'm quite new to this list, so maybe what I ask is blindingly obvious
> (though I have checked the archives without finding an answer).
> 
> How do I convert between the row-cell notation (used e.g. in Ken Lunde's
> book 'CJKV ...') and the numerical values that I see in a hexdump of e.g.
> a GB18030 encoded text file?
> 
> What I am thinking of is NOT whether there is a program that can do it
> for me - I actually need to be able to look at the values in a hexdump
> and compare them with the entries in the tables in Lunde's book. But he
> doesn't seem to mention this subject other than extremely superficially.
> This is unfortunately useless for me.


I've never read Lunde's book (in fact, I don't know anyone doing serious
Chinese applications who has; books like his seem to be aimed mostly at
westerners who want to learn something about supporting East Asian
languages), but isn't it too old to include GB18030? That standard was
only published last year.

I don't know how Lunde presents the codes, but the usual source of
confusion with other Asian codes, such as CNS or GB2312, is that the specs
leave off the top bit of each byte. That is, the bytes AB CD in a file
will appear in the specs as 2B4D. This is because the specs consider 2B4D
to be the actual character code, and the top bit as a flag to select
non-ASCII mode.
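
To make the arithmetic concrete, here is a minimal Python sketch, assuming
the usual 94x94 row-cell layout of GB2312 (row and cell both numbered 1 to
94). The 7-bit code printed in the spec is row + 0x20 and cell + 0x20; the
EUC-CN bytes you see in a hexdump are row + 0xA0 and cell + 0xA0, i.e. the
spec code with the top bit of each byte set. The function names are made
up for illustration.

```python
def rowcell_to_codes(row: int, cell: int) -> dict:
    """Map a 94x94 row-cell position to its spec code and EUC-CN bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    return {
        # 7-bit code as printed in the standard: add 0x20 to each part
        "spec": ((row + 0x20) << 8) | (cell + 0x20),
        # bytes as seen in a hexdump: add 0xA0, i.e. spec byte + top bit
        "euc": bytes([row + 0xA0, cell + 0xA0]),
    }


def euc_to_rowcell(b1: int, b2: int) -> tuple:
    """Map a two-byte EUC-CN sequence back to (row, cell)."""
    if not (0xA1 <= b1 <= 0xFE and 0xA1 <= b2 <= 0xFE):
        raise ValueError("both bytes must be in A1..FE")
    return (b1 - 0xA0, b2 - 0xA0)


def spec_to_euc(code: int) -> bytes:
    """Set the top bit of each byte of a 7-bit spec code (0x3021 -> B0 A1)."""
    return bytes([(code >> 8) | 0x80, (code & 0xFF) | 0x80])
```

For example, row 16, cell 1 (the first hanzi in GB2312) is 3021 in the
spec and B0 A1 in an EUC-CN file; the bytes AB CD come back as row 11,
cell 45.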

Since GB18030 is China's transitional code from the older standards to
Unicode 3.x, I believe there are several ways in which a file can be
encoded, just as there are for Unicode itself. GB18030 isn't widely
used, since it is fairly new and few fonts are available to take
advantage of its wide character coverage. I don't think I have yet
encountered a file encoded with it.
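
For reading a hexdump, it may help that a GB18030 byte stream is made of
1-, 2- and 4-byte sequences: 00-7F is a single byte (ASCII), a lead byte of
81-FE followed by 30-39 starts a 4-byte sequence, and otherwise 81-FE
starts a 2-byte sequence. Two-byte codes with both bytes in A1-FE coincide
with GB2312's EUC form, so the row-cell arithmetic above applies to them.
Below is a rough Python sketch (the function name is hypothetical) that
splits raw bytes into per-character chunks; it only looks at the lead and
second byte and does not fully validate trail bytes.

```python
def split_gb18030(data: bytes) -> list:
    """Split GB18030 bytes into per-character chunks by sequence length."""
    chars = []
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead <= 0x7F:
            length = 1  # single byte: ASCII
        elif lead in (0x80, 0xFF):
            raise ValueError(f"invalid lead byte {lead:02X} at offset {i}")
        elif i + 1 < n and 0x30 <= data[i + 1] <= 0x39:
            length = 4  # four-byte form: second byte is an ASCII digit
        else:
            length = 2  # two-byte form (GB2312/GBK-compatible range)
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        chars.append(data[i:i + length])
        i += length
    return chars
```

Printing each chunk with `chunk.hex(" ").upper()` gives groups you can
match against a hexdump, e.g. "41" for A and "B0 A1" for row 16, cell 1.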

Regards,
Steve