[xslt] xsltproc and UTF-8 multi-byte
- From: Rick Kwan <kwanrj02 lightsaber com>
- To: xslt gnome org
- Cc: Rick Kwan <kwanrj02 lightsaber com>
- Subject: [xslt] xsltproc and UTF-8 multi-byte
- Date: Tue, 26 Nov 2002 10:08:31 -0800
Greetings folks,
I'm trying to use xsltproc on Solaris 8 to transform an XML file of
UTF-8 multi-byte text. In the result, instead of UTF-8, I see
numeric character entities, which I think are the equivalent Unicode
UCS-2 code points. For example, a line starts with
<para>\343\200\200\343\200\201...</para>
and becomes
<para> 、...</para>
(The source is really entered as six real bytes, not a string of six
escaped octals as shown above. If you cat the file in C locale, that
text is thoroughly unreadable.)
(For those who are multi-byte conversant, these codepoints were
taken from GB-2312 (Simplified Chinese EUC) codeset 0xa1a1 and
0xa1a2. They are the first two multi-byte codepoints.)
I would like to just see the original UTF-8 text (in its transformed
XML, of course). I suspect this ought to be really easy, but I'm
completely missing it. Any suggestions?
Given the encoding of the test sample, I can't show it in this mail.
But I can send it as a tar.gz attachment to anyone who is interested.
I consider it small (30 lines) and reasonably self-documenting.
--Rick Kwan