Re: [xml] OS/390 compatibility (EBCDIC)

On Thu, May 13, 2010 at 7:38 PM, Daniel Veillard <veillard redhat com> wrote:

On Wed, May 12, 2010 at 03:15:02PM +0300, Alon Bar-Lev wrote:
Thank you for your comments.

I found the first problem: the correct code page must be used in
order to interpret the header. Patch attached.

But libxml still does not work; it cannot work in an EBCDIC environment.
It converts the stream into UTF-8 and then tries to parse it using
native literals.

For example:
if (c == 'a')

This will not work, because 'a' is encoded in EBCDIC while c holds
UTF-8. Unlike on ANSI platforms, the character values differ between
UTF-8 (latin1) and EBCDIC.
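
To make the mismatch concrete, here is a minimal sketch in C. It is not libxml2 code; the enum and function names are hypothetical. It assumes the common EBCDIC code pages (037/1047), where lowercase 'a' is 0x81, while ASCII and UTF-8 use 0x61. The first function emulates what `c == 'a'` effectively becomes when the source is compiled with EBCDIC literals.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum {
    ASCII_LOWER_A  = 0x61, /* 'a' in ASCII / UTF-8 */
    EBCDIC_LOWER_A = 0x81  /* 'a' in EBCDIC code pages 037 and 1047 (assumed) */
};

/* Emulates `c == 'a'` as compiled on an EBCDIC platform:
 * the literal 'a' takes the native value 0x81. */
static bool matches_native_a_on_ebcdic(unsigned char c)
{
    return c == EBCDIC_LOWER_A;
}

/* Compares against the explicit UTF-8 code unit, so the result
 * does not depend on how the compiler encodes literals. */
static bool matches_utf8_a(unsigned char c)
{
    return c == ASCII_LOWER_A;
}
```

A byte 0x61 taken from the converted UTF-8 stream fails the first test and passes the second, which is the failure described above.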

Fujitsu used to have the problem, until they found a compiler
switch to tell the compiler that the source had to be interpreted as
ASCII, and then their problem was solved (this is from memory from
half a decade ago).

I tried to use #pragma convert("ISO8859-1"), and also tried the
-qconvlit=ISO8859-1 compiler option, but both have too wide an effect
to solve this.

Triple check your compiler documentation, that's probably there.
I don't understand what you meant by "too wide effect", I don't see
why this would be a problem for compiling libxml2.

Correct solution is to use:
#define UTF8_CHARACTER_A '\x41'
#define UTF8_CHARACTER_GT '\x3e'

And use these in the parsers.

Nahh. Correct solution is that any form of text where the encoding
is not made explicit or part of the metadata is broken. C is broken
from this respect.
There are many places too where libxml2 code assumes things like
a...z are stored in alphabetical order etc ... I'm all for portability
but to the limit it doesn't completely penalize maintainability or
code efficiency.
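
As an illustration of the a...z assumption, here is a small C sketch. It is not taken from libxml2; the function names are hypothetical. It assumes the EBCDIC 037/1047 layout, where the lowercase letters sit in three separate runs (0x81-0x89, 0x91-0x99, 0xA2-0xA9), so a single range check also accepts non-letters.

```c
#include <stdbool.h>

/* What `c >= 'a' && c <= 'z'` amounts to with EBCDIC literals:
 * 'a' is 0x81 and 'z' is 0xA9 (code pages 037/1047, assumed). */
static bool naive_ebcdic_is_lower(unsigned char c)
{
    return c >= 0x81 && c <= 0xA9;
}

/* EBCDIC lowercase letters occupy three non-adjacent runs:
 * a-i, j-r and s-z, with other characters in between. */
static bool ebcdic_is_lower(unsigned char c)
{
    return (c >= 0x81 && c <= 0x89) ||
           (c >= 0x91 && c <= 0x99) ||
           (c >= 0xA2 && c <= 0xA9);
}
```

For example, 0x8A lies between 'i' and 'j': the naive check accepts it, the three-run check rejects it.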

If you do:
if (xml_utf_char == '<') {
   printf("We got <\n");
}

You need the xml_utf_char to be ASCII and the message to be EBCDIC.
This is what I call too wide an effect: I can tell the compiler to
treat *ALL* literals as ASCII, but it won't work... OK... you can say
that a library does not print messages... What about fopen(filename,

The correct solution is to have constants for characters when you
modify the encoding, and not rely on the C source file encoding at all.
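
A rough sketch of that idea in C follows. The macro and function names are hypothetical and not part of libxml2. Parsed data is compared against a numeric constant, while the message stays an ordinary literal in whatever encoding the platform uses.

```c
/* Hypothetical constant: the UTF-8/ASCII code unit for '<'. */
#define XML_CH_LT 0x3C

/* Test on parsed data: independent of the source file encoding. */
static int is_tag_open(unsigned char xml_utf_char)
{
    return xml_utf_char == XML_CH_LT;
}

/* Message text: a plain literal, left in the native encoding
 * (EBCDIC on OS/390) so it prints correctly on the local terminal. */
static const char *describe_char(unsigned char xml_utf_char)
{
    return is_tag_open(xml_utf_char) ? "We got <" : "We got something else";
}
```

With this split, no compiler switch is needed to re-encode literals: 0x3C from the stream is recognised, while 0x4C (the EBCDIC 037/1047 value for '<') is not, and strings passed to printf or fopen remain native.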

