[xml] RE: encoding problems using xmlSaveFormatFileTo()
- From: "Henke, Markus" <Markus_Henke ordat com>
- To: "'xml gnome org'" <xml gnome org>
- Subject: [xml] RE: encoding problems using xmlSaveFormatFileTo()
- Date: Wed, 9 Jan 2002 16:45:36 +0100
-----Original Message-----
From: Henke, Markus
Sent: Wednesday, January 09, 2002 11:33 AM
To: 'xml gnome org'
Subject: encoding problems using xmlSaveFormatFileTo()
Hello,
me again... 8)
While testing an application to write XML docs from scratch
using libxml2-2.4.12
i'm having some trouble with the character encoding support of libxml.
I've read the documentation to that subject and the reference
to the functions
that i use and thought i got it so far, since i'm currently
working with iso-8859-1
encoding (that is supported by default).
But now i'm unsure if i got something wrong (it wouldn't be
the first time... 8).
That's what i'm doing:
/* Build a new document */
docPtr = xmlNewDoc(XML_DEFAULT_VERSION);
...
/* encode entities for new node content */
char* contentBuff = "Some content < & > aou äöü AOU ÄÖÜ ß üöä ";
tmpBuff = xmlEncodeEntitiesReentrant(docPtr, BAD_CAST contentBuff);
...
/* Add node */
xmlNewChild(docPtr, NULL, "aNode", tmpBuff);
xmlFree(tmpBuff);
...
/*** create output buffer ***/
outputBufferPtr = xmlOutputBufferCreateIO(writeCallback,
closeCallback, (void*)&fileDesc, NULL);
/* save doc to disk */
xmlSaveFormatFileTo(outputBufferPtr, docPtr, "iso-8859-1", 0);
...
That's what i get:
<?xml version="1.0" encoding="iso-8859-1"?>
<aNode>Some content < & > aou 䶼 AOU Ä-Ü ß Ã¼öä </aNode>
So, some of the German umlauts are encoded correctly, some
not (but differently)
and some as well as (!).
Debugging shows that xmlEncodeEntitiesReentrant() correctly
replaces the
German umlauts with their character references,
That seems not to be the whole truth. A closer look shows that the
encoding of
"Some content < & > aou äöü AOU ÄÖÜ ß üöä "
is
"Some content < & > aou 䶼 AOU Ėܠߠüöä"
and that does not look OK to me.
I've debugged xmlEncodeEntitiesReentrant() and found that it evaluates
doc->encoding,
which is NULL at the time i append my child node, since the document has
only just been created.
So UTF-8 encoding is assumed, and the byte for 'ä' (#xE4) is taken as the
lead byte of a 3-byte character. Together with the two bytes that follow, it
is encoded as the character reference for 䶼 (U+4DBC), which breaks the input
buffer (two characters are skipped, the next one is the BLANK, etc.).
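The arithmetic behind that observation can be checked by hand. The following is only a sketch, assuming a lenient decoder that masks the payload bits and does not validate continuation bytes; lenient_utf8_decode() is a hypothetical helper written for illustration and is not part of libxml2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a libxml2 function): decode one 1-, 2- or
 * 3-byte sequence the way a lenient UTF-8 reader would, i.e. without
 * checking that the trailing bytes really look like 10xxxxxx. */
static unsigned int lenient_utf8_decode(const unsigned char *s, size_t *used)
{
    assert(s != NULL && used != NULL);

    if ((s[0] & 0xF0) == 0xE0) {            /* 1110xxxx: 3-byte lead */
        *used = 3;
        return ((s[0] & 0x0Fu) << 12) |
               ((s[1] & 0x3Fu) << 6)  |
                (s[2] & 0x3Fu);
    }
    if ((s[0] & 0xE0) == 0xC0) {            /* 110xxxxx: 2-byte lead */
        *used = 2;
        return ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
    }
    *used = 1;                              /* anything else: one byte */
    return s[0];
}
```

Fed the ISO-8859-1 bytes for "äöü" (0xE4 0xF6 0xFC), this yields 0x4DBC and consumes all three bytes. Fed "ÄÖ" (0xC4 0xD6), it yields 0x116 and consumes two, which matches the characters seen in the encoded string above.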
I've tried to manually set the encoding of the doc immediately after
creation, before
appending child nodes via
docPtr->encoding = xmlStrdup("iso-8859-1");
and that seems to work.
Is this the correct (or the only) way to get a correct encoding for a
document that is
built from scratch?
And if so, wouldn't it be useful to provide an xmlNewDoc() function that
takes an
encoding as a parameter? Or is there already something similar that i've
missed...?
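Since libxml2 keeps its strings in UTF-8 internally, another option might be to convert the ISO-8859-1 input to UTF-8 before passing it to the library, rather than setting doc->encoding by hand. The sketch below shows that conversion without any libxml2 dependency; latin1_to_utf8() is a hypothetical helper name, and libxml2's own isolat1ToUTF8() would probably be the more natural choice in a real program.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated ISO-8859-1 string to a
 * newly allocated UTF-8 string. The caller must free() the result.
 * Returns NULL if the allocation fails. */
static char *latin1_to_utf8(const char *in)
{
    assert(in != NULL);

    size_t len = strlen(in);
    /* Every Latin-1 byte becomes at most two UTF-8 bytes. */
    unsigned char *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    unsigned char *p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            *p++ = c;                               /* plain ASCII */
        } else {
            *p++ = (unsigned char)(0xC0 | (c >> 6));   /* lead byte */
            *p++ = (unsigned char)(0x80 | (c & 0x3F)); /* continuation */
        }
    }
    *p = '\0';
    return (char *)out;
}
```

With this, the byte 0xE4 ('ä' in ISO-8859-1) becomes the two bytes 0xC3 0xA4, which a UTF-8 reader decodes back to the single character 'ä' instead of swallowing the bytes that follow it.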
so i've thought about the
xmlCharEncodingHandlerPtr parameter in xmlOutputBufferCreateIO(),
but the library reference manual keeps silent about it...
I had hoped that passing NULL to xmlOutputBufferCreateIO()
would invoke the
libxml default encoding handler!? Is this the mistake, or
is something else going wrong?
Thanx for your effort & Ciao, Markus
Mit freundlichen Gruessen - Kind regards
Markus Henke
________________________Addressed by:________________________
ORDAT GmbH & Co. KG - Serversystems / eCom
Dipl.-Inf. (FH) Markus Henke Fon: +49 (641) 7941-0
Rathenaustr. 1 Fax: +49 (641) 7941-132
35394 Gießen mailto:markus henke ordat com
See: http://www.ordat.com
_____________________________________________________________
...this behavior is by desig...