RE: [xml] Encoding problems building document from scratch
- From: "Henke, Markus" <Markus_Henke ordat com>
- To: "'veillard redhat com'" <veillard redhat com>
- Cc: "'xml gnome org'" <xml gnome org>
- Subject: RE: [xml] Encoding problems building document from scratch
- Date: Mon, 14 Jan 2002 16:39:15 +0100
-----Original Message-----
From: Daniel Veillard [mailto:veillard redhat com]
Sent: Sunday, January 13, 2002 9:42 PM
To: Henke, Markus
Cc: 'xml gnome org'
Subject: Re: [xml] Encoding problems building document from scratch
On Fri, Jan 11, 2002 at 01:15:44PM +0100, Henke, Markus wrote:
Hello,
and sorry for posting this again. I posted it before as a RE: to my
original posting, which may have been misleading, since it is not an
answer but rather a follow-up question that includes some debugging
information and a possible solution that seems to work, although I
don't know if it's the right way to handle these things...
Not the right way; the problem is at the very beginning:
What a pity, it has worked up to now... 8)
Do you have a suspicion in which cases it will break?
And what's the correct way to set an encoding for a new document?
/* encode entities for new node content */
char* contentBuff = "Some content < & > aou äöü AOU ÄÖÜ ß üöä ";
tmpBuff = xmlEncodeEntitiesReentrant(docPtr, BAD_CAST contentBuff);
...
/* Add node */
xmlNewChild(docPtr, NULL, "aNode", tmpBuff);
The problem is there. It's not permitted to do this. The tree is
expected to be maintained in UTF8. xmlEncodeEntitiesReentrant()
will escape "special" chars like '<>&' but will NOT try to handle
charset conversion.
I see! Charset conversion in libxml happens during I/O operations,
and xmlEncodeEntitiesReentrant() is not seen as an I/O process
(but my usage of it is some kind of...)
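To illustrate the distinction drawn above between escaping markup characters and converting between charsets, here is a minimal standalone sketch in C. It is not libxml2's implementation of xmlEncodeEntitiesReentrant(), and the helper name escape_markup is hypothetical. It only models the escaping of '<', '>' and '&', and passes every other byte through unchanged, so Latin-1 input stays Latin-1.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT libxml2 code: escapes only '<', '>' and '&'.
 * All other bytes are copied as-is, so no charset conversion happens.
 * Returns a newly allocated string (caller frees), or NULL on failure. */
char *escape_markup(const char *in)
{
    assert(in != NULL);

    /* Worst case: every byte is '&', which expands to the 5 bytes "&amp;". */
    char *out = malloc(5 * strlen(in) + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (; *in != '\0'; in++) {
        const char *rep = NULL;
        switch (*in) {
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        case '&': rep = "&amp;"; break;
        default:  break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(p, rep, n);
            p += n;
        } else {
            *p++ = *in;  /* copied byte-for-byte, whatever its encoding */
        }
    }
    *p = '\0';
    return out;
}
```

A Latin-1 byte such as 0xE4 ("ä") comes out of this function still as the single byte 0xE4, which is not valid UTF-8 on its own. That is why a separate conversion step is needed before the content goes into the tree.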
You first need to convert the string to UTF-8. There is a function
exported from encoding.h that does the IsoLatin1 to UTF-8 conversion.
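As a rough picture of what such a conversion involves, here is a minimal standalone sketch in C. The function Daniel mentions is presumably libxml2's isolat1ToUTF8(). The helper below is not that function: its name latin1_to_utf8 and its signature are hypothetical, and it assumes the input really is ISO-8859-1.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT part of libxml2: converts a NUL-terminated
 * ISO-8859-1 string into a newly allocated UTF-8 string.
 * Returns NULL on allocation failure; the caller frees the result. */
char *latin1_to_utf8(const char *latin1)
{
    assert(latin1 != NULL);

    size_t len = strlen(latin1);
    /* Each Latin-1 byte becomes at most two UTF-8 bytes. */
    unsigned char *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)latin1[i];
        if (c < 0x80) {
            out[o++] = c;                                   /* ASCII: unchanged */
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));    /* lead byte */
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));  /* continuation byte */
        }
    }
    out[o] = '\0';
    return (char *)out;
}
```

In the code from the original posting, the converted buffer (rather than contentBuff itself) would then be the one passed, via BAD_CAST, to xmlEncodeEntitiesReentrant().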
Hum, that's of course an option. But it seems to me like wasting the
comfortable capabilities of libxml's encoding support, and it's quite
probable that I'll have to deal with non-ISO-Latin documents in the future.
Wouldn't it be useful to have (or is there one?) a function to create
new nodes (content) that considers a given (document) encoding and
invokes the libxml default encoding support (or a registered
extension)?
If so, one could use xmlEncodeEntitiesReentrant() to overwrite the
(now UTF-8 encoded) content of the new node if necessary.
Check http://xmlsoft.org/encoding.html
for libxml2 encoding support choices and description.
Thanx, BTDT... 8]
Daniel
Ciao, Markus
Mit freundlichen Gruessen - Kind regards
Markus Henke
________________________Addressed by:________________________
ORDAT GmbH & Co. KG - Serversystems / eCom
Dipl.-Inf. (FH) Markus Henke Fon: +49 (641) 7941-0
Rathenaustr. 1 Fax: +49 (641) 7941-132
35394 Gießen mailto:markus henke ordat com
See: http://www.ordat.com
_____________________________________________________________
...this behavior is by design...