RE: [xml] Encoding problems building document from scratch




-----Original Message-----
From: Daniel Veillard [mailto:veillard redhat com]
Sent: Sunday, January 13, 2002 9:42 PM
To: Henke, Markus
Cc: 'xml gnome org'
Subject: Re: [xml] Encoding problems building document from scratch


On Fri, Jan 11, 2002 at 01:15:44PM +0100, Henke, Markus wrote:
Hello,

and sorry for posting this again. I posted it before as a RE: to
my original posting, which may have been misleading, since it is not
an answer but a follow-up question that includes some debugging
information and a possible solution that seems to work, although I
don't know if it's the right way to handle these things...

  Not the right way; the problem is at the very beginning:

What a pity, it has worked up to now... 8)
Do you have a suspicion in which case it will break?
And what's the correct way to set an encoding for a new document?

/* encode entities for new node content */
char* contentBuff = "Some content < & > aou äöü AOU ÄÖÜ ß üöä ";
tmpBuff = xmlEncodeEntitiesReentrant(docPtr, BAD_CAST contentBuff);
...

/* Add node */
xmlNewChild(docPtr, NULL, "aNode", tmpBuff);

  The problem is there. It's not permitted to do this. The tree is
expected to be maintained in UTF8. xmlEncodeEntitiesReentrant() 
will escape "special" chars like '<>&' but will NOT try to handle
charset conversion.

I see! Charset conversion in libxml happens during I/O operations,
and xmlEncodeEntitiesReentrant() is not seen as an I/O process
(although my usage of it is some kind of...)

  You need to convert the string to UTF8 first. There is a function
exported from encoding.h to do IsoLatin1 to UTF8 encoding.
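
  To make the conversion step concrete, here is a minimal sketch of what
an IsoLatin1 to UTF8 conversion does, written in plain C without libxml.
The helper name latin1_to_utf8() is hypothetical and only for
illustration; in a libxml2 program one would more likely call the
function that encoding.h declares for this (isolat1ToUTF8(), as far as I
know) rather than hand-rolling it. The sketch assumes the input is a
NUL-terminated ISO-8859-1 string.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxml): convert a NUL-terminated
 * ISO-8859-1 string to a newly allocated UTF-8 string.
 * Returns NULL if allocation fails; the caller must free() the result.
 */
char *latin1_to_utf8(const char *in)
{
    size_t len = strlen(in);
    /* Worst case: every input byte becomes two output bytes, plus NUL. */
    unsigned char *out = malloc(2 * len + 1);
    unsigned char *p = out;
    size_t i;

    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            /* ASCII range is identical in both encodings. */
            *p++ = c;
        } else {
            /* 0x80..0xFF map to U+0080..U+00FF: a two-byte sequence. */
            *p++ = (unsigned char)(0xC0 | (c >> 6));
            *p++ = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    *p = '\0';
    return (char *)out;
}
```

  The returned buffer, not the raw IsoLatin1 string, is what should be
handed to the tree functions. Characters like '<' and '&' pass through
unchanged here, so escaping remains a separate step.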

Hum, that's of course an option. But it seems to me like wasting the
convenient capabilities of libxml encoding support, and it's quite
probable that I'll have to deal with non-iso-latin documents in the future.
Wouldn't it be useful to have (or is there?) a function to create
new nodes (content) that considers a given (document-) encoding and
invokes the libxml default encoding support (or a registered
extension)?
If so, one could use xmlEncodeEntitiesReentrant() to overwrite the
(now UTF-8 encoded) content of the new node if necessary.

Check http://xmlsoft.org/encoding.html
  for libxml2 encoding support choices and description.

Thanx, BTDT...   8]

Daniel



Ciao, Markus



Mit freundlichen Gruessen - Kind regards
Markus Henke



________________________Addressed by:________________________
 ORDAT GmbH & Co. KG  -  Serversystems / eCom 
 Dipl.-Inf. (FH) Markus Henke  Fon: +49 (641) 7941-0
 Rathenaustr. 1                Fax: +49 (641) 7941-132
 35394 Gießen                  mailto:markus henke ordat com
 See:                          http://www.ordat.com
_____________________________________________________________
              ...this behavior is by design...


