[xml] Behaviour of xmlNodeAddContent() vs. xmlNodeSetContent()



Hello,

the following simple test case

================================================================
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <libxml/parser.h>
#include <libxml/parserInternals.h>

int main(int argc, char **argv) {

        xmlDocPtr doc;
        xmlNodePtr root;
        xmlChar *str;

        fprintf(stdout, "%s: using libxml version %s\n\n",
                "Test xmlNode[Set/Add]Content()", xmlParserVersion);
        
        doc = xmlNewDoc(BAD_CAST "1.0");
        root = xmlNewNode(NULL, BAD_CAST "root");
        xmlDocSetRootElement(doc, root);

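        /* Escape the raw input first; without this, xmlNodeSetContent()
           reports an unterminated entity reference. */
        str = xmlEncodeEntitiesReentrant(doc, BAD_CAST " X&Y ");
        xmlNodeSetContent(root, str);

        /* Passing the same escaped string here yields "&amp;amp;".
           Uncomment the next line to pass the raw string instead. */
        //str = BAD_CAST " X&Y ";
        xmlNodeAddContent(root, str);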
        
        xmlDocDump(stdout, doc);

        xmlFreeDoc(doc);
        xmlCleanupParser();

        return 0;
}
================================================================


produces the following (unexpected) output (run against several
libxml2 versions, including 2.6.26 on WinXP SP2, but this
doesn't seem to be a platform issue):

================================================================
Test xmlNode[Set/Add]Content(): using libxml version 20626

<?xml version="1.0"?>
<root> X&amp;Y  X&amp;amp;Y </root>
================================================================


You'll notice the "double encoded" entity "&amp;amp;".
In the actual application, I send all user input through
xmlEncodeEntitiesReentrant(), which seemed the proper way
(omitting it will result in an error when calling
xmlNodeSetContent(), e.g.
"error : unterminated entity reference  Y" in the test case).
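
To illustrate why the raw string is rejected, here is a minimal
stand-alone sketch (plain C, no libxml2). It is only a simplified model:
refs_are_terminated() is a hypothetical helper, not a libxml2 function,
and it accepts only alphanumeric names, whereas real XML names allow
more characters. It checks that every '&' starts a reference that ends
in ';'.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of libxml2.
 * Returns 1 if every '&' in s begins a reference of the form
 * "&name;" or "&#digits;", and 0 if a reference is left unterminated. */
static int refs_are_terminated(const char *s) {
    for (; *s; s++) {
        if (*s != '&')
            continue;
        const char *p = s + 1;
        if (*p == '#')                 /* character reference, e.g. &#38; */
            p++;
        const char *start = p;
        while (isalnum((unsigned char)*p))
            p++;
        if (p == start || *p != ';')   /* empty name or missing ';' */
            return 0;
        s = p;                         /* continue after the ';' */
    }
    return 1;
}
```

With this model, refs_are_terminated(" X&Y ") returns 0 (the "Y" is
never closed by a ';'), while refs_are_terminated(" X&amp;Y ") returns 1.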

I've traced this down to the point where the two calls seem to
diverge:
xmlNodeSetContent() calls xmlStringGetNodeList() to build a list
of text nodes from the given input, which in turn uses
xmlGetDocEntity() to replace the encoded entity with the
original character again.
xmlDocDump() then converts the ampersand back to its
entity (which is fine).
OTOH, xmlNodeAddContent() (via xmlNodeAddContentLen()) calls
xmlNewTextLen(), which simply does an xmlStrndup() on the
given content to create a new text node.
xmlDocDump() now treats the ampersand of the already encoded
entity as a literal ampersand and encodes it again (not so fine).
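
The two code paths described above can be modelled in a few lines of
plain C. This is only a sketch under simplifying assumptions: it
handles nothing but "&amp;", and all function names are hypothetical
stand-ins, not libxml2 calls. store_decoded() plays the role of the
"set" path, store_verbatim() the "add" path, and escape_amp() the
serializer.

```c
#include <stdlib.h>
#include <string.h>

/* Serializer model: write every '&' as "&amp;". Caller frees the result. */
static char *escape_amp(const char *in) {
    char *out = malloc(strlen(in) * 5 + 1);  /* worst case: all '&' */
    char *p = out;
    if (out == NULL) return NULL;
    for (; *in; in++) {
        if (*in == '&') { memcpy(p, "&amp;", 5); p += 5; }
        else *p++ = *in;
    }
    *p = '\0';
    return out;
}

/* "Set" path model: decode "&amp;" to '&' before storing the text. */
static char *store_decoded(const char *in) {
    char *out = malloc(strlen(in) + 1);
    char *p = out;
    if (out == NULL) return NULL;
    while (*in) {
        if (strncmp(in, "&amp;", 5) == 0) { *p++ = '&'; in += 5; }
        else *p++ = *in++;
    }
    *p = '\0';
    return out;
}

/* "Add" path model: store the bytes exactly as given. */
static char *store_verbatim(const char *in) {
    char *out = malloc(strlen(in) + 1);
    if (out != NULL) strcpy(out, in);
    return out;
}

/* Store via the "set" path, then serialize. Caller frees the result. */
static char *dump_via_set(const char *in) {
    char *stored = store_decoded(in);
    char *dumped = stored ? escape_amp(stored) : NULL;
    free(stored);
    return dumped;
}

/* Store via the "add" path, then serialize. Caller frees the result. */
static char *dump_via_add(const char *in) {
    char *stored = store_verbatim(in);
    char *dumped = stored ? escape_amp(stored) : NULL;
    free(stored);
    return dumped;
}
```

In this model, dump_via_set(" X&amp;Y ") gives " X&amp;Y ", whereas
dump_via_add(" X&amp;Y ") gives " X&amp;amp;Y ", matching the output of
the test case. dump_via_add(" X&Y ") gives " X&amp;Y ", which
corresponds to the commented-out line in the test case that passes the
raw string to xmlNodeAddContent().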

So, the question: Am I simply using these calls incorrectly,
or is this really an issue within libxml2?
If it is, I'd happily provide a patch, but this seems to be
rather deep in libxml2's internals (I'd fear side effects), and
I'm not sure which of the two behaviours should be changed.
Any advice would be greatly appreciated.


Ciao, Markus



Kind regards
Markus Keim



________________________Addressed by:________________________
 ORDAT GmbH & Co. KG  -  Serversystems / eCom 
 Dipl.-Inf. (FH) Markus Keim   Fon: +49 (641) 7941-0
 Rathenaustr. 1                Fax: +49 (641) 7941-132
 35394 Gießen                  mailto:markus_keim ordat com
 See:                          http://www.ordat.com
_____________________________________________________________
I love deadlines. I like the whooshing sound they make as
they fly by.  -- Douglas Adams

Attachment: addNodeContent.c