Hello,

the following simple test case

================================================================
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <libxml/parser.h>
#include <libxml/parserInternals.h>

int main(int argc, char **argv)
{
    xmlDocPtr doc;
    xmlNodePtr root;
    xmlChar *str;

    fprintf(stdout, "%s: using libxml version %s\n\n",
            "Test xmlNode[Set/Add]Content()", xmlParserVersion);

    doc = xmlNewDoc(BAD_CAST "1.0");
    root = xmlNewNode(NULL, BAD_CAST "root");
    xmlDocSetRootElement(doc, root);

    /* Encode the user input once, then pass it to both calls. */
    str = xmlEncodeEntitiesReentrant(doc, BAD_CAST " X&Y ");
    xmlNodeSetContent(root, str);
    //str = BAD_CAST " X&Y ";
    xmlNodeAddContent(root, str);

    xmlDocDump(stdout, doc);

    xmlFree(str);
    xmlFreeDoc(doc);
    xmlCleanupParser();
    exit(0);
}
================================================================

produces the following (unexpected) output. I ran it against several libxml2 versions, including 2.6.26 on WinXP SP2, and it doesn't seem to be a platform issue:

================================================================
Test xmlNode[Set/Add]Content(): using libxml version 20626

<?xml version="1.0"?>
<root> X&amp;Y  X&amp;amp;Y </root>
================================================================

You'll notice the "double encoded" entity "&amp;amp;" in the second half of the content.

In the actual application, I send all user input through xmlEncodeEntitiesReentrant(), which seemed the proper way. Omitting it results in an error when calling xmlNodeSetContent(), e.g. "error : unterminated entity reference Y" in the test case.

I've traced this down to the point where the two calls diverge:

xmlNodeSetContent() calls xmlStringGetNodeList() to build a list of text nodes from the given input. That function in turn uses xmlGetDocEntity() to replace the encoded entity with the original character again. xmlDocDump() then converts the ampersand back to its entity (which is fine).

OTOH, xmlNodeAddContent() (via xmlNodeAddContentLen()) calls xmlNewTextLen(), which simply does an xmlStrndup() on the given content to create a new text node. No entity decoding happens on this path, so xmlDocDump() treats the ampersand of the already encoded entity as a literal ampersand and encodes it a second time (not so fine).

So, the question: am I simply using these calls wrongly, or is this really an issue within libxml2? If it is, I'd happily provide a patch, but this seems to be quite a libxml2 internal (I'd fear side effects), and I'm not sure which of the two behaviours should be changed.

Any advice would be greatly appreciated.

Ciao,
Markus

Kind regards
Markus Keim

________________________Addressed by:________________________
ORDAT GmbH & Co. KG - Serversystems / eCom
Dipl.-Inf. (FH) Markus Keim
Fon: +49 (641) 7941-0
Fax: +49 (641) 7941-132
Rathenaustr. 1
35394 Gießen
mailto:markus_keim ordat com
See: http://www.ordat.com
_____________________________________________________________
I love deadlines. I like the whooshing sound they make as they fly by.
  -- Douglas Adams
Attachment: addNodeContent.c