Re: [xml] Behaviour of xmlNodeAddContent() vs. xmlNodeSetContent()



On Fri, Oct 27, 2006 at 05:50:15PM +0200, Keim, Markus wrote:
Hi Daniel,

OK, I've updated the function comment of xmlNodeSetContent()
with a note similar to xmlNewDocNode(), which works on the
same level AFAICS, something along

* NOTE: @content is supposed to be a piece of XML CDATA, so it allow entities
*       references, but XML special chars need to be escaped first by using
*       xmlEncodeEntitiesReentrant() resp. xmlEncodeSpecialChars().

Additionally, xmlNodeAddContent() has a note on the different behaviour
WRT xmlNodeSetContent() and an explicit hint *not* to call
xmlEncodeEntitiesReentrant().

I would've sent the patch, but while going through the source
of these calls, I noticed another, maybe more serious issue.
The note above (which appears in the docs for several API calls,
e.g. xmlNewDocNode() and xmlNewChild()) is not correct IMHO,
these calls provide no entity support, at least not for arbitrary
input.
As correctly documented, you have to call xmlEncodeEntitiesReentrant(),
resp. xmlEncodeSpecialChars(), since special XML characters have
to be replaced on that level. But a call to these functions will
also replace the ampersand of a possible entity reference in the
content buffer.
I've tested it, e.g. calling
xmlNodeSetContent(node, BAD_CAST "&myEnt;")
with a declared entity "myEnt" will create an XML_ENTITY_REF_NODE
child node with name "myEnt" and the declared content, I'd say that's
what's meant with entity support.
But for arbitrary content, we have to call xmlEncodeEntitiesReentrant()
first, and calling
xmlNodeSetContent(node,
 xmlEncodeEntitiesReentrant(node->doc, BAD_CAST "&myEnt;"))
will result in an XML_TEXT_NODE child node with content "&myEnt;",
which will be serialized to "&amp;myEnt;" if we dump the node to disk!

I first thought that I must be wrong, because this would be quite
a concern, but I've tested it and get the above-mentioned behaviour.

Any thoughts?

  Either your content is already escaped, which should be the case if you
have existing entity references in your strings, and it seems obvious
you should not escape it a second time, or it is not, and in that case,
as the documentation explains, you should call it.
 If you put yourself in a situation where you have a string containing
both complete entity references and single & characters, then no libxml2
API will both preserve the existing references and escape the lone &.
It's a matter of layering: either your string is a markup fragment or it
is not. In the first case it should already be escaped, and in the second
you must escape.
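
The layering point can be illustrated without libxml2 at all. The
sketch below uses a hypothetical escape_text() helper (a simplified
stand-in for what xmlEncodeSpecialChars() does, not the library's
implementation) that escapes '&', '<' and '>'. An escaper has no way
to tell a reference from a literal ampersand, so applying it to a
string that is already a markup fragment escapes it twice.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a libxml2 function: return a newly
 * allocated copy of `text` with '&', '<' and '>' replaced by the
 * corresponding predefined entity references. The caller frees the
 * result. Returns NULL if allocation fails. */
static char *escape_text(const char *text)
{
    size_t len = strlen(text);
    /* Worst case: every input byte expands to "&amp;" (5 bytes). */
    char *out = malloc(len * 5 + 1);
    char *p = out;

    if (out == NULL)
        return NULL;
    for (; *text != '\0'; text++) {
        switch (*text) {
        case '&': memcpy(p, "&amp;", 5); p += 5; break;
        case '<': memcpy(p, "&lt;", 4);  p += 4; break;
        case '>': memcpy(p, "&gt;", 4);  p += 4; break;
        default:  *p++ = *text;          break;
        }
    }
    *p = '\0';
    return out;
}
```

With this helper, plain text such as "AT&T" becomes "AT&amp;T", which
is what a markup-level call expects. Feeding it "&myEnt;" gives
"&amp;myEnt;", and feeding it its own output escapes the ampersands
again, so the caller has to know which layer a string belongs to and
escape exactly once.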

  I see no problem here, really,

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


