Re: [xml] ampersand conversion



On 17.05.2005 15:20, oliverst online de wrote:
Hi,

this code is giving me "error : unterminated entity reference" with
libxml2-2.6.16

xmlNodePtr node = xmlNewNode(NULL, BAD_CAST "test");
if( node ) {
        const char* str = "<>\"'&";
        xmlNodeSetContent(node, BAD_CAST str);
}

I know that some specific characters have to be encoded/quoted, but all
the others are already encoded internally, so I would only have to
replace the ampersand myself. Is this the intended behavior? It
seems a bit inconsistent to me to encode all the others but not the
ampersand.

I haven't checked the code, so this is without warranty, but this is how I see it. xmlNodeSetContent creates a child node of type text and puts the supplied content in it. It does not parse the supplied content as XML and then inject the result into the tree. For the latter to work in any sane way, the supplied content would have to be at least a valid document fragment.

A literal '<' cannot appear in a text node (and '>' is customarily escaped along with it), so it is reasonable that these get quoted. But I can very well use entity references or character references and expect them to be handled accordingly. Therefore, leaving ampersands alone when quoting things in a text node is the only sane choice, because otherwise I would have no way to insert an entity reference or a character reference into the newly created text child.

For consistency, if anything, libxml could require you to write

  const char* str = "&lt;&gt;\"'&amp;";

instead of what you have used. I feel this would be a lot safer than automagically supplying an &amp; for every ampersand that triggers the unterminated entity reference error.
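
If the text comes from somewhere else and cannot be written pre-escaped by hand, the same string can be built at run time. Below is a minimal sketch in plain C, without warranty: escape_markup() is a hypothetical helper name of my own, not a libxml2 function, and it only handles '&', '<' and '>'. Quotes are left alone because they need no escaping inside element content.

```c
#include <stdlib.h>
#include <string.h>

/*
 * escape_markup() is a hypothetical helper, not part of libxml2.
 * It returns a newly malloc'ed copy of `in` in which '&', '<' and '>'
 * are replaced by "&amp;", "&lt;" and "&gt;". All other bytes,
 * including quotes, are copied unchanged.
 * The caller frees the result. Returns NULL on NULL input or when
 * the allocation fails.
 */
char *escape_markup(const char *in)
{
    if (in == NULL)
        return NULL;

    /* Worst case: every byte expands to "&amp;" (5 bytes), plus the NUL. */
    size_t len = strlen(in);
    char *out = malloc(len * 5 + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (const char *s = in; *s != '\0'; s++) {
        const char *rep = NULL;
        switch (*s) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        default:  break;
        }
        if (rep != NULL) {
            /* Copy the replacement without its terminating NUL. */
            size_t n = strlen(rep);
            memcpy(p, rep, n);
            p += n;
        } else {
            *p++ = *s;
        }
    }
    *p = '\0';
    return out;
}
```

The result would then be handed over as xmlNodeSetContent(node, BAD_CAST escaped) and released with free() afterwards. As far as I know, libxml2 also ships xmlEncodeSpecialChars() for this job, which is probably the better choice in real code; the sketch is only meant to show what the escaping amounts to.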

Ciao,
Igor


