Re: [xml] Useless function calls in xmlSetProp()?



Daniel Veillard wrote:
On Fri, Jan 25, 2008 at 11:33:05AM +0100, Julien Charbon wrote:
  Hi all,

It seems that the function calls:

  buffer = xmlEncodeEntitiesReentrant(doc, value);
  list = xmlStringGetNodeList(doc, buffer);

can be exactly replaced by a simple:

  list = xmlNewDocText(doc, value);

You will find these calls in tree.c, more precisely in xmlNewPropInternal() and in xmlSetNsProp(), both called by xmlSetProp().

In fact, everything that xmlEncodeEntitiesReentrant() does is exactly undone by xmlStringGetNodeList(). Is there any technical, practical, or historical reason to keep these calls in tree.c?

Below is a patch that does this replacement on current trunk [just to illustrate my concern]. Our application and libxml2's "make tests" are happy with this change.

I don't believe the patch is right, because an attribute's list of children can be a list of text nodes and entity references,
and your patch reduces it to just the case where you don't have an
entity reference in the attribute value. Even if broken parser APIs
like SAX let people believe that attribute values can only be made
of one text node, this is not true from the spec's point of view, and libxml2, which
was designed as an editing toolkit, allows maintaining entity references
in attribute values.
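
To make the point about mixed children concrete, here is a minimal sketch in plain C of how an unescaped attribute value can break down into text runs and entity references. It is only a toy model and not libxml2's code: the names piece_kind, piece, ref_length() and split_attr_value() are hypothetical, the name syntax is simplified, and predefined entities such as &amp; are not special-cased the way a real parser would handle them.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Kinds of pieces an attribute value is split into in this toy model. */
enum piece_kind { PIECE_TEXT, PIECE_ENTITY_REF };

struct piece {
    enum piece_kind kind;
    const char *start;   /* points into the original string */
    size_t len;          /* for entity refs: includes '&' and ';' */
};

/* Length of a "&name;" reference starting at s, or 0 if s is not one.
 * Simplified name syntax (an assumption, not the full XML Name rule). */
static size_t ref_length(const char *s)
{
    if (s[0] != '&' || !(isalpha((unsigned char)s[1]) || s[1] == '_'))
        return 0;
    size_t i = 2;
    while (isalnum((unsigned char)s[i]) || s[i] == '_' ||
           s[i] == '-' || s[i] == '.')
        i++;
    return s[i] == ';' ? i + 1 : 0;
}

/* Split value into text runs and entity references.
 * Stores at most cap pieces in out and returns the number of pieces found. */
size_t split_attr_value(const char *value, struct piece *out, size_t cap)
{
    assert(value != NULL && (out != NULL || cap == 0));
    size_t count = 0;
    const char *text = value;          /* start of the pending text run */
    const char *p = value;
    while (*p != '\0') {
        size_t rlen = ref_length(p);
        if (rlen == 0) {               /* ordinary character: keep scanning */
            p++;
            continue;
        }
        if (p > text) {                /* flush text before the reference */
            if (count < cap)
                out[count] = (struct piece){ PIECE_TEXT, text, (size_t)(p - text) };
            count++;
        }
        if (count < cap)
            out[count] = (struct piece){ PIECE_ENTITY_REF, p, rlen };
        count++;
        p += rlen;
        text = p;
    }
    if (p > text) {                    /* trailing text run */
        if (count < cap)
            out[count] = (struct piece){ PIECE_TEXT, text, (size_t)(p - text) };
        count++;
    }
    return count;
}
```

In this model the value "a &myent; b" splits into three pieces (text, entity reference, text), while a value without any reference stays a single text piece. That difference is what a single text node cannot represent.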

Thanks for your fast and clear answer. I totally agree with it, but... With the current implementation, in this case:

(1) buffer = xmlEncodeEntitiesReentrant(doc, value);
(2) list = xmlStringGetNodeList(doc, buffer);

xmlStringGetNodeList() will always return a list with only one XML_TEXT_NODE element, because xmlEncodeEntitiesReentrant() escapes every '&' to '&amp;'. To be clear: if value is "&myent;", after (1) buffer will be set to "&amp;myent;", and after (2) list will contain only one XML_TEXT_NODE element with its content set to "&myent;".

Thus:
"&myent;" -> (1) -> "&amp;myent;" -> (2) -> "&myent;"
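
The round trip can also be sketched without libxml2 at all. The two helpers below are hypothetical stand-ins written for illustration only: toy_escape() models step (1) for just the characters '&', '<' and '>', and toy_unescape() models step (2) for just the three entities produced by step (1). The real functions handle more cases than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for step (1): escape '&', '<' and '>' into out.
 * Returns the escaped length, or (size_t)-1 if out is too small. */
size_t toy_escape(const char *in, char *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    if (cap == 0)
        return (size_t)-1;
    size_t n = 0;
    for (; *in != '\0'; in++) {
        char one[2] = { *in, '\0' };
        const char *rep;
        switch (*in) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        default:  rep = one;     break;
        }
        size_t len = strlen(rep);
        if (n + len >= cap)            /* keep room for the terminator */
            return (size_t)-1;
        memcpy(out + n, rep, len);
        n += len;
    }
    out[n] = '\0';
    return n;
}

/* Toy stand-in for step (2): expand the three entities produced above.
 * Any other '&' is copied through unchanged. */
size_t toy_unescape(const char *in, char *out, size_t cap)
{
    assert(in != NULL && out != NULL);
    static const struct { const char *name; char ch; } map[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
    };
    if (cap == 0)
        return (size_t)-1;
    size_t n = 0;
    while (*in != '\0') {
        char ch = *in;
        size_t step = 1;
        for (size_t i = 0; i < sizeof map / sizeof map[0]; i++) {
            size_t len = strlen(map[i].name);
            if (strncmp(in, map[i].name, len) == 0) {
                ch = map[i].ch;
                step = len;
                break;
            }
        }
        if (n + 1 >= cap)              /* keep room for the terminator */
            return (size_t)-1;
        out[n++] = ch;
        in += step;
    }
    out[n] = '\0';
    return n;
}
```

Passing "&myent;" through toy_escape() gives "&amp;myent;", and passing that result through toy_unescape() gives "&myent;" back. Since every '&' in the escaped string starts one of the three known entities, the second step never sees anything it could treat as a reference to a user-defined entity.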

See below a tiny test that focuses on that case:

== test-xml-tiny.c ==
#include <stdio.h>
#include <stdlib.h>

#include <libxml/tree.h>
#include <libxml/entities.h>

int main(void) {

  LIBXML_TEST_VERSION;

  // Attribute value with entities and '<' '>'
  const xmlChar *attributeValue =
      BAD_CAST "&foo; &bar; &amp; <tag> &myent </tag> &&";
  xmlChar *buffer = NULL;
  xmlDocPtr doc = NULL;

  doc = xmlNewDoc(BAD_CAST "1.0");

  // (1) escape the value, then (2) parse it back into a node list
  buffer = xmlEncodeEntitiesReentrant(doc, attributeValue);
  xmlNodePtr nodeList = xmlStringGetNodeList(doc, buffer);

  if (nodeList == NULL)
    return 1;

  if (nodeList->next == NULL)
    printf("Only one element in return of xmlStringGetNodeList\n");

  printf("%s\n", (const char *) nodeList->content);

  // Release everything allocated above
  xmlFreeNodeList(nodeList);
  xmlFree(buffer);
  xmlFreeDoc(doc);

  return 0;
}
== test-xml-tiny.c ==

With the current libxml2 trunk, it gives me:
$ gcc test-xml-tiny.c -o test-xml-tiny $(xml2-config --cflags) \
  $(xml2-config --libs)
$ ./test-xml-tiny
Only one element in return of xmlStringGetNodeList
&foo; &bar; &amp; <tag> &myent </tag> &&

No change: the printed content is identical to the input value. Maybe, historically, this was not always the case...

--
Julien


