Re: [xml] Useless function calls in xmlSetProp()?



On Fri, Jan 25, 2008 at 02:39:22PM +0100, Julien Charbon wrote:
Daniel Veillard wrote:
On Fri, Jan 25, 2008 at 11:33:05AM +0100, Julien Charbon wrote:
  Hi all,

It seems that the function calls:

  buffer = xmlEncodeEntitiesReentrant(doc, value)
  list = xmlStringGetNodeList(doc, buffer);

can be exactly replaced by a simple:

  list = xmlNewDocText(doc, value);

  You will find these calls in tree.c, more precisely in
xmlNewPropInternal() and in xmlSetNsProp(), both called by xmlSetProp().

  In fact, everything that xmlEncodeEntitiesReentrant() does is exactly
undone by xmlStringGetNodeList(). Is there any technical, practical, or
historical reason to keep these calls in tree.c?

  Below is a patch that does this replacement on the current trunk (just to
illustrate my concern). Our application and libxml2's "make tests" are
happy with this change.

  I don't believe the patch is right, because the list of children of an
attribute can be a list of text nodes and entity references, and your
patch reduces it to the case where there is no entity reference in the
attribute value. Even if broken parser APIs like SAX let people believe
that attribute values can only be made of one text node, this is not true
from the spec's point of view, and libxml2, which was designed as an
editing toolkit, allows maintaining entity references in attribute values.

  Thanks for your fast and clear answer; I totally agree with it,
but... with the current implementation, in this case:

(1) buffer = xmlEncodeEntitiesReentrant(doc, value)
(2) list = xmlStringGetNodeList(doc, buffer);

xmlStringGetNodeList() will always return a list with only one
XML_TEXT_NODE element, because xmlEncodeEntitiesReentrant() escapes every
'&' as '&amp;'. In other words, if value is "&myent;", after (1) buffer
will be set to "&amp;myent;" and after (2) list will contain only one
XML_TEXT_NODE element with its content set to "&myent;".

Thus:
"&myent;" -> (1) -> "&amp;myent;" -> (2) -> "&myent;"

  Argh, right... I'm afraid the escaping was added as an afterthought;
it was not supposed to be that way. Oh well, one can still build
complex attribute values 'by hand' with the help of the API, but I think
we somehow defeated the initial purpose of the xmlStringGetNodeList() call.


With the current libxml2 trunk, it gives me:
$ gcc test-xml-tiny.c -o test-xml-tiny $(xml2-config --cflags) \
   $(xml2-config --libs)
$ ./test-xml-tiny
Only one element in return of xmlStringGetNodeList
&foo; &bar; &amp; <tag> &myent </tag> &&

  No change. Maybe, historically, it was not always the case...

  Hmm, yes. The only other thing your suggested change would lose is the
error messages resulting from the validation done in
xmlEncodeEntitiesReentrant(); problems reported there would otherwise go
unnoticed. Whether that is still worth the extra complexity, I'm not sure.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


