Re: [xml] Adjacent text node merging

I think I am missing something ... I don't think it really "merges" ... it is comparing the addresses of the nodes' names, e.g.

 if ((parent->last != NULL) && (parent->last->type == XML_TEXT_NODE) &&
     (parent->last->name == cur->name) &&
     (parent->last != cur)) {
                /* this "merge" simply puts the content into parent->last
                 * and removes the redundant cur ... because the address of
                 * the name as well as the type are the same */

So parent->last->name is the address of a string ... and only if it is the same address as cur->name is the content "merged". It is more a check on having the exact same tag in the tree many times (when I say exact same tag I don't mean the tag data or name; I am talking about the same memory location). If you do as you suggest I would be careful checking for NULL values, because if parent->last->name == cur->name (and parent->last != cur) then both nodes point at one shared string, and anything that frees or invalidates it through parent->last->name also invalidates cur->name.
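To make the difference between "same memory location" and "same name text" concrete, here is a minimal sketch in plain C. The toy_node struct and both helper functions are hypothetical stand-ins made up for illustration; they are not libxml2 types or calls, and the sketch only mirrors the pointer comparison in the condition quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a tree node; NOT libxml2's xmlNode. */
typedef struct toy_node {
    int type;          /* node kind; the value is arbitrary in this sketch */
    const char *name;  /* address of the name string */
} toy_node;

/* True only when both nodes point at the very same name storage. */
bool same_name_pointer(const toy_node *a, const toy_node *b)
{
    assert(a != NULL && b != NULL);
    return a->name == b->name;
}

/* True when the name strings have equal contents, wherever they live. */
bool same_name_text(const toy_node *a, const toy_node *b)
{
    assert(a != NULL && b != NULL);
    if (a->name == NULL || b->name == NULL)
        return a->name == b->name;  /* equal only if both are NULL */
    return strcmp(a->name, b->name) == 0;
}
```

Two nodes whose names were set from the same pointer pass both tests; a node whose name is a separate copy of the same characters passes only same_name_text.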

In addition if you edit text content of a particular text node in the children list ... yet that content is from the same cur->name I would think other odd problems would happen.

This has come up a few times on this list because people assume that an empty name means name=NULL. It does not: name=NULL is a NULL name, while name="" is an empty name.

I use xmlNewDocNode which makes new memory for name so the above merge would never happen.

If it was me I'd look at leaving the code alone in libxml2 and then when you need to make this content change use the SAME test as above before calling the standard routines:

cur->name = xmlStrdup(cur->name);

they will have the same value but DIFFERENT pointers and won't trigger the merge.
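As a rough illustration of "same value but DIFFERENT pointers", here is a minimal sketch in plain C. The toy_node struct and the detach_name helper are hypothetical and exist only for this example; it says nothing about how libxml2 itself manages the memory behind a node's name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a node; NOT libxml2's xmlNode. */
typedef struct toy_node {
    char *name;  /* may be shared with other nodes */
} toy_node;

/*
 * Give n its own heap copy of its name so that a pointer comparison
 * against any other node's name no longer matches.
 * Returns 0 on success, -1 if allocation fails.
 * The old string is not freed here because other nodes may still use it;
 * the caller owns the new n->name and must free() it.
 */
int detach_name(toy_node *n)
{
    assert(n != NULL);
    if (n->name == NULL)
        return 0;  /* nothing to copy */

    size_t len = strlen(n->name) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;

    memcpy(copy, n->name, len);
    n->name = copy;
    return 0;
}
```

Before the call, two nodes built from one shared string compare equal by pointer; afterwards the strings still compare equal with strcmp but the pointers differ.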

Or if you only notice after the fact, then copy the node (in non-tricky cases a memcpy will work just fine), then use xmlStrdup on the name, and then FREE the old node :-)

xmlFreeNode(oldnode);

Note I have been stating in previous emails and discussions lately the concept of using one's own wrappers -- never call any of the libxml2 functions directly if you can avoid it (you can carry that too far).  This is a perfect example where either your adding of the node or your reading and changing of the content of a node could be well wrapped -- and then just use your wrappers.
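Here is a minimal sketch of what such a wrapper layer might look like, written against a toy tree rather than libxml2 so that it stands alone. Every name in it (toy_text, toy_parent, app_append_text, app_free_children) is hypothetical; in a real program the body of the wrapper would call whatever libxml2 routines you settle on, and the rest of the application would only ever call the wrapper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy types; NOT libxml2 structures. */
typedef struct toy_text {
    char *content;
    struct toy_text *next;
} toy_text;

typedef struct toy_parent {
    toy_text *first;
    toy_text *last;
    size_t count;
} toy_parent;

/*
 * Wrapper: always appends a NEW text child, never merging it into an
 * existing neighbour. Returns the new node, or NULL on allocation failure.
 */
toy_text *app_append_text(toy_parent *parent, const char *text)
{
    assert(parent != NULL && text != NULL);

    toy_text *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;

    size_t len = strlen(text) + 1;
    node->content = malloc(len);
    if (node->content == NULL) {
        free(node);
        return NULL;
    }
    memcpy(node->content, text, len);
    node->next = NULL;

    if (parent->last != NULL)
        parent->last->next = node;
    else
        parent->first = node;
    parent->last = node;
    parent->count++;
    return node;
}

/* Wrapper: release every child and reset the parent to empty. */
void app_free_children(toy_parent *parent)
{
    assert(parent != NULL);
    toy_text *cur = parent->first;
    while (cur != NULL) {
        toy_text *next = cur->next;
        free(cur->content);
        free(cur);
        cur = next;
    }
    parent->first = parent->last = NULL;
    parent->count = 0;
}
```

The point of the design is that the merge-or-not decision lives in one place: if the underlying behaviour ever needs to change, only the wrapper bodies are touched.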


On 5/4/2013 12:59 AM, Nikolay Sivov wrote:
I think it's more a question for Daniel, but any help is welcome of course. Libxml2 merges text nodes into a single node when you add a text child next to an existing text node, for example, so at least xmlAddNextSibling, xmlAddPrevSibling and xmlAddChild are doing that. For a project where I'm using libxml2 I want all nodes to be preserved as I add them, so that, for example, I can edit the text content of a particular text node in the children list. The question is what could or will potentially break if I use my own versions of these tree manipulation calls that do not perform such merging? E.g. does the lib really expect to have only one text node with no text siblings somewhere in the code, or maybe libxslt does?

P.S. attached patch is just to fix a compiler warning I'm seeing with current git builds, obviously completely unrelated to this topic.

Eric S. Eberhard
2933 W Middle Verde Road
Camp Verde, AZ  86322

928-567-3727  work                      928-301-7537  cell
