Re: [xml] Adjacent text node merging

From: Nikolay Sivov <bunglehead gmail com>
To: "Eric S. Eberhard" <eric vicsmba com>
Cc: xml gnome org
Subject: Re: [xml] Adjacent text node merging
Date: Mon, 06 May 2013 23:49:11 +0400

On 5/6/2013 23:23, Eric S. Eberhard wrote:

I think I am missing something ... I don't think it really "merges" ... it is comparing the address of the node elements, e.g.

if ((parent->last != NULL) && (parent->last->type == XML_TEXT_NODE) &&
    (parent->last->name == cur->name) &&
    (parent->last != cur)) {
    /* this "merge" simply puts the content into parent->last and
     * removes the redundant cur ... because the address of the name
     * as well as the type are the same */
}

No, that's not what it does. 'name' has a special meaning for text nodes: it holds the address of a static string.
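To make the point about 'name' concrete, here is a minimal self-contained sketch. It is a toy model and does not use libxml2's real structures: the sketch_node type, the sketch_text_name constant and the helper functions are all hypothetical stand-ins. It only shows that when every text node points at one shared static string, comparing the 'name' pointers tells you both nodes are text nodes and says nothing about their content.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for illustration only; these are NOT libxml2 types. */
typedef enum { SKETCH_ELEMENT_NODE, SKETCH_TEXT_NODE } sketch_type;

typedef struct sketch_node {
    sketch_type type;
    const char *name;     /* element: tag name; text node: shared marker */
    const char *content;  /* the text data lives here, not in name */
} sketch_node;

/* One static string shared by every text node in this model. */
static const char sketch_text_name[] = "text";

/* Every text node gets the same name pointer, whatever its content. */
static sketch_node sketch_new_text(const char *content) {
    sketch_node n = { SKETCH_TEXT_NODE, sketch_text_name, content };
    return n;
}

/* Pointer comparison: answers "are both of these text nodes?".
 * It does not look at the content at all. */
static int sketch_same_kind(const sketch_node *a, const sketch_node *b) {
    return a->type == SKETCH_TEXT_NODE && b->type == SKETCH_TEXT_NODE &&
           a->name == b->name;
}

/* Returns 1 when two text nodes with different content still share
 * the same name pointer. */
static int sketch_name_demo(void) {
    sketch_node a = sketch_new_text("foo");
    sketch_node b = sketch_new_text("bar");
    return sketch_same_kind(&a, &b) && strcmp(a.content, b.content) != 0;
}
```

In this model the check succeeds for any pair of text nodes, so it cannot be read as "the same tag appears several times in the tree".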

So parent->last->name is an address of a string ... and if it is the same as cur->name THEN the content is "merged" -- it is more a check on having the exact same tag in the tree many times (when I say exact same tag I don't mean the tag data or name -- I am talking about the same memory location). If you do as you suggest I would be careful checking for NULL values because if parent->last->name == cur->name (and parent->last != cur) then cur->name would become NULL when parent->last->name becomes NULL.

In addition if you edit text content of a particular text node in the children list ... yet that content is from the same cur->name I would think other odd problems would happen.

That's not what I meant. What I need is to preserve user modifications to the tree exactly as they were made, for as long as the tree lives. That way it's possible, for example, to edit each text node separately.
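The difference between the two behaviours can be sketched in a small self-contained model. Again this is an assumption-laden toy, not libxml2 code: toy_node, toy_append and toy_append_merging are hypothetical names, and the merging variant only imitates the idea of folding a new text node into a preceding text sibling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy tree for illustration only; these are NOT libxml2 types. */
typedef enum { TOY_ELEMENT, TOY_TEXT } toy_kind;

typedef struct toy_node {
    toy_kind kind;
    char *content;                  /* owned copy; NULL for elements */
    struct toy_node *next;          /* next sibling */
    struct toy_node *first, *last;  /* children */
} toy_node;

static toy_node *toy_new(toy_kind kind, const char *content) {
    toy_node *n = calloc(1, sizeof *n);
    if (n == NULL) return NULL;
    n->kind = kind;
    if (content != NULL) {
        size_t len = strlen(content) + 1;
        n->content = malloc(len);
        if (n->content == NULL) { free(n); return NULL; }
        memcpy(n->content, content, len);
    }
    return n;
}

/* Frees a node, its following siblings and all their children. */
static void toy_free(toy_node *n) {
    while (n != NULL) {
        toy_node *next = n->next;
        toy_free(n->first);
        free(n->content);
        free(n);
        n = next;
    }
}

/* Plain append: every node stays a separate child. */
static toy_node *toy_append(toy_node *parent, toy_node *cur) {
    cur->next = NULL;
    if (parent->last == NULL) parent->first = cur;
    else parent->last->next = cur;
    parent->last = cur;
    return cur;
}

/* Merging append: text added after text is folded into the existing
 * node and cur is freed, so the caller must use the return value. */
static toy_node *toy_append_merging(toy_node *parent, toy_node *cur) {
    toy_node *last = parent->last;
    if (cur->kind == TOY_TEXT && last != NULL && last != cur &&
        last->kind == TOY_TEXT &&
        last->content != NULL && cur->content != NULL) {
        size_t a = strlen(last->content), b = strlen(cur->content);
        char *joined = realloc(last->content, a + b + 1);
        if (joined == NULL) return NULL;
        memcpy(joined + a, cur->content, b + 1);
        last->content = joined;
        toy_free(cur);
        return last;
    }
    return toy_append(parent, cur);
}

static size_t toy_child_count(const toy_node *parent) {
    size_t n = 0;
    for (const toy_node *c = parent->first; c != NULL; c = c->next) n++;
    return n;
}
```

With the plain append, each text node keeps its own identity and content, which is the property I'm after; with the merging append, the pointer passed in is no longer valid after the call.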

This has come up a few times on this list because people think that adding an empty name is name=NULL (that is a NULL name, name="" is an empty name).

'name' is not the text data for text nodes.

I use xmlNewDocNode which makes new memory for name so the above merge would never happen.

If it was me I'd look at leaving the code alone in libxml2 and then when you need to make this content change use the SAME test as above before calling the standard routines:

cur->name = xmlStrdup(cur->name);

they will have the same value but DIFFERENT pointers and won't trigger the merge.

Or if you only notice after the fact then copy the node -- in non-tricky cases a memcpy will work just fine, and then use xmlStrdup on the name, and then

xmlReplaceNode(oldnode,newnode);

FREE the old node :-) xmlFreeNode(oldnode);

xmlNewDocNode creates an element, not a text node, so this example is irrelevant.

Note I have been stating in previous emails and discussions lately the concept of using one's own wrappers -- never call any of the libxml2 functions directly if you can avoid it (you can carry that too far). This is a perfect example where either your adding of the node or your reading and changing of the content of a node could be well wrapped -- and then just use your wrappers.

Well, sure, you shouldn't have two calls that do basically the same thing, if that's what you mean; using wrappers was never in question here. I'm using libxml2 with wrapping code to expose a different set of APIs.

E

On 5/4/2013 12:59 AM, Nikolay Sivov wrote:
I think it's more a question for Daniel, but any help is welcome of course. Libxml2 merges text nodes into a single node when you add a text child next to an existing text node, for example, so at least xmlAddNextSibling, xmlAddPrevSibling and xmlAddChild are doing that. For a project where I'm using libxml2, I want all nodes to be preserved as I add them, so that, for example, I can edit the text content of a particular text node in the children list. The question is: what could or will potentially break if I use my own versions of these tree manipulation calls that do not perform such merging? E.g., does the library really expect to have only one text node with no text siblings somewhere in the code, or maybe libxslt does?

P.S. attached patch is just to fix a compiler warning I'm seeing with current git builds, obviously completely unrelated to this topic.
