Re: [xml] redicting parts of trees



Hi,

Obviously it's not enough to say that it is a sketch when Daniel is around
;-)

From: Daniel Veillard <veillard redhat com>
Date: Mon, 16 May 2005 08:28:27 -0400

On Mon, May 16, 2005 at 02:04:44PM +0200, cazic gmx net wrote:
[...]
  Quick comments on it:
[...]
static int
xmlDOMWrapAdoptNode(void *ctxt, xmlDocPtr sourceDoc, xmlDocPtr destDoc,
            xmlNodePtr node, xmlNodePtr parent, int unlink)
{

  sourceDoc is redundant, can be extracted from node->doc

OK. node->doc can be NULL if created with xmlNewNode.

  parent should be optional; NULL would be similar to the real DOM function

@parent is not needed; it's a left-over from xmlStaticCopyNode(), which I
used as the starting point.

  Error handling should be designed. A simple -1 error code back is not really
suitable for the kind of complex operation that is being designed here.

OK.

    int ret = 0;
    xmlNodePtr cur, curElem, par;
    xmlNsPtr *nsList = NULL;
    int nbNs, sizeNs, sameDict;
    xmlNsPtr ns;

    if (node == NULL)
    return(-1);
    switch (node->type) {
    case XML_DOCUMENT_NODE:
        case XML_HTML_DOCUMENT_NODE:

  XML_HTML_DOCUMENT_NODE and XML_DOCUMENT_NODE may not generate an error...
I could think of a semantic for this; it needs to be checked against DOM.

Document nodes cannot be adopted as per DOM spec.

#ifdef LIBXML_DOCB_ENABLED
        case XML_DOCB_DOCUMENT_NODE:
#endif
        case XML_DOCUMENT_TYPE_NODE:
        case XML_NOTATION_NODE:
        case XML_DTD_NODE:
        case XML_ELEMENT_DECL:
        case XML_ATTRIBUTE_DECL:
        case XML_ENTITY_DECL:
    case XML_ENTITY_NODE:
        return (-1);
    default:
        break;
    }       
    sameDict = ((sourceDoc->dict == destDoc->dict) &&
    (destDoc->dict != NULL)) ? 1 : 0;
    cur = node;

   if parent != NULL, collect existing in-scope namespaces

@parent will not be used.

    /*
    * TODO: Unlink.
    */    
    while (cur != NULL) {
    switch (cur->type) {
        case XML_ELEMENT_NODE:
            curElem = cur;
            /* No break on purpose. */
        case XML_ATTRIBUTE_NODE:
            /*
            * Adopt the localName.
            */
            if (! sameDict) {


   Wrong: you need to check xmlDictOwns(sourceDoc->dict, cur->name)
too, or you are going to leak cur->name if the node was added manually.

So we always need an xmlDictOwns check? Can you see a situation where we
can avoid this?
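
To make the cases behind this question concrete, here is a minimal decision-table sketch. It is one possible reading of the remark above, not the libxml2 implementation. The enum and the function chooseNameAction are hypothetical names; the three flags stand for the result of xmlDictOwns(sourceDoc->dict, cur->name), for destDoc->dict != NULL, and for the sameDict test from the sketch.

```c
#include <assert.h>

/* Hypothetical action names, introduced only for this illustration. */
typedef enum {
    NAME_KEEP,            /* leave the pointer untouched */
    NAME_INTERN,          /* look up in dest dict; original stays with source dict */
    NAME_INTERN_AND_FREE, /* look up in dest dict, then free the malloc'ed original */
    NAME_DUPLICATE        /* copy to the heap; original stays with source dict */
} NameAction;

/*
 * ownedBySourceDict: would be xmlDictOwns(sourceDoc->dict, name) == 1
 * destHasDict:       would be destDoc->dict != NULL
 * sameDict:          both docs share one non-NULL dict
 */
static NameAction
chooseNameAction(int ownedBySourceDict, int destHasDict, int sameDict)
{
    /* sameDict implies that the destination has a dict. */
    assert(!(sameDict && !destHasDict));

    if (ownedBySourceDict) {
        if (sameDict)
            return NAME_KEEP;
        return destHasDict ? NAME_INTERN : NAME_DUPLICATE;
    }
    /* The string was malloc'ed, e.g. the node was created by hand. */
    return destHasDict ? NAME_INTERN_AND_FREE : NAME_KEEP;
}
```

Under these assumptions the ownership flag changes the outcome in every row, which would suggest that sameDict alone cannot replace the xmlDictOwns check.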


                if (destDoc->dict)
                    cur->name = xmlDictLookup(destDoc->dict, cur->name, -1);
                else if (sourceDoc->dict)
                    cur->name = BAD_CAST xmlStrdup(cur->name); 
                /*
                * TODO: Are namespace declarations ever in a dict?
                */

Are dicts ever used for namespace declarations?

            }
            /*
            * Adopt out-of-scope namespace declarations.
            */
            if (cur->ns != NULL) {
                int i, j;

  I would rather use a hash table than compare all namespace strings

Namespace strings are not compared. Only the pointers to xmlNs.

                /*          
                * Did we come across this declaration already?
                */
                if (nsList != NULL) {
                    for (i = 0, j = 0; i < nbNs; i++, j += 2) {
                        if (nsList[j] == cur->ns) {
                        /*
                        * If the entry is NULL, then the ns declaration
                        * is in scope.
                            */
                            if (nsList[++j] != NULL)
                                cur->ns = nsList[j];
                            goto ns_adopt_done;
                        }
                    }
                }
                if (ctxt == NULL) {
                    /*
                    * Default behaviour: lookup if not in scope; if so,
                    * then pick or add a ns decl. using oldNs of xmlDoc.
                    */
                    /*
                    * Is the namespace declaration in scope?
                    */
                    if (curElem != NULL) {
                        par = curElem;
                        do {
                            if ((par->type == XML_ELEMENT_NODE) &&
                                (par->nsDef != NULL)) {
                                ns = par->nsDef;
                                do {
                                    if (ns == cur->ns) {
                                        /*
                                        * In scope; add a mapping.
                                        */
                                        ns = NULL;
                                        goto ns_add_mapping;
                                    }
                                    ns = ns->next;
                                } while (ns != NULL);
                            }
                            par = par->parent;
                        } while (par != node);
                    }
                    /*
                    * No luck, the namespace will be out of scope if the
                    * node is unlinked; anchor it temporarily on the
                    * xmlDoc.
                    */                                              
                    ns = destDoc->oldNs;
                    while (ns != NULL) {
                        if ((((ns->prefix == NULL) &&
                              (cur->ns->prefix == NULL)) ||
                            ((ns->prefix != NULL) &&
                             xmlStrEqual(ns->prefix, cur->ns->prefix))) &&
                            xmlStrEqual(ns->href, cur->ns->href)) {

                            goto ns_add_mapping;
                        }
                        if (ns->next == NULL)
                            break;
                        ns = ns->next;
                    }
                    /*
                    * Again, no luck; add a namespace declaration to oldNs.
                    */
                    if (ns == NULL) {
                        /*
                        * Libxml2 expects the XML namespace to be
                        * in oldNs.
                        */
                        ns = (xmlNsPtr) xmlMalloc(sizeof(xmlNs));
                        if (ns == NULL) {
                            xmlTreeErrMemory(
                                "allocating temporary namespace");
                            goto internal_error;
                        }
                        memset(ns, 0, sizeof(xmlNs));
                        ns->type = XML_LOCAL_NAMESPACE;
                        ns->href = xmlStrdup(XML_XML_NAMESPACE); 
                        ns->prefix = xmlStrdup(
                            (const xmlChar *)"xml");
                        destDoc->oldNs = ns;
                    }
                    ns->next = (xmlNsPtr) xmlMalloc(sizeof(xmlNs));
                    if (ns->next == NULL) {
                        xmlTreeErrMemory(
                            "allocating temporary namespace");
                        goto internal_error;
                    }
                    ns = ns->next;

                    memset(ns, 0, sizeof(xmlNs));
                    ns->type = XML_LOCAL_NAMESPACE; 
                    if (cur->ns->prefix != NULL)
                        ns->prefix = xmlStrdup(cur->ns->prefix);
                    ns->href = xmlStrdup(cur->ns->href);
                } else {
                    /*
                    * User-defined behaviour.
                    */

   You can't do that. ctxt needs to be refined to be actually useful; a
void * won't work. And adding 2 args might be just a bit too much; this
needs more thinking.

Ctxt will not be void* in the end. We have to design a nice struct for it.

#if 0
                    ctxt->aquireNsDecl(ctxt, cur->ns, &ns);
#endif
                }
                
ns_add_mapping:
                if (nsList == NULL) {
                    nsList = (xmlNsPtr *) xmlMalloc(10 *
                        sizeof(xmlNsPtr));
                    if (nsList == NULL) {
                        xmlTreeErrMemory(
                            "allocating namespace map");
                        goto internal_error;
                    }
                    nbNs = 0;
                    sizeNs = 5;
                } else if (nbNs >= sizeNs) {
                    sizeNs *= 2;
                    nsList = (xmlNsPtr *) xmlRealloc(nsList,
                        sizeNs * 2 * sizeof(xmlNsPtr));
                    if (nsList == NULL) {
                        xmlTreeErrMemory(
                            "re-allocating namespace map");
                        goto internal_error;
                    }
                }
                nsList[nbNs *2] = cur->ns;
                nsList[nbNs *2 +1] = ns;
                nbNs++;
                if (ns != NULL)
                    cur->ns = ns;
            }

   I would really rather use a dictionary for nsList; it would be way
cleaner. The only problem is that it would require a trick like a function
recursion when encountering a namespace deactivation like xmlns="" or
xmlns:foo="", or a namespace redefinition to a different value, but that is
quite infrequent.

xmlns:foo="" is not an allowed namespace declaration as far as I know.
xmlns="" should not be referenced by a node->ns entry, since it is just a
mechanism to disable the default namespace.

Can you clarify why you want a hash here? The mechanism just assures that
references (node->ns) to the xmlNs entries will be valid, thus picked from
or created in "oldNs" if the original xmlNs entries are out-of-scope. It is
not a namespace reconciliation mechanism; it just unlinks the branch and
keeps namespace references alive; nsDef entries are not touched.
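
The pointer-based mapping described here can be sketched in isolation as follows. This is a minimal illustration, not the code from the patch: Ns is a hypothetical stand-in for xmlNs, and the NsMap type with its three functions is invented for the example. Entries are matched by pointer identity only, and a NULL replacement means the original declaration stays in scope.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for xmlNs; only pointer identity matters here. */
typedef struct Ns {
    const char *prefix;
    const char *href;
} Ns;

typedef struct NsMapEntry {
    const Ns *oldNs; /* declaration referenced before adoption */
    Ns *newNs;       /* replacement, or NULL if oldNs stays in scope */
} NsMapEntry;

typedef struct NsMap {
    NsMapEntry *items;
    size_t count;
    size_t capacity;
} NsMap;

/* Returns the entry for oldNs, or NULL if it was not seen yet. */
static NsMapEntry *
nsMapFind(const NsMap *map, const Ns *oldNs)
{
    size_t i;

    for (i = 0; i < map->count; i++) {
        if (map->items[i].oldNs == oldNs)
            return &map->items[i];
    }
    return NULL;
}

/* Appends a mapping; returns 0 on success, -1 on allocation failure. */
static int
nsMapAdd(NsMap *map, const Ns *oldNs, Ns *newNs)
{
    assert(map != NULL && oldNs != NULL);
    if (map->count == map->capacity) {
        size_t newCap = (map->capacity == 0) ? 4 : map->capacity * 2;
        NsMapEntry *tmp = realloc(map->items, newCap * sizeof(*tmp));

        if (tmp == NULL)
            return -1; /* map->items is still valid and still owned */
        map->items = tmp;
        map->capacity = newCap;
    }
    map->items[map->count].oldNs = oldNs;
    map->items[map->count].newNs = newNs;
    map->count++;
    return 0;
}

/* Releases the entry array; the Ns objects themselves are not owned. */
static void
nsMapFree(NsMap *map)
{
    free(map->items);
    map->items = NULL;
    map->count = 0;
    map->capacity = 0;
}
```

Assigning the realloc result to a temporary keeps the old array reachable on failure, so the error path can still free it. A struct per pair also avoids the even/odd index arithmetic of a flat pointer array.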

ns_adopt_done:
            cur->doc = destDoc;
            if (cur->type == XML_ELEMENT_NODE) {
                cur->psvi = NULL;
                cur->line = 0;
                cur->extra = 0;
                /*
                * Attributes.
                */
                if (cur->properties != NULL) {
                    cur = (xmlNodePtr) cur->properties;
                    continue;
                }
            } else {
                ((xmlAttrPtr) cur)->atype = 0;
                ((xmlAttrPtr) cur)->psvi = 0;
            }

            break;
        case XML_TEXT_NODE:
        case XML_CDATA_SECTION_NODE:
            /*
            * TODO: When to adopt the content?
            */

   use xmlDictOwns to check!

            goto internal_error;
            break;
        case XML_XINCLUDE_START:
        case XML_XINCLUDE_END:
            /* TODO */
            goto internal_error;
            break;

    should not generate an error but be ignored instead

        case XML_ENTITY_REF_NODE:
            /*
            * TODO: Remove entity child nodes.
            */
            goto internal_error;
            break;

  Forces a recursion; see other examples of recursive tree walks with
entity references. Potentially a lookup of the entity being ref'ed
from the target document. XInclude has a semantic for such entity
remapping; this might use the same.

OK. The spec wants the referenced entity to be discarded; an entity of the
destination document will be assigned if available.

        case XML_ENTITY_NODE:
        case XML_NOTATION_NODE:
            /*
            * TODO: Remove those nodes.
            */
            goto internal_error;
            break;      
        case XML_PI_NODE:
        case XML_COMMENT_NODE:

Are dicts used with those nodes?

            /*
            * TODO: Adopt something?
            */
            goto internal_error;
            break;

        case XML_DOCUMENT_FRAG_NODE:
            break;
        default:
            break;


  Hum, I seem to have missed the handling of XML_ELEMENT_NODE, especially
the part handling nsDef on those.

It's the first switch-case :-) Do we have to touch nsDef entries here?

    }
    /*
    * Walk the branch.
    */
    if (cur->children != NULL) {
        cur = cur->children;
        continue;
    }

next_sibling:
    if (cur == node)
        break;
    if (cur->next != NULL)
        cur = cur->next;
    else {
        cur = cur->parent;
        goto next_sibling;
    }
    }

    /* Release the namespace map on the success path as well. */
    if (nsList != NULL)
        xmlFree(nsList);
    return (ret);

internal_error:
    if (nsList != NULL)
    xmlFree(nsList);
    return (-1);
}
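
The non-recursive walk at the end of the function can be looked at in isolation. In the sketch above, continuing into cur->properties and later climbing back through cur->parent lands on the element again and then moves on to its next sibling, so the element's children appear to be skipped. The following is a minimal illustration of one way around that, handling attributes in an inner loop. Node and Attr are hypothetical stand-ins, not the libxml2 structs, and walkBranch is an invented name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for xmlAttr and xmlNode. */
typedef struct Attr {
    struct Attr *next;
    int visited;
} Attr;

typedef struct Node {
    struct Node *parent;
    struct Node *children;
    struct Node *next;
    Attr *properties;
    int visited;
} Node;

/*
 * Visits root and all of its descendants in document order without
 * recursion, never leaving the branch. Returns the number of nodes
 * and attributes visited.
 */
static int
walkBranch(Node *root)
{
    Node *cur = root;
    int count = 0;

    assert(root != NULL);
    while (cur != NULL) {
        Attr *attr;

        cur->visited = 1;
        count++;
        /* Attributes are handled here so they do not disturb the walk. */
        for (attr = cur->properties; attr != NULL; attr = attr->next) {
            attr->visited = 1;
            count++;
        }
        if (cur->children != NULL) {
            cur = cur->children;
            continue;
        }
        /* Climb until a next sibling exists or we are back at root. */
        while (cur != root && cur->next == NULL)
            cur = cur->parent;
        if (cur == root)
            break;
        cur = cur->next;
    }
    return count;
}
```

The check against root happens before any step to cur->next, so siblings of the adopted node itself are never touched.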

  Obviously a lot of thinking and testing needs to be carried out. I would
really like to get something we can finally rely on and not half solutions.

Well, who doesn't? No offense: Daniel, if you can conjure a working solution
out of the box, then just do it. I'd love to see these issues being solved by
others, really. You could add the entity reference stuff and fix the dict
parts, for I don't have the big picture in mind regarding the string dicts.
We'll surely need some thinking on the context struct, so it won't be
finished and maybe should not be finished that fast.

  Thanks a lot for starting the effort, though there is obviously some work
left :-)

Daniel

Proposal for an initial function head and return values:

/**
 * xmlDOMWrapAdoptNode:
 * @ctxt: the optional wrapper context
 * @destDoc: the adopting document
 * @node: the node to be adopted
 *
 * Unlinks and adopts a node.
 *
 * Returns: 0 in case of success
 *          1 if @node cannot be adopted
 *          -1 in case of an API or internal error.
 */
static int
xmlDOMWrapAdoptNode(void *xmlSomeStructNamePtr,
                    xmlDocPtr destDoc,
                    xmlNodePtr node)
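
As an illustration of how the three return values could be kept apart, here is a minimal sketch of a pre-check. The enum names, NodeKind (a stand-in for a small subset of xmlElementType), the Node struct and adoptPrecheck are all hypothetical; only the numeric values 0, 1 and -1 come from the proposal above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the proposal only fixes the numeric values. */
typedef enum {
    ADOPT_OK = 0,            /* success */
    ADOPT_NOT_ADOPTABLE = 1, /* node type cannot be adopted */
    ADOPT_ERROR = -1         /* API or internal error */
} AdoptResult;

/* Stand-in for the relevant subset of xmlElementType. */
typedef enum {
    KIND_ELEMENT,
    KIND_ATTRIBUTE,
    KIND_TEXT,
    KIND_DOCUMENT,
    KIND_DTD,
    KIND_ENTITY_DECL
} NodeKind;

typedef struct Node {
    NodeKind kind;
} Node;

/* Separates API misuse (-1) from a legal but non-adoptable node (1). */
static AdoptResult
adoptPrecheck(const Node *node, const void *destDoc)
{
    if (node == NULL || destDoc == NULL)
        return ADOPT_ERROR;
    switch (node->kind) {
    case KIND_DOCUMENT:
    case KIND_DTD:
    case KIND_ENTITY_DECL:
        return ADOPT_NOT_ADOPTABLE;
    default:
        return ADOPT_OK;
    }
}
```

With this split, a caller such as a DOM wrapper could map 1 to a DOM exception for unsupported node types and treat -1 as a programming or memory error.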


Greetings,

Kasimier


