Re: [xml] redicting parts of trees

From: Daniel Veillard <veillard redhat com>
To: cazic gmx net
Cc: xml gnome org
Subject: Re: [xml] redicting parts of trees
Date: Mon, 16 May 2005 10:33:06 -0400

On Mon, May 16, 2005 at 03:47:20PM +0200, cazic gmx net wrote:

Hi,

Obviously it's not enough to say that it is a sketch when Daniel is around
;-)


  don't get me wrong, I just wanted to provide some comments :-)

Von: Daniel Veillard <veillard redhat com>
Datum: Mon, 16 May 2005 08:28:27 -0400

On Mon, May 16, 2005 at 02:04:44PM +0200, cazic gmx net wrote:

[...]

  Quick comments on it:

[...]
static int
xmlDOMWrapAdoptNode(void *ctxt, xmlDocPtr sourceDoc, xmlDocPtr destDoc,
          xmlNodePtr node, xmlNodePtr parent, int unlink)
{


  sourceDoc is redundant, can be extracted from node->doc


OK. node->doc can be NULL if created with xmlNewNode.


  Hum, but if plugged in the doc it should have been updated, the underlying
question is how permissive the function should be with respoect to the input
document, I tend to agree with you it should try to cope even with weird
input.

  parent should be optional NULL would be similar to the real DOM function


@parent is not needed; it's a left-over from xmlStaticCopyNode(), which I
used as the starting point.


  I would keep it (as optional) to allow better ns integration if possible.

  Error handling should be designed. A simple -1 error code back is not
really
suitable for the kind of complex operation that is being designed here.

OK.

  case XML_DOCUMENT_NODE:
        case XML_HTML_DOCUMENT_NODE:


  XML_HTML_DOCUMENT_NODE and XML_DOCUMENT_NODE may not generate an
error...
I could think of a semantic for this, need to be checked against DOM.


Document nodes cannot be adopted as per DOM spec.


  okay, that clear :-)

    sameDict = ((sourceDoc->dict == destDoc->dict) &&
  (destDoc->dict != NULL)) ? 1 : 0;
    cur = node;


   if parent != NULL collect existing inscope namespaces


@parent will not be used.


  I would really keep it if possible.

          if (! sameDict) {



   Wrong you need to check xmlDictOwns(sourceDoc->dict, cur->name)
too or you are gonna leak cur->name if the node was added manually


So we always need a xmlDictOwns check? Can you see a constellation where we
can avoid this?


  not really, but it's a really fast call, usually it's a call with 2
just 2 pointer comparison.

              if (destDoc->dict)
                  cur->name = xmlDictLookup(destDoc->dict, cur->name, -1);
              else if (sourceDoc->dict)
                  cur->name = BAD_CAST xmlStrdup(cur->name); 
              /*
              * TODO: Are namespace declarations ever in a dict?
              */


Are dicts ever used for namespace declarations?


  of course, SAX2 always pass namespaces related strings from a the document
parser dictionary. It's the norm rather than the exception.

          }
          /*
          * Adopt out-of-scope namespace declarations.
          */
          if (cur->ns != NULL) {
              int i, j;


  I would rather use a hash table than comparing all namespaces string


Namespace strings are not compared. Only the pointers to xmlNs.


  But at some point you will need to compare strings, for example when
updating that list. There is 2 aspects, speed and code maintainance. I
can't tell about the speed but from a code cleanness POV it seems a more
abstract based view using hashes would make the code less confusing. But
it's all implementations details nothing critical.

              } else {
                  /*
                  * User-defined behaviour.
                  */


   you can't do that. ctxt need to be refined to be actually useful, a
void * won't work. And adding 2 args might be just a bit too much, this
need
more thinking


Ctxt will not be void* in the end. We have to design a nice struct for it.


  okay :-)


   I would really rather use a dictionnary for nsList it would be way
cleaner.
the only problem is that it would require a trick like a function
recursion
when encountering a namespace deactivation like xmlns="" or xmlns:foo=""
or namespace redefinition to a diferent value but that quite unfrequent.


xmlns:foo="" is not an allowed namespace declaration as far as I know.


  yes in namespace-1.1

xmlns="" should not be referenced by an node->ns entry, since it is just a
machanism to disable the default namespace.


  yes but it's on the nsDef, that's the only way to implement this.

Can you clarify why you want a hash here? The mechanism just assures that


  see before.

references (node->ns) to the xmlNs entries will be valid, thus picked from
or created in "oldNs" if the original xmlNs entries are out-of-scope. It is
not a namespace reconciliation mechanism; it just unlinks the branch and
keeps namespace references alive; nsDef entries are not touched.

      case XML_ENTITY_REF_NODE:
          /*
          * TODO: Remove entity child nodes.
          */
          goto internal_error;
          break;


  forces a recursion see other examples of recursive tree walk with 
entities references. Potentially a lookup of the entity being ref'ed

from the target document. XInclude has a semantic for such entities

remapping might use the same.


OK. The spec wants the referenced entity to be discarded; an entity of the
destination document will be assigned if available.


  if the spec has a clear behaviour for this then let's follow it :-)



  Hum, I seems to have missed handling XML_ELEMENT_NODE especially the
part handling nsDef on those.


It's the first switch-case :-) Do we have to touch nsDef entires here?


  :-)
  touch no, read yes, I probably missed it. Currently the list based code
is a bit frightening, using dict might improve that aspect, if only in the
design/review phase.


  Obviously lot of thinking and testing need to be carried on. I would
really
like to get something we can finally rely on and not half of solutions.


Well, who doesn't? No offense: Daniel, if you can conjure a working solution
out of the box, then just do it. I'd love to see this issues being solved by


  Hum, there is a misunderstanding :-)
  By half solutions I meant the couple of existing non-complete function
which are already in tree.c and which I pointed to in earlier parts of this
thread. I was not complaining about your attempt !

others, really. You could add the entity refernce stuff and fix the dict
parts, for I don't have the big picture in mind regarding the string dicts.
We'll surely need some thinking on the context struct, so it won't be
finished and maby should not be finished that fast.


  There is no hurry, I just want to end up with a complete solution this time
even if it takes a month before everybody agree the resulting code is usable
for them.

  Thanks a lot for starting the effort though there is obviously some work
left :-)

Daniel


Proposal for an initial function head and return values:

/**
 * xmlDOMWrapAdoptNode:
 * @ctxt: the optional wrapper context
 * @destDoc: the adopting document
 * @node: the node to be adopted
 *
 * Unlinks and adopts a node.
 *
 * Returns: 0 in case of success
 *          1 if @node cannot be adopted
 *          -1 in case of an API or internal error.
 */
static int
xmlDOMWrapAdoptNode(void *xmlSomeStructNamePtr,
                    xmlDocPtr destDoc,
                    xmlNodePtr node)


Still open at the API level :
  - following your comment sourceDoc might be added to cope with
    node->doc == NULL, but this opens the door to more problem like
    node->doc != NULL and node->doc != sourceDoc :-\
  - I still think an optional parent node to be able to make namespace 
    integration, it's incredible how much people have been complaining
    for example in XSLT when cut and pasting trees and extra ns definition
    were generated. Sometimes DTD are defined to allow the namespaces
    only on the root element for example.
  - more complex error handling. Potentially this could be options passed
    to the function on how to process default edge cases.

 thanks again,

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

Follow-Ups:
- Re: [xml] redicting parts of trees
  - From: cazic

References:
- Re: [xml] redicting parts of trees
  - From: Daniel Veillard
- Re: [xml] redicting parts of trees
  - From: cazic

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]