Re: [xml] redicting parts of trees

From: cazic gmx net
To: veillard redhat com
Cc: xml gnome org
Subject: Re: [xml] redicting parts of trees
Date: Mon, 16 May 2005 18:06:05 +0200 (MEST)

Hi,

Von: Daniel Veillard <veillard redhat com>
Datum: Mon, 16 May 2005 10:33:06 -0400

On Mon, May 16, 2005 at 03:47:20PM +0200, cazic gmx net wrote:

Hi,

Obviously it's not enough to say that it is a sketch when Daniel is

around

;-)


  don't get me wrong, I just wanted to provide some comments :-)

:-)

Von: Daniel Veillard <veillard redhat com>
Datum: Mon, 16 May 2005 08:28:27 -0400

On Mon, May 16, 2005 at 02:04:44PM +0200, cazic gmx net wrote:

[...]

  Quick comments on it:

[...]
static int
xmlDOMWrapAdoptNode(void *ctxt, xmlDocPtr sourceDoc, xmlDocPtr

destDoc,

                xmlNodePtr node, xmlNodePtr parent, int unlink)
{


  sourceDoc is redundant, can be extracted from node->doc


OK. node->doc can be NULL if created with xmlNewNode.


  Hum, but if plugged in the doc it should have been updated, the
underlying
question is how permissive the function should be with respoect to the
input
document, I tend to agree with you it should try to cope even with weird
input.


We should have no problems if node->doc is NULL, so true, @sourceDoc is
redundant.

  parent should be optional NULL would be similar to the real DOM

function


@parent is not needed; it's a left-over from xmlStaticCopyNode(), which

used as the starting point.


  I would keep it (as optional) to allow better ns integration if
possible.


Ah, now I seem to understand what you mean: with the originally unintended
@parent, acting as the insertion point in the destination document, you
assumed that a namespace reconciliation would be performed; right?
Yes, for non-DOM based calls, an on-the-fly recon. would spare a full
explicit recon. call.

So it could look like this:
If @node is given:
  - reconciliate namespaces which can be found in @parent's
    self-or-ancestor axis
  - if the branch of @parent has the document-node in the
    self-or-ancestor axis, then create missing ns declarations
    on the document-element, otherwise use "oldNs" of xmlDoc for
    temporary storage of missing declarations

With this we could have a non-redundant mechanism, since missing
declarations will be anchored on the document-element only.

If the user wants such an orphan branch (not connected to the document-node)
to be serialized, he still has the option to call a namespace
reconciliation, which would create the missing ns decls. on the brach-top.

  Error handling should be designed. A simple -1 error code back is

not

really
suitable for the kind of complex operation that is being designed

here.

OK.

        case XML_DOCUMENT_NODE:
        case XML_HTML_DOCUMENT_NODE:


  XML_HTML_DOCUMENT_NODE and XML_DOCUMENT_NODE may not generate an
error...
I could think of a semantic for this, need to be checked against DOM.


Document nodes cannot be adopted as per DOM spec.


  okay, that clear :-)

    sameDict = ((sourceDoc->dict == destDoc->dict) &&
        (destDoc->dict != NULL)) ? 1 : 0;
    cur = node;


   if parent != NULL collect existing inscope namespaces


@parent will not be used.


  I would really keep it if possible.

                if (! sameDict) {



   Wrong you need to check xmlDictOwns(sourceDoc->dict, cur->name)
too or you are gonna leak cur->name if the node was added manually


So we always need a xmlDictOwns check? Can you see a constellation where

we

can avoid this?


  not really, but it's a really fast call, usually it's a call with 2
just 2 pointer comparison.


Good to hear that.

                    if (destDoc->dict)
                        cur->name = xmlDictLookup(destDoc->dict, cur->name, -1);
                    else if (sourceDoc->dict)
                        cur->name = BAD_CAST xmlStrdup(cur->name); 
                    /*
                    * TODO: Are namespace declarations ever in a dict?
                    */


Are dicts ever used for namespace declarations?


  of course, SAX2 always pass namespaces related strings from a the
document
parser dictionary. It's the norm rather than the exception.


OK, should have known that, while working an streamed schemata with SAX2 ;-)

                }
                /*
                * Adopt out-of-scope namespace declarations.
                */
                if (cur->ns != NULL) {
                    int i, j;


  I would rather use a hash table than comparing all namespaces string


Namespace strings are not compared. Only the pointers to xmlNs.


  But at some point you will need to compare strings, for example when
updating that list. There is 2 aspects, speed and code maintainance. I
can't tell about the speed but from a code cleanness POV it seems a more
abstract based view using hashes would make the code less confusing. But
it's all implementations details nothing critical.

                    } else {
                        /*
                        * User-defined behaviour.
                        */


   you can't do that. ctxt need to be refined to be actually useful, a
void * won't work. And adding 2 args might be just a bit too much,

this

need
more thinking


Ctxt will not be void* in the end. We have to design a nice struct for

it.

  okay :-)


   I would really rather use a dictionnary for nsList it would be way
cleaner.
the only problem is that it would require a trick like a function
recursion
when encountering a namespace deactivation like xmlns="" or

xmlns:foo=""

or namespace redefinition to a diferent value but that quite

unfrequent.


xmlns:foo="" is not an allowed namespace declaration as far as I know.


  yes in namespace-1.1


Jesus!

xmlns="" should not be referenced by an node->ns entry, since it is just

machanism to disable the default namespace.


  yes but it's on the nsDef, that's the only way to implement this.

Can you clarify why you want a hash here? The mechanism just assures

that

  see before.

references (node->ns) to the xmlNs entries will be valid, thus picked

from

or created in "oldNs" if the original xmlNs entries are out-of-scope. It

is

not a namespace reconciliation mechanism; it just unlinks the branch and
keeps namespace references alive; nsDef entries are not touched.

            case XML_ENTITY_REF_NODE:
                /*
                * TODO: Remove entity child nodes.
                */
                goto internal_error;
                break;


  forces a recursion see other examples of recursive tree walk with 
entities references. Potentially a lookup of the entity being ref'ed

from the target document. XInclude has a semantic for such entities

remapping might use the same.


OK. The spec wants the referenced entity to be discarded; an entity of

the

destination document will be assigned if available.


  if the spec has a clear behaviour for this then let's follow it :-)



  Hum, I seems to have missed handling XML_ELEMENT_NODE especially the
part handling nsDef on those.


It's the first switch-case :-) Do we have to touch nsDef entires here?


  :-)
  touch no, read yes, I probably missed it. Currently the list based code
is a bit frightening, using dict might improve that aspect, if only in the
design/review phase.


  Obviously lot of thinking and testing need to be carried on. I would
really
like to get something we can finally rely on and not half of

solutions.


Well, who doesn't? No offense: Daniel, if you can conjure a working

solution

out of the box, then just do it. I'd love to see this issues being

solved by

  Hum, there is a misunderstanding :-)
  By half solutions I meant the couple of existing non-complete function
which are already in tree.c and which I pointed to in earlier parts of
this
thread. I was not complaining about your attempt !


Ah, I put you back on the list of my friends then :-)

others, really. You could add the entity refernce stuff and fix the dict
parts, for I don't have the big picture in mind regarding the string

dicts.

We'll surely need some thinking on the context struct, so it won't be
finished and maby should not be finished that fast.


  There is no hurry, I just want to end up with a complete solution this
time
even if it takes a month before everybody agree the resulting code is
usable
for them.

  Thanks a lot for starting the effort though there is obviously some

work

left :-)

Daniel


Proposal for an initial function head and return values:

/**
 * xmlDOMWrapAdoptNode:
 * @ctxt: the optional wrapper context
 * @destDoc: the adopting document
 * @node: the node to be adopted
 *
 * Unlinks and adopts a node.
 *
 * Returns: 0 in case of success
 *          1 if @node cannot be adopted
 *          -1 in case of an API or internal error.
 */
static int
xmlDOMWrapAdoptNode(void *xmlSomeStructNamePtr,
                    xmlDocPtr destDoc,
                    xmlNodePtr node)


Still open at the API level :
  - following your comment sourceDoc might be added to cope with
    node->doc == NULL, but this opens the door to more problem like
    node->doc != NULL and node->doc != sourceDoc :-\


@sourceDoc should not be needed.

  - I still think an optional parent node to be able to make namespace 
    integration, it's incredible how much people have been complaining
    for example in XSLT when cut and pasting trees and extra ns definition
    were generated. Sometimes DTD are defined to allow the namespaces
    only on the root element for example.


Automatic reconciliation as described above, agreed.

  - more complex error handling. Potentially this could be options passed
    to the function on how to process default edge cases.


An option arg would be good, even if we end up not using it.

Regards,

Kasimier

References:
- Re: [xml] redicting parts of trees
  - From: Daniel Veillard

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]