[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [xml] redicting parts of trees



Hi,

> Von: Daniel Veillard <veillard redhat com>
> Datum: Mon, 16 May 2005 10:33:06 -0400
> 
> On Mon, May 16, 2005 at 03:47:20PM +0200, cazic gmx net wrote:
> > Hi,
> > 
> > Obviously it's not enough to say that it is a sketch when Daniel is
> around
> > ;-)
> 
>   don't get me wrong, I just wanted to provide some comments :-)

:-)

> > > Von: Daniel Veillard <veillard redhat com>
> > > Datum: Mon, 16 May 2005 08:28:27 -0400
> > > 
> > > On Mon, May 16, 2005 at 02:04:44PM +0200, cazic gmx net wrote:
> > [...]
> > >   Quick comments on it:
> > > > [...]
> > > > static int
> > > > xmlDOMWrapAdoptNode(void *ctxt, xmlDocPtr sourceDoc, xmlDocPtr
> destDoc,
> > > > 		xmlNodePtr node, xmlNodePtr parent, int unlink)
> > > > {
> > > 
> > >   sourceDoc is redundant, can be extracted from node->doc
> > 
> > OK. node->doc can be NULL if created with xmlNewNode.
> 
>   Hum, but if plugged in the doc it should have been updated, the
> underlying
> question is how permissive the function should be with respoect to the
> input
> document, I tend to agree with you it should try to cope even with weird
> input.

We should have no problems if node->doc is NULL, so true, @sourceDoc is
redundant.
 
> > >   parent should be optional NULL would be similar to the real DOM
> function
> > 
> > @parent is not needed; it's a left-over from xmlStaticCopyNode(), which
> I
> > used as the starting point.
> 
>   I would keep it (as optional) to allow better ns integration if
> possible.

Ah, now I seem to understand what you mean: with the originally unintended
@parent, acting as the insertion point in the destination document, you
assumed that a namespace reconciliation would be performed; right?
Yes, for non-DOM based calls, an on-the-fly recon. would spare a full
explicit recon. call.

So it could look like this:
If @node is given:
  - reconciliate namespaces which can be found in @parent's
    self-or-ancestor axis
  - if the branch of @parent has the document-node in the
    self-or-ancestor axis, then create missing ns declarations
    on the document-element, otherwise use "oldNs" of xmlDoc for
    temporary storage of missing declarations

With this we could have a non-redundant mechanism, since missing
declarations will be anchored on the document-element only.

If the user wants such an orphan branch (not connected to the document-node)
to be serialized, he still has the option to call a namespace
reconciliation, which would create the missing ns decls. on the brach-top.

> 
> > >   Error handling should be designed. A simple -1 error code back is
> not
> > > really
> > > suitable for the kind of complex operation that is being designed
> here.
> > 
> > OK.
> > 
> > > > 	case XML_DOCUMENT_NODE:
> > > >         case XML_HTML_DOCUMENT_NODE:
> > > 
> > >   XML_HTML_DOCUMENT_NODE and XML_DOCUMENT_NODE may not generate an
> > > error...
> > > I could think of a semantic for this, need to be checked against DOM.
> > 
> > Document nodes cannot be adopted as per DOM spec.
> 
>   okay, that clear :-)
> 
> > > >     sameDict = ((sourceDoc->dict == destDoc->dict) &&
> > > > 	(destDoc->dict != NULL)) ? 1 : 0;
> > > >     cur = node;
> > > 
> > >    if parent != NULL collect existing inscope namespaces
> > 
> > @parent will not be used.
> 
>   I would really keep it if possible.
> 
> > > > 		if (! sameDict) {
> > > 
> > > 
> > >    Wrong you need to check xmlDictOwns(sourceDoc->dict, cur->name)
> > > too or you are gonna leak cur->name if the node was added manually
> > 
> > So we always need a xmlDictOwns check? Can you see a constellation where
> we
> > can avoid this?
> 
>   not really, but it's a really fast call, usually it's a call with 2
> just 2 pointer comparison.

Good to hear that.

> > > > 		    if (destDoc->dict)
> > > > 			cur->name = xmlDictLookup(destDoc->dict, cur->name, -1);
> > > > 		    else if (sourceDoc->dict)
> > > > 			cur->name = BAD_CAST xmlStrdup(cur->name); 
> > > > 		    /*
> > > > 		    * TODO: Are namespace declarations ever in a dict?
> > > > 		    */
> > 
> > Are dicts ever used for namespace declarations?
> 
>   of course, SAX2 always pass namespaces related strings from a the
> document
> parser dictionary. It's the norm rather than the exception.

OK, should have known that, while working an streamed schemata with SAX2 ;-)

> > > > 		}
> > > > 		/*
> > > > 		* Adopt out-of-scope namespace declarations.
> > > > 		*/
> > > > 		if (cur->ns != NULL) {
> > > > 		    int i, j;
> > > 
> > >   I would rather use a hash table than comparing all namespaces string
> > 
> > Namespace strings are not compared. Only the pointers to xmlNs.
> 
>   But at some point you will need to compare strings, for example when
> updating that list. There is 2 aspects, speed and code maintainance. I
> can't tell about the speed but from a code cleanness POV it seems a more
> abstract based view using hashes would make the code less confusing. But
> it's all implementations details nothing critical.
> 
> > > > 		    } else {
> > > > 			/*
> > > > 			* User-defined behaviour.
> > > > 			*/
> > > 
> > >    you can't do that. ctxt need to be refined to be actually useful, a
> > > void * won't work. And adding 2 args might be just a bit too much,
> this
> > > need
> > > more thinking
> > 
> > Ctxt will not be void* in the end. We have to design a nice struct for
> it.
> 
>   okay :-)
> 
> > > 
> > >    I would really rather use a dictionnary for nsList it would be way
> > > cleaner.
> > > the only problem is that it would require a trick like a function
> > > recursion
> > > when encountering a namespace deactivation like xmlns="" or
> xmlns:foo=""
> > > or namespace redefinition to a diferent value but that quite
> unfrequent.
> > 
> > xmlns:foo="" is not an allowed namespace declaration as far as I know.
> 
>   yes in namespace-1.1

Jesus!

> > xmlns="" should not be referenced by an node->ns entry, since it is just
> a
> > machanism to disable the default namespace.
> 
>   yes but it's on the nsDef, that's the only way to implement this.
> 
> > Can you clarify why you want a hash here? The mechanism just assures
> that
> 
>   see before.
> 
> > references (node->ns) to the xmlNs entries will be valid, thus picked
> from
> > or created in "oldNs" if the original xmlNs entries are out-of-scope. It
> is
> > not a namespace reconciliation mechanism; it just unlinks the branch and
> > keeps namespace references alive; nsDef entries are not touched.
> > 
> > > > 	    case XML_ENTITY_REF_NODE:
> > > > 		/*
> > > > 		* TODO: Remove entity child nodes.
> > > > 		*/
> > > > 		goto internal_error;
> > > > 		break;
> > > 
> > >   forces a recursion see other examples of recursive tree walk with 
> > > entities references. Potentially a lookup of the entity being ref'ed
> > > >from the target document. XInclude has a semantic for such entities 
> > > remapping might use the same.
> > 
> > OK. The spec wants the referenced entity to be discarded; an entity of
> the
> > destination document will be assigned if available.
> 
>   if the spec has a clear behaviour for this then let's follow it :-)
> 
> > > 
> > > 
> > >   Hum, I seems to have missed handling XML_ELEMENT_NODE especially the
> > > part handling nsDef on those.
> > 
> > It's the first switch-case :-) Do we have to touch nsDef entires here?
> 
>   :-)
>   touch no, read yes, I probably missed it. Currently the list based code
> is a bit frightening, using dict might improve that aspect, if only in the
> design/review phase.
> 
> > > 
> > >   Obviously lot of thinking and testing need to be carried on. I would
> > > really
> > > like to get something we can finally rely on and not half of
> solutions.
> > 
> > Well, who doesn't? No offense: Daniel, if you can conjure a working
> solution
> > out of the box, then just do it. I'd love to see this issues being
> solved by
> 
>   Hum, there is a misunderstanding :-)
>   By half solutions I meant the couple of existing non-complete function
> which are already in tree.c and which I pointed to in earlier parts of
> this
> thread. I was not complaining about your attempt !

Ah, I put you back on the list of my friends then :-)

> > others, really. You could add the entity refernce stuff and fix the dict
> > parts, for I don't have the big picture in mind regarding the string
> dicts.
> > We'll surely need some thinking on the context struct, so it won't be
> > finished and maby should not be finished that fast.
> 
>   There is no hurry, I just want to end up with a complete solution this
> time
> even if it takes a month before everybody agree the resulting code is
> usable
> for them.
> 
> > >   Thanks a lot for starting the effort though there is obviously some
> work
> > > left :-)
> > > 
> > > Daniel
> > 
> > Proposal for an initial function head and return values:
> > 
> > /**
> >  * xmlDOMWrapAdoptNode:
> >  * @ctxt: the optional wrapper context
> >  * @destDoc: the adopting document
> >  * @node: the node to be adopted
> >  *
> >  * Unlinks and adopts a node.
> >  *
> >  * Returns: 0 in case of success
> >  *          1 if @node cannot be adopted
> >  *          -1 in case of an API or internal error.
> >  */
> > static int
> > xmlDOMWrapAdoptNode(void *xmlSomeStructNamePtr,
> >                     xmlDocPtr destDoc,
> >                     xmlNodePtr node)
> > 
> 
> Still open at the API level :
>   - following your comment sourceDoc might be added to cope with
>     node->doc == NULL, but this opens the door to more problem like
>     node->doc != NULL and node->doc != sourceDoc :-\

@sourceDoc should not be needed.

>   - I still think an optional parent node to be able to make namespace 
>     integration, it's incredible how much people have been complaining
>     for example in XSLT when cut and pasting trees and extra ns definition
>     were generated. Sometimes DTD are defined to allow the namespaces
>     only on the root element for example.

Automatic reconciliation as described above, agreed.

>   - more complex error handling. Potentially this could be options passed
>     to the function on how to process default edge cases.

An option arg would be good, even if we end up not using it.

Regards,

Kasimier



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]