[Date Prev][Date Next] [Thread Prev][Thread Next]
[Thread Index]
[Date Index]
[Author Index]
Re: [xml] redicting parts of trees
- From: cazic gmx net
- To: veillard redhat com
- Cc: xml gnome org
- Subject: Re: [xml] redicting parts of trees
- Date: Mon, 16 May 2005 18:06:05 +0200 (MEST)
Hi,
> Von: Daniel Veillard <veillard redhat com>
> Datum: Mon, 16 May 2005 10:33:06 -0400
>
> On Mon, May 16, 2005 at 03:47:20PM +0200, cazic gmx net wrote:
> > Hi,
> >
> > Obviously it's not enough to say that it is a sketch when Daniel is
> around
> > ;-)
>
> don't get me wrong, I just wanted to provide some comments :-)
:-)
> > > Von: Daniel Veillard <veillard redhat com>
> > > Datum: Mon, 16 May 2005 08:28:27 -0400
> > >
> > > On Mon, May 16, 2005 at 02:04:44PM +0200, cazic gmx net wrote:
> > [...]
> > > Quick comments on it:
> > > > [...]
> > > > static int
> > > > xmlDOMWrapAdoptNode(void *ctxt, xmlDocPtr sourceDoc, xmlDocPtr
> destDoc,
> > > > xmlNodePtr node, xmlNodePtr parent, int unlink)
> > > > {
> > >
> > > sourceDoc is redundant, can be extracted from node->doc
> >
> > OK. node->doc can be NULL if created with xmlNewNode.
>
> Hum, but if plugged in the doc it should have been updated, the
> underlying
> question is how permissive the function should be with respoect to the
> input
> document, I tend to agree with you it should try to cope even with weird
> input.
We should have no problems if node->doc is NULL, so true, @sourceDoc is
redundant.
> > > parent should be optional NULL would be similar to the real DOM
> function
> >
> > @parent is not needed; it's a left-over from xmlStaticCopyNode(), which
> I
> > used as the starting point.
>
> I would keep it (as optional) to allow better ns integration if
> possible.
Ah, now I seem to understand what you mean: with the originally unintended
@parent, acting as the insertion point in the destination document, you
assumed that a namespace reconciliation would be performed; right?
Yes, for non-DOM based calls, an on-the-fly recon. would spare a full
explicit recon. call.
So it could look like this:
If @node is given:
- reconciliate namespaces which can be found in @parent's
self-or-ancestor axis
- if the branch of @parent has the document-node in the
self-or-ancestor axis, then create missing ns declarations
on the document-element, otherwise use "oldNs" of xmlDoc for
temporary storage of missing declarations
With this we could have a non-redundant mechanism, since missing
declarations will be anchored on the document-element only.
If the user wants such an orphan branch (not connected to the document-node)
to be serialized, he still has the option to call a namespace
reconciliation, which would create the missing ns decls. on the brach-top.
>
> > > Error handling should be designed. A simple -1 error code back is
> not
> > > really
> > > suitable for the kind of complex operation that is being designed
> here.
> >
> > OK.
> >
> > > > case XML_DOCUMENT_NODE:
> > > > case XML_HTML_DOCUMENT_NODE:
> > >
> > > XML_HTML_DOCUMENT_NODE and XML_DOCUMENT_NODE may not generate an
> > > error...
> > > I could think of a semantic for this, need to be checked against DOM.
> >
> > Document nodes cannot be adopted as per DOM spec.
>
> okay, that clear :-)
>
> > > > sameDict = ((sourceDoc->dict == destDoc->dict) &&
> > > > (destDoc->dict != NULL)) ? 1 : 0;
> > > > cur = node;
> > >
> > > if parent != NULL collect existing inscope namespaces
> >
> > @parent will not be used.
>
> I would really keep it if possible.
>
> > > > if (! sameDict) {
> > >
> > >
> > > Wrong you need to check xmlDictOwns(sourceDoc->dict, cur->name)
> > > too or you are gonna leak cur->name if the node was added manually
> >
> > So we always need a xmlDictOwns check? Can you see a constellation where
> we
> > can avoid this?
>
> not really, but it's a really fast call, usually it's a call with 2
> just 2 pointer comparison.
Good to hear that.
> > > > if (destDoc->dict)
> > > > cur->name = xmlDictLookup(destDoc->dict, cur->name, -1);
> > > > else if (sourceDoc->dict)
> > > > cur->name = BAD_CAST xmlStrdup(cur->name);
> > > > /*
> > > > * TODO: Are namespace declarations ever in a dict?
> > > > */
> >
> > Are dicts ever used for namespace declarations?
>
> of course, SAX2 always pass namespaces related strings from a the
> document
> parser dictionary. It's the norm rather than the exception.
OK, should have known that, while working an streamed schemata with SAX2 ;-)
> > > > }
> > > > /*
> > > > * Adopt out-of-scope namespace declarations.
> > > > */
> > > > if (cur->ns != NULL) {
> > > > int i, j;
> > >
> > > I would rather use a hash table than comparing all namespaces string
> >
> > Namespace strings are not compared. Only the pointers to xmlNs.
>
> But at some point you will need to compare strings, for example when
> updating that list. There is 2 aspects, speed and code maintainance. I
> can't tell about the speed but from a code cleanness POV it seems a more
> abstract based view using hashes would make the code less confusing. But
> it's all implementations details nothing critical.
>
> > > > } else {
> > > > /*
> > > > * User-defined behaviour.
> > > > */
> > >
> > > you can't do that. ctxt need to be refined to be actually useful, a
> > > void * won't work. And adding 2 args might be just a bit too much,
> this
> > > need
> > > more thinking
> >
> > Ctxt will not be void* in the end. We have to design a nice struct for
> it.
>
> okay :-)
>
> > >
> > > I would really rather use a dictionnary for nsList it would be way
> > > cleaner.
> > > the only problem is that it would require a trick like a function
> > > recursion
> > > when encountering a namespace deactivation like xmlns="" or
> xmlns:foo=""
> > > or namespace redefinition to a diferent value but that quite
> unfrequent.
> >
> > xmlns:foo="" is not an allowed namespace declaration as far as I know.
>
> yes in namespace-1.1
Jesus!
> > xmlns="" should not be referenced by an node->ns entry, since it is just
> a
> > machanism to disable the default namespace.
>
> yes but it's on the nsDef, that's the only way to implement this.
>
> > Can you clarify why you want a hash here? The mechanism just assures
> that
>
> see before.
>
> > references (node->ns) to the xmlNs entries will be valid, thus picked
> from
> > or created in "oldNs" if the original xmlNs entries are out-of-scope. It
> is
> > not a namespace reconciliation mechanism; it just unlinks the branch and
> > keeps namespace references alive; nsDef entries are not touched.
> >
> > > > case XML_ENTITY_REF_NODE:
> > > > /*
> > > > * TODO: Remove entity child nodes.
> > > > */
> > > > goto internal_error;
> > > > break;
> > >
> > > forces a recursion see other examples of recursive tree walk with
> > > entities references. Potentially a lookup of the entity being ref'ed
> > > >from the target document. XInclude has a semantic for such entities
> > > remapping might use the same.
> >
> > OK. The spec wants the referenced entity to be discarded; an entity of
> the
> > destination document will be assigned if available.
>
> if the spec has a clear behaviour for this then let's follow it :-)
>
> > >
> > >
> > > Hum, I seems to have missed handling XML_ELEMENT_NODE especially the
> > > part handling nsDef on those.
> >
> > It's the first switch-case :-) Do we have to touch nsDef entires here?
>
> :-)
> touch no, read yes, I probably missed it. Currently the list based code
> is a bit frightening, using dict might improve that aspect, if only in the
> design/review phase.
>
> > >
> > > Obviously lot of thinking and testing need to be carried on. I would
> > > really
> > > like to get something we can finally rely on and not half of
> solutions.
> >
> > Well, who doesn't? No offense: Daniel, if you can conjure a working
> solution
> > out of the box, then just do it. I'd love to see this issues being
> solved by
>
> Hum, there is a misunderstanding :-)
> By half solutions I meant the couple of existing non-complete function
> which are already in tree.c and which I pointed to in earlier parts of
> this
> thread. I was not complaining about your attempt !
Ah, I put you back on the list of my friends then :-)
> > others, really. You could add the entity refernce stuff and fix the dict
> > parts, for I don't have the big picture in mind regarding the string
> dicts.
> > We'll surely need some thinking on the context struct, so it won't be
> > finished and maby should not be finished that fast.
>
> There is no hurry, I just want to end up with a complete solution this
> time
> even if it takes a month before everybody agree the resulting code is
> usable
> for them.
>
> > > Thanks a lot for starting the effort though there is obviously some
> work
> > > left :-)
> > >
> > > Daniel
> >
> > Proposal for an initial function head and return values:
> >
> > /**
> > * xmlDOMWrapAdoptNode:
> > * @ctxt: the optional wrapper context
> > * @destDoc: the adopting document
> > * @node: the node to be adopted
> > *
> > * Unlinks and adopts a node.
> > *
> > * Returns: 0 in case of success
> > * 1 if @node cannot be adopted
> > * -1 in case of an API or internal error.
> > */
> > static int
> > xmlDOMWrapAdoptNode(void *xmlSomeStructNamePtr,
> > xmlDocPtr destDoc,
> > xmlNodePtr node)
> >
>
> Still open at the API level :
> - following your comment sourceDoc might be added to cope with
> node->doc == NULL, but this opens the door to more problem like
> node->doc != NULL and node->doc != sourceDoc :-\
@sourceDoc should not be needed.
> - I still think an optional parent node to be able to make namespace
> integration, it's incredible how much people have been complaining
> for example in XSLT when cut and pasting trees and extra ns definition
> were generated. Sometimes DTD are defined to allow the namespaces
> only on the root element for example.
Automatic reconciliation as described above, agreed.
> - more complex error handling. Potentially this could be options passed
> to the function on how to process default edge cases.
An option arg would be good, even if we end up not using it.
Regards,
Kasimier
[Date Prev][Date Next] [Thread Prev][Thread Next]
[Thread Index]
[Date Index]
[Author Index]