Re: [xml] redicting parts of trees
- From: Daniel Veillard <veillard redhat com>
- To: Martijn Faassen <faassen infrae com>
- Cc: xml gnome org
- Subject: Re: [xml] redicting parts of trees
- Date: Sun, 15 May 2005 06:45:10 -0400
On Sun, May 15, 2005 at 12:37:38AM +0200, Martijn Faassen wrote:
> >>Would the developers be open to me suggesting changes to the XSLT
> >>codebase to make this possible again? I suppose I should ask on the
> >>XSLT
> >
> >
> >Yes that should be doable. I'm not sure what would be the best API
> >for this.
>
> I'm not either, but I'll think about this. Would sharing a dictionary
> break the read-only guarantee though, and thus break multi-threading?
I'm afraid yes. Basically, if nodes are generated then the dictionary
associated with the transformation will grow. If multiple transformations
run in parallel using the same dictionary then you have concurrent
unsynchronized accesses to the dictionary. At least that was the line of
thought when I created that subdict thing. I note that in the meantime
we added a mutex to the dictionaries, so this should no longer be
a problem. So in a nutshell I think it will break the read-only assumption
but it won't break multithreading, so this should be doable.
> >>list, so let's move on to the real purpose of this mail.
> >>
> >>Exploring these issues made me conclude that it's time to at least
> >>look at the alternative to sharing a single global dictionary,
> >>redicting parts of trees. A redicting operation would take place
> >>whenever a node is moved into a new tree. All the strings in the
> >>subtree below this node will be traced to the originating
> >>document's dictionary, and the entries will be copied into the
> >>target document's dictionary. Additionally, all string references
> >>in the subtree will be made to point to the new document's
> >>dictionary.
> >
> >Yes, it seems that at the DOM level this operation is called an
> >import based on some PHP/javascript examples I saw recently.
>
> Yes, the W3C DOM indeed defines an importNode operation, and I guess I'm
> asking for the equivalent here. :)
>
> >I think that if we add this then we should try to match the existing
> >semantics of those operations in PHP for example.
>
> Does PHP implement this operation on top of libxml2? We might also want
Yes, PHP5 is built on top of libxml2. Hard to tell how reusable the code would
be as an example without looking at it. They may have some intermediate layer.
> to consider the W3C importNode semantic, though I doubt it actually says
> much of use for us here...
Well just trying to follow the principle of least surprise.
> >The things which need to be checked when preparing for such an import
> >are:
> >- doc remapping
>
> By this you mean telling all nodes about the new document node, right?
yes
> >- dictionary remapping
> >- namespace references to the original document
>
> As a document contains a list of all the namespace references, right? So
yes but not centralized. You have to walk the subtree and the ancestors
to build a full picture.
> if the original document were to be destroyed, namespace references to
> it from nodes now in new documents would be pointed to free space.
to freed data, yes.
> >- namespace remapping to the local document
>
> What does this mean as compared to the previous, namespace references to
> the original document?
Instead of recreating at the insertion point all the namespace declarations
used in the subtree, reuse the declarations already in scope at that
insertion point. For example, if {"dbk", "http://docbook.org"} is in scope at
the insertion point due to an ancestor holding the declaration, and it
is among the namespace bindings in use by the subtree, do not redeclare
it at the insertion point. Apparently one difference between the PHP5 and
javascript implementations of import is that the javascript one would reuse
the namespace declaration even if it is defined with a different prefix (hence
changing the prefixes in the pruned subtree). I don't know which semantics are
DOM's :-)
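
The two behaviours can be compared in a small Python sketch. This is a toy
model under stated assumptions: `Node`, `in_scope_namespaces` and
`resolve_binding` are hypothetical names, not libxml2, PHP or DOM APIs, and
the `reuse_other_prefix` flag merely stands for the behaviour attributed
above to the javascript implementation.

```python
class Node:
    """Toy element: ns_decls maps prefix -> namespace URI declared here."""

    def __init__(self, name, parent=None, ns_decls=None):
        self.name = name
        self.parent = parent
        self.ns_decls = dict(ns_decls or {})


def in_scope_namespaces(node):
    """Walk node and its ancestors; the nearest declaration of a prefix wins."""
    scope = {}
    while node is not None:
        for prefix, uri in node.ns_decls.items():
            scope.setdefault(prefix, uri)
        node = node.parent
    return scope


def resolve_binding(insertion_point, prefix, uri, reuse_other_prefix=False):
    """Return (prefix_to_use, needs_declaration) for one subtree binding."""
    scope = in_scope_namespaces(insertion_point)
    if scope.get(prefix) == uri:
        # Same prefix and URI already in scope: nothing to redeclare.
        return prefix, False
    if reuse_other_prefix:
        for other, bound_uri in scope.items():
            if bound_uri == uri:
                # Reuse the existing declaration; the subtree's prefix changes.
                return other, False
    # Otherwise the binding has to be redeclared on import.
    return prefix, True
```

With {"dbk", "http://docbook.org"} declared on an ancestor, a subtree using
prefix "dbk" needs no new declaration either way. A subtree using prefix "d"
for the same URI is redeclared under the first policy and renamed to "dbk"
under the second.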
> >- entity references to the original document
> >I think those are the only pointers which are added to the pure
> >tree-oriented parent/child/sibling ones.
>
> Thanks for the list!
no problem.
> >Looking at the import implementation of PHP5 might give us an idea of
> >how to implement this. Note that there are incomplete APIs dealing
> >either just with document pointers (xmlSetTreeDoc) or just namespaces
> >(xmlNewReconciliedNs and xmlReconciliateNs).
>
> Okay, I shall study the implementations of those. It would probably be
> more efficient to provide a function that did all the remapping in one
> operation as it traversed the tree, though.
Yes, that's what I have in mind. It should be doable in a single pass:
first walk all the ancestors of the insertion node to collect the existing
namespaces, then scan the subtree being moved in document order.
I'm not sure it's fun though; xmlReconciliateNs() does some of this.
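
A rough Python sketch of that traversal follows. It is a toy model, not
libxml2 code: `Doc`, `Node` and `import_subtree` are hypothetical names, and
each toy document owns a simple interning table standing in for its
dictionary. Simplifying assumptions: bindings missing at the insertion point
are redeclared on the root of the moved subtree, and prefix conflicts and
entity references are not handled.

```python
class Doc:
    """Toy document owning a string-interning table."""

    def __init__(self):
        self.strings = {}

    def intern(self, s):
        return self.strings.setdefault(s, s)


class Node:
    def __init__(self, doc, name, ns=None, ns_decls=None):
        self.doc = doc
        self.name = doc.intern(name)
        self.ns = ns                          # (prefix, uri) in use, or None
        self.ns_decls = dict(ns_decls or {})  # prefix -> uri declared here
        self.parent = None
        self.children = []

    def append(self, child):
        child.parent = self
        self.children.append(child)
        return child


def import_subtree(subtree, target):
    """Move subtree under target, remapping doc, strings and namespaces."""
    if subtree.parent is not None:
        subtree.parent.children.remove(subtree)
        subtree.parent = None

    # Step 1: walk the insertion node and its ancestors to collect the
    # namespace declarations already in scope.
    scope = {}
    node = target
    while node is not None:
        for prefix, uri in node.ns_decls.items():
            scope.setdefault(prefix, uri)
        node = node.parent

    # Step 2: one scan of the moved subtree in document order.
    stack = [(subtree, scope)]
    while stack:
        node, inherited = stack.pop()
        node.doc = target.doc                     # doc remapping
        node.name = target.doc.intern(node.name)  # dictionary remapping
        local = {**inherited, **node.ns_decls}
        if node.ns is not None:
            prefix, uri = node.ns
            if local.get(prefix) != uri:
                # Binding came from the old document's ancestors:
                # redeclare it on the root of the moved subtree.
                subtree.ns_decls.setdefault(prefix, uri)
                local[prefix] = uri
        for child in reversed(node.children):
            stack.append((child, local))

    target.append(subtree)
    return subtree
```

After the call, no node in the moved subtree refers to the source document,
its string table, or a namespace declaration held only by its old ancestors,
so in this model the source document could be dropped safely.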
> [snip]
>
> >>In order to write a good redicting operation, I'd need a bit more
> >>information about which information in a tree exactly can end up in
> >>a dictionary. If someone would be able to give me a list of what
> >>ends up in the dictionary, that would be extremely helpful.
> >
> >All markup names, all namespaces strings (prefix and namespace names)
> > and some text node content (so that all "formatting nodes" used to
> >indent share as much as possible, or very short text nodes for
> >example "0" or "1").
>
> Short text nodes includes attribute values?
Yes, an attribute has a children list, which is usually a single text node
but sometimes also includes entity references and text nodes intermixed.
> >P.S.: I think I should be able to design a method to make importing
> >strings from a given dictionary into python strings much faster when
> >repeatedly querying the same set of strings. The principle would be
> >to add an API to the dictionary returning an index for the string
> >(cost O(1)) and at the python binding level have an array keeping
> >pointers to the strings already converted (Py_INCREF'ed of course).
>
> That would be very nice to have! I played with this idea before myself,
> but didn't get anything working yet. I will think about this some more.
> Is there any userdata facility in dictionaries already?
No, there isn't. They are an opaque structure too. Adding a _private field
would require accessor functions to be added. That's doable.
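
The index-plus-cache idea from the P.S. could look roughly like this in pure
Python. It is a toy model with hypothetical names (`IndexedDict`,
`BindingCache`); no such index API exists in the dictionary today. Raw
entries are modelled as bytes, the converted objects as str, and holding a
reference in the cache list plays the role of the Py_INCREF.

```python
class IndexedDict:
    """Toy dictionary handing out a stable integer index per string."""

    def __init__(self):
        self._index = {}
        self._strings = []

    def index_of(self, raw):
        # Average O(1): one hash lookup, plus one append on first sight.
        idx = self._index.get(raw)
        if idx is None:
            idx = len(self._strings)
            self._index[raw] = idx
            self._strings.append(raw)
        return idx

    def string_at(self, idx):
        return self._strings[idx]


class BindingCache:
    """Binding-level array of already-converted strings, keyed by index."""

    def __init__(self, dictionary, convert):
        self._dict = dictionary
        self._convert = convert
        self._cache = []
        self.conversions = 0  # counts real conversions, for illustration

    def get(self, idx):
        if idx >= len(self._cache):
            # Grow the array so that idx is a valid slot.
            self._cache.extend([None] * (idx + 1 - len(self._cache)))
        obj = self._cache[idx]
        if obj is None:
            obj = self._convert(self._dict.string_at(idx))
            self._cache[idx] = obj  # the cache keeps the object alive
            self.conversions += 1
        return obj
```

Repeated queries for the same name then cost an index lookup and an array
access instead of a fresh conversion each time.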
Daniel
--
Daniel Veillard | Red Hat Desktop team http://redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/