Re: [xml] redicting parts of trees



Daniel Veillard wrote:
On Fri, May 13, 2005 at 11:15:08PM +0200, Martijn Faassen wrote:

[snip]
Up to there I agree, but I don't understand "style's signature". If
you meant the "style's dictionary", then yes.

Oops, sorry; I was tired when I wrote that and indeed meant
'dictionary', not 'signature'.

[snip]

The reason is that libxslt makes the guarantee that a compiled
stylesheet is read-only when used to make a transformation. It also
avoids problems of shared resources in multi-threaded XSLT engines.


Would the developers be open to me suggesting changes to the XSLT
codebase to make this possible again? I suppose I should ask on the
XSLT list, so let's move on to the real purpose of this mail.

Yes, that should be doable. I'm not sure what would be the best API
for this.

I'm not either, but I'll think about this. Would sharing a dictionary
break the read-only guarantee though, and thus break multi-threading?

Exploring these issues made me conclude that it's time to at least
look at the alternative to sharing a single global dictionary,
redicting parts of trees. A redicting operation would take place
whenever a node is moved into a new tree. All the strings in the
subtree below this node will be traced to the originating
document's dictionary, and the entries will be copied into the
target document's dictionary. Additionally, all string references
in the subtree will be made to point to the new document's
dictionary.
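
To make the redicting operation described above concrete, here is a
minimal Python sketch on a toy model, not on libxml2 itself. The names
Entry, Dict, Node and redict are hypothetical, and only node names are
handled; namespace strings and short text content would need the same
treatment.

```python
class Entry:
    """One interned string, owned by exactly one dictionary (toy model)."""

    def __init__(self, text, owner):
        self.text = text
        self.owner = owner


class Dict:
    """Toy per-document string dictionary; not the libxml2 xmlDict API."""

    def __init__(self):
        self._entries = {}

    def lookup(self, text):
        # Return the single shared entry for text, creating it if needed.
        entry = self._entries.get(text)
        if entry is None:
            entry = Entry(text, self)
            self._entries[text] = entry
        return entry

    def __len__(self):
        return len(self._entries)


class Node:
    """Toy tree node whose name is a reference into a dictionary."""

    def __init__(self, name, doc_dict, children=()):
        self.name = doc_dict.lookup(name)
        self.children = list(children)


def redict(node, target):
    """Repoint every dictionary string in the subtree at target."""
    stack = [node]
    while stack:
        current = stack.pop()
        if current.name.owner is not target:
            # Copy the entry into the target dictionary and repoint the node.
            current.name = target.lookup(current.name.text)
        stack.extend(current.children)


# Usage: move a small subtree from one dictionary to another.
source, target = Dict(), Dict()
subtree = Node("item", source, [Node("title", source), Node("title", source)])
redict(subtree, target)
```

After the call no node in the subtree refers to the source dictionary
any more, so the source document could be freed without leaving
dangling string references in the moved nodes.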

Yes, it seems that at the DOM level this operation is called an
import based on some PHP/JavaScript examples I saw recently.

Yes, the W3C DOM indeed defines an importNode operation, and I guess I'm
asking for the equivalent here. :)

I think that if we add this then we should try to match the existing
semantics of that operation, in PHP for example.

Does PHP implement this operation on top of libxml2? We might also want
to consider the W3C importNode semantics, though I doubt they actually
say much of use for us here...

The things which need to be checked when preparing for such an import
are:

- doc remapping

By this you mean telling all nodes about the new document node, right?

- dictionary remapping
- namespace references to the original document

As a document contains a list of all the namespace references, right? So
if the original document were to be destroyed, namespace references to
it from nodes now in new documents would point to freed memory.

- namespace remapping to the local document

How does this differ from the previous item, namespace references to
the original document?

- entity references to the original document

I think those are the only pointers which are added to the pure
tree-oriented parent/child/sibling ones.

Thanks for the list!

Looking at the import implementation of PHP5 might give us an idea of
how to implement this. Note that there are incomplete APIs dealing
either just with document pointers (xmlSetTreeDoc) or just namespaces
(xmlNewReconciliedNs and xmlReconciliateNs).

Okay, I shall study the implementations of those. It would probably be
more efficient to provide a function that did all the remapping in one
operation as it traversed the tree, though.
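
As a rough illustration of doing all the remapping in one traversal,
here is a Python sketch on a toy model. Everything in it (Document,
DictString, Namespace, Node, import_subtree) is hypothetical and not
part of libxml2. It also assumes namespaces are kept in a per-document
table, which is a simplification, and it leaves entity references out.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class DictString:
    """A string owned by one document's dictionary (toy model)."""
    text: str
    owner: "Document"


@dataclass(eq=False)
class Namespace:
    """A namespace whose prefix and href live in the owner's dictionary."""
    prefix: DictString
    href: DictString
    owner: "Document"


@dataclass(eq=False)
class Document:
    strings: dict = field(default_factory=dict)     # text -> DictString
    namespaces: dict = field(default_factory=dict)  # (prefix, href) -> Namespace

    def intern(self, text):
        # Return this document's shared copy of text.
        if text not in self.strings:
            self.strings[text] = DictString(text, self)
        return self.strings[text]

    def namespace(self, prefix, href):
        # Find or create the local equivalent of a namespace.
        key = (prefix, href)
        if key not in self.namespaces:
            self.namespaces[key] = Namespace(
                self.intern(prefix), self.intern(href), self)
        return self.namespaces[key]


@dataclass(eq=False)
class Node:
    doc: Document
    name: DictString
    ns: Namespace = None
    children: list = field(default_factory=list)


def import_subtree(root, target):
    """Remap doc pointer, dictionary strings and namespace in one pass."""
    stack = [root]
    while stack:
        node = stack.pop()
        node.doc = target                              # doc remapping
        node.name = target.intern(node.name.text)      # dictionary remapping
        if node.ns is not None:                        # namespace remapping
            node.ns = target.namespace(node.ns.prefix.text,
                                       node.ns.href.text)
        stack.extend(node.children)
    return root
```

Each node is visited once, so the cost is one walk over the subtree
instead of one walk per kind of pointer.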

[snip]

In order to write a good redicting operation, I'd need a bit more
information about which information in a tree exactly can end up in
a dictionary. If someone would be able to give me a list of what
ends up in the dictionary, that would be extremely helpful.

All markup names, all namespace strings (prefix and namespace names)
and some text node content (so that all "formatting nodes" used to
indent share as much as possible, or very short text nodes, for
example "0" or "1").

Do short text nodes include attribute values?

Hope it helps, and thanks !

Thank you for the very helpful answer. I will also look at the PHP5 implementation.

Daniel

P.S.: I think I should be able to design a method to make importing
strings from a given dictionary into Python strings considerably faster
when repeatedly querying the same set of strings. The principle would be
to add an API to the dictionary returning an index for the string
(cost O(1)) and at the Python binding level have an array keeping
pointers to the strings already converted (Py_INCREF'ed of course).
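
The principle above could be sketched roughly as follows in pure
Python. IndexedDict stands in for a dictionary with the proposed index
API, which is an assumption and not an existing libxml2 call, and
StringCache plays the role of the array kept at the binding level.
Both names are hypothetical, and decoding bytes stands in for the cost
of converting a C string into a Python string.

```python
class IndexedDict:
    """Toy dictionary that hands out a stable integer index per string."""

    def __init__(self):
        self._index = {}    # raw bytes -> index
        self._strings = []  # index -> raw bytes

    def index_of(self, raw):
        # Average O(1): a hash lookup, plus an append for new strings.
        idx = self._index.get(raw)
        if idx is None:
            idx = len(self._strings)
            self._index[raw] = idx
            self._strings.append(raw)
        return idx

    def string_at(self, idx):
        return self._strings[idx]


class StringCache:
    """Array of already-converted Python strings, addressed by index."""

    def __init__(self, source):
        self._source = source
        self._converted = []
        self.conversions = 0  # counts how often a real conversion happened

    def get(self, idx):
        # Grow the array so that idx is a valid slot.
        if idx >= len(self._converted):
            self._converted.extend([None] * (idx + 1 - len(self._converted)))
        cached = self._converted[idx]
        if cached is None:
            # First request for this string: convert once and keep it.
            cached = self._source.string_at(idx).decode("utf-8")
            self._converted[idx] = cached
            self.conversions += 1
        return cached
```

Repeated queries for the same index return the same Python string
object, so the conversion cost is paid once per distinct string.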

That would be very nice to have! I played with this idea before myself, but didn't get anything working yet. I will think about this some more. Is there any userdata facility in dictionaries already?

Regards,

Martijn



