Re: [xml] redicting parts of trees
- From: Martijn Faassen <faassen infrae com>
- To: Kasimier Buchcik <kbuchcik 4commerce de>
- Cc: "xml gnome org" <xml gnome org>
- Subject: Re: [xml] redicting parts of trees
- Date: Thu, 19 May 2005 20:19:35 +0200
Kasimier Buchcik wrote:
Hi,
On Thu, 2005-05-19 at 17:16 +0200, Martijn Faassen wrote:
Kasimier Buchcik wrote:
[snip stuff that goes over his head without a lot of further study]
This is just a cheerleading note; I'm really glad you guys are taking
this up, as I can already see there are many subtle issues involved I
would not have understood without significant study. Thanks!
Anyway, anything I can do now to help? I will of course be testing this
facility at some stage within lxml, and give feedback then if necessary.
You could describe how you intend to manage namespaces in your
wrapper. Will you go the W3C way or the Libxml2 namespace way?
I'm following the ElementTree way, which uses Clark notation. I.e. the
wrapper shows namespace URIs directly as part of element names and such,
like this:
{http://namespaces.somewhere.org/ns1}foo
and prefixes are, for now, completely ignored as not relevant to the XML
infoset.
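
To make the notation concrete, here is a minimal sketch using the
standard library's xml.etree.ElementTree rather than lxml itself. The
helper split_clark() is a hypothetical name introduced only for this
illustration. It shows that two documents using different prefixes for
the same URI yield identical tag names:

```python
import xml.etree.ElementTree as ET

NS1 = "http://namespaces.somewhere.org/ns1"

# Same namespace URI, two different prefixes.
doc = ET.fromstring('<a:foo xmlns:a="%s"><a:bar/></a:foo>' % NS1)
same = ET.fromstring('<b:foo xmlns:b="%s"><b:bar/></b:foo>' % NS1)


def split_clark(tag):
    """Split '{uri}local' into (uri, local); uri is None if unqualified.

    Hypothetical helper, not part of ElementTree.
    """
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


print(doc.tag)                  # the URI is part of the name
print(doc.tag == same.tag)      # prefixes play no role in the comparison
print(split_clark(doc[0].tag))
```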
Both have pros and cons. The relevant drawback of the Libxml2 way
is that it's hard, if not impossible, to implement a DOM wrapper
in a programming language where the time of destruction
of an object is not under the programmer's control.
Thanks, this is interesting as this is exactly what I'm trying to do
with lxml.
Let me try to give some background information - possibly too
detailed. I hope to be corrected if something's wrong:
Libxml2 handles the corresponding DOM Node methods namespaceURI() and
prefix() in the following way:
node->ns->prefix == result of node.prefix()
node->ns->href == result of node.namespaceURI()
The node->ns field is a pointer to an xmlNs struct, which
is held in the elem->nsDef field of element-nodes.
Right, I've been using this structure in the lxml implementation.
Such node->nsDef entries correspond to namespace declaration
attributes in DOM (e.g. xmlns:foo="urn:test:foo"). Libxml2's
way demands a node->nsDef entry, thus a namespace declaration
attribute, on the node itself or on an ancestor node to be
present; which totally reflects the serialized (written as
XML file) form.
This circumstance creates the following problem:
If you remove an attribute-node, which is bound to a namespace,
from its parent, the attr->ns field still points to an elem->nsDef
entry. This is OK, as long as this element-node is not itself
freed - which would free the elem->nsDef entries as well. The
destruction of this element would lead to attr->ns pointing to freed
memory.
Ugh. Luckily the ElementTree API doesn't allow the detaching of
attribute nodes from an element, but I can see how this would hurt any
W3C DOM implementation.
But now I wonder: does this only apply to attribute nodes, or also to
element nodes which are in a subtree? Testing this.. Ugh, yes, it does.
When I move a namespaced element (where the namespace is defined higher
in the tree) into another tree, and then subsequently remove the
original tree, things go way wrong and valgrind indeed points to a
reference to a libxml2 namespace structure that has since been removed.
Not good...
But thanks for pointing this issue out to me!
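
The failure described above can be sketched with a toy Python model.
This is not libxml2 code: the classes Ns and Element and the "freed"
flag are stand-ins invented for the illustration, with ns_def playing
the role of elem->nsDef and ns the role of node->ns.

```python
class Ns:
    """Stand-in for xmlNs: a prefix/URI pair owned by one element."""

    def __init__(self, prefix, href):
        self.prefix = prefix
        self.href = href
        self.freed = False  # models "this memory has been released"


class Element:
    """Stand-in for an element node with nsDef and ns fields."""

    def __init__(self, name, parent=None):
        self.name = name
        self.ns_def = []    # declarations owned by this element
        self.ns = None      # borrowed reference into some ns_def list
        self.children = []
        self.parent = None
        if parent is not None:
            parent.append(self)

    def append(self, child):
        """Move child under this element; child.ns is left untouched."""
        if child.parent is not None:
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)

    def free(self):
        """Free this subtree, including the declarations it owns."""
        for child in self.children:
            child.free()
        for ns in self.ns_def:
            ns.freed = True


# The namespace is declared on the root, and used by a child element.
old_root = Element("root")
decl = Ns("a", "urn:test:foo")
old_root.ns_def.append(decl)
child = Element("foo", old_root)
child.ns = decl

# Move the child into another tree, then destroy the original tree.
new_root = Element("other")
new_root.append(child)
old_root.free()

# The moved element still refers to the declaration that was just freed.
dangling = child.ns.freed
print(dangling)
```

In the model the stale reference is only a flag; in C the same
sequence leaves a pointer into released memory.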
There's no automatic mechanism to avoid this, since there is
no reference counting involved. In C this should be user
controllable: you just have to know what you are freeing and
when. This is not so in other programming languages like Python, Delphi,
Java, etc., where the destruction time of objects is not always - if
ever - predictable.
Indeed. Python tends to be fairly predictable if its refcounting
algorithm is used, but that doesn't help any here, and that isn't
constant across Python implementations anyway.
Safe removal of nodes:
So we obviously need a mechanism to let the node->ns reference point
to an xmlNs entry which is not in danger of being freed unpredictably.
A possible location would be a list of xmlNs entries, internally
managed by the DOM document wrapper.
Yes, in this case the problem would devolve to the issue I already have
with dictionaries, which is manageable as I can make this stuff globally
shared. Though, just as with dictionaries I hope that the adoptNode()
functionality could take care of this as well.
I suspect that adoptNode() recreating namespaces wherever necessary in
the new document would indeed be sufficient to support Clark notation
in ElementTree, even though the XML serialization would look ugly.. Am I
correct in that an adoptNode() would take care of this issue if prefixes
are hidden from the API user's view?
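
As a rough sketch of the idea under discussion, assuming a
document-level namespace list and an adopt step that re-creates
namespaces in the target document: the names Document, intern_ns(),
adopt() and clark() below are hypothetical and say nothing about how
an eventual libxml2 adoptNode() would behave.

```python
class Ns:
    """Stand-in for xmlNs, owned by a document rather than an element."""

    def __init__(self, prefix, href):
        self.prefix = prefix
        self.href = href
        self.freed = False


class Document:
    """Owns every namespace entry used by nodes that belong to it."""

    def __init__(self):
        self._ns = {}  # (prefix, href) -> Ns, lives as long as the document

    def intern_ns(self, prefix, href):
        """Return the document's entry for this pair, creating it if needed."""
        key = (prefix, href)
        if key not in self._ns:
            self._ns[key] = Ns(prefix, href)
        return self._ns[key]

    def free(self):
        """Release the document together with its namespace entries."""
        for ns in self._ns.values():
            ns.freed = True


class Element:
    def __init__(self, name, ns=None):
        self.name = name
        self.ns = ns
        self.children = []


def adopt(doc, node):
    """Rebind node and its descendants to entries owned by doc."""
    if node.ns is not None:
        node.ns = doc.intern_ns(node.ns.prefix, node.ns.href)
    for child in node.children:
        adopt(doc, child)
    return node


def clark(node):
    """Name as the API user sees it: the URI matters, the prefix does not."""
    if node.ns is None:
        return node.name
    return "{%s}%s" % (node.ns.href, node.name)


src_doc, tgt_doc = Document(), Document()
src_ns = src_doc.intern_ns("a", "urn:test:foo")
foo = Element("foo", src_ns)
bar = Element("bar", src_ns)
foo.children.append(bar)

before = clark(foo)
adopt(tgt_doc, foo)   # namespaces are re-created in the target document
src_doc.free()        # the source document can now go away safely

print(clark(foo) == before, foo.ns.freed)
```

In this model the Clark name of an adopted node is unchanged and none
of its references point at freed entries, which is all a prefix-hiding
API would need.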
Regards,
Martijn