Re: [xml] redicting parts of trees

From: Martijn Faassen <faassen infrae com>
To: Kasimier Buchcik <kbuchcik 4commerce de>
Cc: "xml gnome org" <xml gnome org>
Subject: Re: [xml] redicting parts of trees
Date: Thu, 19 May 2005 20:19:35 +0200

Kasimier Buchcik wrote:

Hi,

On Thu, 2005-05-19 at 17:16 +0200, Martijn Faassen wrote:
Kasimier Buchcik wrote:
[snip stuff that goes over his head without a lot of further study]
This is just a cheerleading note; I'm really glad you guys are taking this up, as I can already see there are many subtle issues involved I would not have understood without significant study. Thanks!
Anyway, anything I can do now to help? I will of course be testing this facility at some stage within lxml, and give feedback then if necessary.
You could describe how you intend to manage namespaces in your
wrapper. Will you try to go W3C way or Libxml2 namespace way?

I'm following the ElementTree way, which uses Clark notation. I.e. the wrapper shows namespace URIs directly as part of element names and such, like this:

{http://namespaces.somewhere.org/ns1}foo

and prefixes are, for now, completely ignored as not relevant to the XML infoset.
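
For readers unfamiliar with the notation, here is a minimal sketch using the standard-library xml.etree.ElementTree module rather than lxml itself. The sample documents and the prefixes "p" and "q" are made up for illustration. It shows the namespace URI appearing inside the tag name while the prefix plays no role:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: same namespace URI, different prefixes.
doc_p = '<p:foo xmlns:p="http://namespaces.somewhere.org/ns1"><p:bar/></p:foo>'
doc_q = '<q:foo xmlns:q="http://namespaces.somewhere.org/ns1"><q:bar/></q:foo>'

root_p = ET.fromstring(doc_p)
root_q = ET.fromstring(doc_q)

# The tag carries the namespace URI in braces; the prefix is not exposed.
print(root_p.tag)
print(root_p[0].tag)

# Because prefixes are ignored, both documents yield identical element names.
same_names = root_p.tag == root_q.tag and root_p[0].tag == root_q[0].tag
print(same_names)
```

Two documents that differ only in the prefix they chose are indistinguishable through this kind of API, which is the sense in which prefixes are treated as irrelevant here.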

Both have pros and cons. The relevant drawback of the Libxml2 way
is that it's hard, if not impossible, to implement a DOM wrapper
in a programming language where the time of destruction
of an object is not under the programmer's control.

Thanks, this is interesting as this is exactly what I'm trying to do with lxml.

Let me try to give some background information - possibly too
detailed. I hope to be corrected if something's wrong:

Libxml2 handles the corresponding DOM Node methods namespaceURI() and
prefix() in the following way:

node->ns->prefix == result of node.prefix()
node->ns->href   == result of node.namespaceURI()

The node->ns field is a pointer to an xmlNs struct, which
is held in the elem->nsDef field of element-nodes.

Right, I've been using this structure in the lxml implementation.

Such node->nsDef entries correspond to namespace declaration
attributes in DOM (e.g. xmlns:foo="urn:test:foo"). Libxml2's
way demands a node->nsDef entry, thus a namespace declaration
attribute, on the node itself or on an ancestor node to be
present; which totally reflects the serialized (written as
XML file) form.

This circumstance creates the following problem:
If you remove an attribute node that is bound to a namespace
from its parent, the attr->ns field still points to an elem->nsDef
entry. This is OK as long as that element node is not itself
freed, which would free the elem->nsDef entries as well. The
destruction of this element would leave attr->ns pointing to freed
memory.

Ugh. Luckily the ElementTree API doesn't allow the detaching of attribute nodes from an element, but I can see how this would hurt any W3C DOM implementation.

But now I wonder: does this only apply to attribute nodes, or also to element nodes which are in a subtree? Testing this... Ugh, yes, it does. When I move a namespaced element (where the namespace is defined higher in the tree) into another tree, and then subsequently remove the original tree, things go way wrong and valgrind indeed points to a reference to a libxml2 namespace structure that has since been removed. Not good...

But thanks for pointing this issue out to me!

There's no automatic mechanism to avoid this, since there is
no reference counting involved. In C this should be user
controllable: you just have to know what you are freeing and
when. Not so in other programming languages like Python, Delphi,
Java, etc., where the destruction time of objects is not always, if
ever, predictable.

Indeed. Python tends to be fairly predictable if its refcounting algorithm is used, but that doesn't help any here, and that isn't constant across Python implementations anyway.

Safe removal of nodes:
So we obviously need a mechanism to make the node->ns reference point
to an xmlNs entry which is not in danger of being freed unpredictably.
A possible location would be a list of xmlNs entries, internally
managed by the DOM document wrapper.
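
The document-owned list described above can be sketched as a toy ownership model. Everything here (the Ns, Document and Node classes and the intern_ns method) is hypothetical and is not libxml2 or lxml API. Python itself cannot produce a dangling pointer, so the sketch only illustrates who owns the namespace entry:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Ns:
    """Toy stand-in for libxml2's xmlNs (prefix plus href)."""
    prefix: Optional[str]
    href: str


@dataclass
class Document:
    """Toy document wrapper that owns every Ns entry its nodes refer to."""
    ns_table: Dict[Tuple[Optional[str], str], Ns] = field(default_factory=dict)

    def intern_ns(self, prefix: Optional[str], href: str) -> Ns:
        # Return the document-owned entry, creating it on first use.
        key = (prefix, href)
        if key not in self.ns_table:
            self.ns_table[key] = Ns(prefix, href)
        return self.ns_table[key]


@dataclass
class Node:
    name: str
    ns: Optional[Ns] = None
    children: List["Node"] = field(default_factory=list)


doc = Document()
parent = Node("root", doc.intern_ns("foo", "urn:test:foo"))
child = Node("item", doc.intern_ns("foo", "urn:test:foo"))
parent.children.append(child)

# Detach the child and drop the parent. The child's ns entry belongs to the
# document, not to the parent element, so it outlives the parent.
parent.children.remove(child)
del parent
print(child.ns.href)
```

In this model the lifetime of a namespace entry is tied to the document wrapper rather than to whichever element happened to declare it, so a detached node never refers to an entry that died with its former ancestor.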

Yes, in this case the problem would devolve to the issue I already have with dictionaries, which is manageable as I can make this stuff globally shared. Though, just as with dictionaries, I hope that the adoptNode() functionality could take care of this as well.

I suspect that adoptNode() recreating namespaces wherever necessary in the new document would indeed be sufficient to support Clark notation in ElementTree, even though the XML serialization would look ugly. Am I correct in that an adoptNode() would take care of this issue if prefixes are hidden from the API user's view?
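
As a rough illustration of the behaviour hoped for here, the following sketch uses the pure-Python xml.etree.ElementTree module, which keeps no libxml2 structures at all. It is not adoptNode() and says nothing about how libxml2 would implement it. The element names and the prefix "a" are made up. It moves a namespaced element into another tree, drops the original tree, and checks that the Clark name survives serialization:

```python
import xml.etree.ElementTree as ET

NS = "http://namespaces.somewhere.org/ns1"

# Source tree: the namespace is declared on the root, above the moved element.
src = ET.fromstring('<a:root xmlns:a="%s"><a:child/></a:root>' % NS)
dst = ET.fromstring("<target/>")

child = src[0]
src.remove(child)
dst.append(child)
del src  # the original tree goes away

# The Clark name travels with the element, independent of any declaration.
print(dst[0].tag)

# On output the namespace has to be declared again in the new tree; the
# prefix chosen by the serializer need not match the original "a".
xml_out = ET.tostring(dst, encoding="unicode")
reparsed = ET.fromstring(xml_out)
print(reparsed[0].tag)
```

The serialized prefix is whatever the serializer generates, which is the "ugly" part, but the element name seen through the API is unchanged.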

Regards,

Martijn
