Re: [xml] redicting parts of trees

From: Martijn Faassen <faassen infrae com>
To: Kasimier Buchcik <kbuchcik 4commerce de>
Cc: "xml gnome org" <xml gnome org>
Subject: Re: [xml] redicting parts of trees
Date: Fri, 20 May 2005 23:28:04 +0200

Kasimier Buchcik wrote:

Hi,

On Fri, 2005-05-20 at 16:59 +0200, Martijn Faassen wrote:

Kasimier Buchcik wrote:



[...]

Yeah, I read some of the messages on your lxml list about your mechanism
to keep detached nodes alive if they are referenced by multiple wrapper
proxies. We took a sometimes memory-consuming but simple approach: we
never free any removed Libxml2 nodes from the document; they are moved
into an internal list of "garbage" nodes in the document wrapper and
freed when the document is freed. A "flush" method can be used to
clean up such "garbage" nodes, if the user is sure that it's safe.

Right, since lxml aims at being as "Pythonic" as possible I don't want the user to worry about these issues at all. I think I've accomplished this fairly well, though I'm still mopping up bugs here and there every once in a while (plus some fundamental stuff I hope to solve for good when we have an adoptNode()) and I'm sure some performance issues could still be improved somewhat.
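
For readers following along, the "garbage list" approach described above might be sketched roughly like this in Python. This is only an illustration under stated assumptions: `Node`, `DocumentWrapper`, `remove`, `flush_garbage` and `close` are hypothetical names standing in for a wrapper around Libxml2 nodes, not the actual binding's API, and "freeing" is simulated with a flag.

```python
class Node:
    """Hypothetical stand-in for a Libxml2 node (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []
        self.freed = False  # simulates whether the C node was freed

    def append(self, child):
        child.parent = self
        self.children.append(child)
        return child


def _free_subtree(node):
    # Simulated free: mark the node and all of its descendants as freed.
    node.freed = True
    for child in node.children:
        _free_subtree(child)


class DocumentWrapper:
    """Hypothetical document wrapper that never frees removed nodes eagerly."""

    def __init__(self, root):
        self.root = root
        self._garbage = []  # removed nodes, kept alive until flushed

    def remove(self, node):
        # Unlink the node from the tree, but keep it alive in the garbage list.
        if node.parent is not None:
            node.parent.children.remove(node)
            node.parent = None
        self._garbage.append(node)

    def flush_garbage(self):
        # Only safe if the caller is sure nothing refers to these nodes.
        count = len(self._garbage)
        for node in self._garbage:
            _free_subtree(node)
        self._garbage.clear()
        return count

    def close(self):
        # Freeing the document also frees whatever garbage is left.
        self.flush_garbage()
        _free_subtree(self.root)
```

The trade-off is the one mentioned in the text: removal is cheap and always safe, at the cost of memory that stays allocated until a flush or until the document goes away.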


OK. We wanted it to be as quick as possible. "flushGarbage" is
normally not called, only after massive removals of nodes. Do you
handle XPath results as well? I.e. is the reference counter (if that's
what you do) increased for every node in such a list as well?

There's in fact no reference counter in the lxml code itself; if a node has a proxy in Python, the same proxy is always reused. The proxy *is* reference counted by Python. :) When the proxy is deallocated, I therefore know nothing points directly at that node. At this point that node is a candidate for removal.
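
The "one proxy per node, refcounted by Python" idea can be illustrated with a small sketch. This is not lxml's actual code: `Proxy`, `ProxyRegistry` and `get_proxy` are hypothetical names, node identity is reduced to a plain integer id, and the standard `weakref` module is used here as one possible way to notice when the last reference to a proxy goes away.

```python
import weakref


class Proxy:
    """Hypothetical Python-side proxy for one underlying node."""

    def __init__(self, node_id):
        self.node_id = node_id


class ProxyRegistry:
    """Hands out at most one live proxy per node id (illustration only)."""

    def __init__(self):
        # Weak values: the registry itself does not keep proxies alive.
        self._proxies = weakref.WeakValueDictionary()
        # Node ids whose proxy has gone away, i.e. candidates for removal.
        self.released = []

    def get_proxy(self, node_id):
        proxy = self._proxies.get(node_id)
        if proxy is None:
            proxy = Proxy(node_id)
            self._proxies[node_id] = proxy
            # Runs when Python deallocates the proxy.
            weakref.finalize(proxy, self.released.append, node_id)
        return proxy
```

Asking twice for the same node id returns the same object, so Python's own reference count on that object tells you whether anything still points directly at the node.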

At that point I do not know yet whether there isn't something pointing at a node *below* this node or at a node above this node in the same tree. So, whenever a proxy is deallocated, the following checks are performed:

* tree walk upwards to see whether the node is still in a document or has nodes with proxies pointing to it below. If so, don't deallocate the tree.

* if tree is not in doc, tree walk throughout the tree to see whether any proxy exists that still points to it. If so, don't deallocate.

In addition, whole documents are removed if no more proxies exist that refer to this document. This takes advantage of Python refcounting again; the proxies have a reference to the document proxy and the underlying document gets deallocated only if no more proxies exist for the whole tree.

The worst case behavior of this algorithm is that a whole tree walk is required. I believe however that this is relatively rare, as most of the time the algorithm can stop when the document node is reached (most nodes after all are in a tree).
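
The two checks above can be sketched as follows. This is a simplified model, not the lxml implementation: `Node`, `has_proxy`, `is_document` and `tree_to_free` are hypothetical names, and I am assuming the upward walk stops early either at a document node or at an ancestor that still has a live proxy.

```python
class Node:
    """Hypothetical stand-in for a tree node (illustration only)."""

    def __init__(self, name, is_document=False):
        self.name = name
        self.is_document = is_document
        self.parent = None
        self.children = []
        self.has_proxy = False  # True while a Python proxy points here

    def append(self, child):
        child.parent = self
        self.children.append(child)
        return child


def tree_to_free(node):
    """Call when node's proxy is deallocated.

    Returns the top of the detached tree if it can be freed, else None.
    """
    # Check 1: walk upwards. Usually cheap, since most nodes are in a
    # document and the walk ends as soon as the document node is reached.
    top = node
    current = node
    while current is not None:
        if current.is_document or current.has_proxy:
            return None
        top = current
        current = current.parent

    # Check 2 (worst case): the tree is detached, so walk all of it and
    # look for any remaining proxy.
    stack = [top]
    while stack:
        current = stack.pop()
        if current.has_proxy:
            return None
        stack.extend(current.children)
    return top
```

For a node that is still inside a document, only the path to the document node is visited; the full walk happens only for detached trees, which matches the worst case described above.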

[...]

Yes, in your case, if single attributes are not expected to be adopted,
and potentially many auto-created namespace declarations don't bother
you, the mechanism of xmlReconciliateNs seems best fitting: it just
re-creates the missing declarations on the adopted element. OK, good to
know that!

Yes, indeed. I am a bit concerned the namespace declarations will be polluted somewhat when serializing, but I can live with that for now, as long as the infoset is still okay.


I'll try to store the ns-declarations on the doc, as I have Rob on this
side now. So we'll get less redundant ns-declarations.


That'd be good!

Regards,

Martijn
