Re: [xml] faster replacement libxml.py for python 2.2

From: Hannu Krosing <hannu tm ee>
To: veillard redhat com
Cc: xml gnome org
Subject: Re: [xml] faster replacement libxml.py for python 2.2
Date: 05 Jan 2003 06:38:04 +0500

Daniel Veillard wrote on Sat, 04.01.2003 at 18:12:

On Sat, Jan 04, 2003 at 05:08:06PM +0500, Hannu Krosing wrote:


I have modified libxml.py (the handmade part of libxml2.py) to make 
use of some new features of python 2.2 , namely

1) replaced __getattr__() with property()

2) added an __iter__() method that returns an iterator over subtree of
node
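To illustrate what "an __iter__() method that returns an iterator over the subtree" can look like, here is a minimal sketch using a generator. The Node class is a hypothetical stand-in that only models the children/next links of a libxml2 node; it is not the actual patch. Note that in Python 2.2 a generator like this also needed `from __future__ import generators`.

```python
class Node(object):
    """Hypothetical stand-in for a libxml2 xmlNode wrapper: children/next links only."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = None  # first child, or None
        self.next = None      # next sibling, or None
        prev = None
        for child in children:
            if prev is None:
                self.children = child
            else:
                prev.next = child
            prev = child

    def __iter__(self):
        """Yield this node and every node below it, in document (pre-)order."""
        yield self
        child = self.children
        while child is not None:
            # Walk the child's own subtree, then move on to its next sibling.
            for node in child:
                yield node
            child = child.next
```

For example, `[n.name for n in Node("root", [Node("a", [Node("b")]), Node("c")])]` gives `['root', 'a', 'b', 'c']`. Iterating a child yields only that child's subtree, not its siblings.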


  Is 2) dependent on 2.x features?


Yes. Iterators (as well as generators and weakrefs) are new (but
very useful) features.

The result of replacing __getattr__() with property() is a 2x speedup
when iterating over the tree using node.next and node.children.


  Well, well ... the problem is that the libxml2 module so far was completely
independent of the version of Python used, i.e. it would compile and
run as-is with Python 1.5, which is still deployed on a lot of machines ...
  There are 2 ways to integrate your speedup:
    - either try to make the code generic, i.e. workable on 1.5
      setups
    - or generate different .py modules for the new and the old
      version (will your code work with 2.1? where precisely is the
      limit?)

  The former would be best; the latter is acceptable too. The install
phase might need some tweaking, but that's reasonable.
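On where the limit lies: property() and the iterator protocol (`__iter__`) both first appeared in Python 2.2, so a 2.1 interpreter would still need the `__getattr__` variant. A minimal sketch of the install-time choice, assuming the generator emits two hand-made variants; the file names and `pick_handmade_part()` are hypothetical.

```python
import sys

# Hypothetical file names: the generator would emit both variants.
LEGACY_SOURCE = "libxml_legacy.py"  # __getattr__-based, runs on Python 1.5+
MODERN_SOURCE = "libxml_modern.py"  # property()/__iter__-based, needs Python 2.2+


def pick_handmade_part(version_info=sys.version_info):
    """Return which hand-made source to splice into libxml2.py for this interpreter."""
    # property() and the iterator protocol both arrived in Python 2.2.
    if tuple(version_info[:2]) >= (2, 2):
        return MODERN_SOURCE
    return LEGACY_SOURCE
```

Taking `version_info` as a parameter keeps the choice testable without running several interpreters.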


I was planning along the lines of the second option, as this will bring the best
performance and also more freedom in the hand-made part.

An additional 12% speedup can be gained by not having a separate xmlCore
class (it is used only by xmlNode) and instead including the hand-made part
directly in the definition of class xmlNode.


  That would make the maintenance harder. The generation is automated,
and I would like to keep it that way really.

With the current libxml.py plus an added __iter__, it produces:

nodecount(ot)

nodes: 74957
time : 27.16
nodes/sec 2760.39


With my modifications (including removing xmlCore) the result is:

nodecount(ot)

nodes: 74957
time : 12.87
nodes/sec 5824.71
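The nodes/sec figure is simply nodes divided by time: 74957 / 27.16 ≈ 2760 and 74957 / 12.87 ≈ 5824, so the change is about 27.16 / 12.87 ≈ 2.1x. A hypothetical reconstruction of a `nodecount()` harness that prints output in the same shape; the original function is not shown, so this only assumes that iterating the root walks its whole subtree (via the `__iter__()` method described in the original message).

```python
import time


def nodecount(tree):
    """Count the nodes produced by iterating 'tree' and report throughput.

    Hypothetical reconstruction: it only assumes that iterating the root
    yields every node of its subtree.
    """
    start = time.perf_counter()
    count = 0
    for _ in tree:
        count += 1
    elapsed = time.perf_counter() - start
    print("nodes: %d" % count)
    print("time : %.2f" % elapsed)
    if elapsed > 0:
        print("nodes/sec %.2f" % (count / elapsed))
    return count
```

Because it accepts any iterable, the same harness can time both the old and the new wrapper classes on the same document.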


  Okay, that looks significant. But it might be a side effect of the
way your iterator is implemented; building the node list in C
might make the difference vanish, right?


Probably yes. The iterator was just used as a test here to show that
attribute access using the new feature is 2x faster than using
__getattr__. If I had implemented the iterator using the get_xxx calls
there would have been no difference (except the __doc__ strings on
attributes ;)

I guess an order-of-magnitude speedup could be gained by moving the
whole libxml2.py to a C module. Are there any plans of doing so ?


  Well, there are a number of factors:
    - having the .py code with comments helps learning about the module


Having docstrings in the right places could do the same ;)

And I think that, when the generator is modified to produce all-C
xmlNode et al., it should still produce the all-Python classes for those
having problems, and also for testing.
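Keeping the all-Python classes alongside a C build usually comes down to an import fallback. A minimal sketch, assuming two hypothetical module names (an all-C build and the generated pure-Python classes); neither name exists in libxml2.

```python
import importlib


def load_node_impl(candidates=("libxml2_fastnodes", "libxml2_purenodes")):
    """Import the first available implementation module.

    Both default module names are hypothetical: an all-C build first,
    then the generated all-Python classes as a fallback.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("no node implementation found among %r" % (candidates,))
```

For testing, passing only the pure-Python name forces the fallback path even where the C module is installed.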

    - can exactly the same API be provided by pure C code? I assume so,
      but I have no idea how to do it


It seems that with type/class unification in Python 2.2 it should be
possible.

I plan to do something incremental that retains xmlNode._o as a
PyCObject, so that most (all? :) non-C extension code will keep running.

This could also take care of freeing the underlying C structures,
though making that robust might need weak references (another Python 2.x
feature).
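A minimal sketch of the weak-reference idea, written for modern Python: `weakref.finalize` (3.4+) plays the role that a `weakref.ref(obj, callback)` would have in 2.x, and PyCObject was later replaced by PyCapsule. `FakeCDoc`, `Document` and `Node` are hypothetical stand-ins: node wrappers hold a strong reference to their document wrapper, so the C document is freed only once no wrapper can reach it.

```python
import weakref


class FakeCDoc(object):
    """Hypothetical stand-in for a C-allocated xmlDoc; records whether it was freed."""

    def __init__(self):
        self.freed = False

    def free(self):
        self.freed = True


class Document(object):
    """Wrapper that frees its C document when the wrapper itself goes away."""

    def __init__(self, cdoc):
        self._o = cdoc
        # The finalizer references cdoc.free, not self, so it does not keep
        # the wrapper alive; it fires when the wrapper is garbage collected.
        self._finalizer = weakref.finalize(self, cdoc.free)

    def close(self):
        """Free explicitly; a finalizer runs at most once."""
        self._finalizer()


class Node(object):
    """Node wrapper; its strong reference keeps the owning document alive."""

    def __init__(self, doc):
        self.doc = doc
```

Dropping the document wrapper while a node still refers to it does not free anything; the free happens when the last node goes away, or earlier via `close()`.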

    - what would be the real speedup ?


Only testing will show. My gut feeling is that most of the traversal
time is spent in __init__'ing and __del__'ing of node objects.
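One way to check that hunch is to count wrapper objects during a traversal. A minimal sketch, assuming (as in this toy, not necessarily in libxml2) that every `.next` access builds a fresh Python wrapper; `RawNode`, `CountingNode` and `walk_siblings()` are hypothetical.

```python
class RawNode(object):
    """Hypothetical stand-in for a C node: just a 'next' pointer."""

    def __init__(self, next=None):
        self.next = next


class CountingNode(object):
    """Thin wrapper that builds a fresh Python object on every .next access."""

    created = 0
    destroyed = 0

    def __init__(self, o):
        CountingNode.created += 1
        self._o = o

    def __del__(self):
        CountingNode.destroyed += 1

    @property
    def next(self):
        o = self._o.next
        return None if o is None else CountingNode(o)


def walk_siblings(first):
    """Follow .next from 'first' to the end; return how many nodes were visited."""
    visited = 0
    node = first
    while node is not None:
        visited += 1
        node = node.next
    return visited
```

Under this design a walk over N nodes performs N `__init__` calls and, as each wrapper is dropped, up to N `__del__` calls, so wrapper churn grows with document size.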

    - how long would it take? The libxml2 module is in a very large part
      automatically generated, so generating C code needs a change in the
      generator.


That'll probably be the fastest way to do it anyway ...

    - I really want to keep the generation automatic. I don't have time
      to change things in 2 places at a time, and I like the reliability
      of the generated code.


Sure.

  In a nutshell, it looks like a large effort, and I don't have the time
for this ATM. You can have a look at it; maybe it's easy, maybe not, but
I can't start the work myself. If you provide example code of converted
classes, maybe adapting the generator could be worked on quickly, but I
can't guarantee anything at this point.


I'll try to make sample classes and/or tweaked generator for evaluation.

-- 
Hannu Krosing <hannu tm ee>

References:
- [xml] faster replacement libxml.py for python 2.2
  - From: Hannu Krosing
- Re: [xml] faster replacement libxml.py for python 2.2
  - From: Daniel Veillard
