
Re: [xml] faster replacement libxml.py for python 2.2



Daniel Veillard wrote on Sat, 04.01.2003 at 18:12:
> On Sat, Jan 04, 2003 at 05:08:06PM +0500, Hannu Krosing wrote:
> > 
> > I have modified libxml.py (the handmade part of libxml2.py) to make 
> > use of some new features of python 2.2 , namely
> > 
> > 1) replaced __getattr__() with property()
> > 
> > 2) added an __iter__() method that returns an iterator over subtree of
> > node
> 
>   is 2) dependent on 2.x features ?

Yes. Iterators (as well as generators and weakrefs) are all new (but
very useful) features.
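
A minimal sketch of the kind of __iter__() method meant in 2), written as a generator. The Node class here is a hypothetical stand-in, not the real libxml2 wrapper; only the children/next attribute names come from this thread. It is written for current Python; on 2.2 itself, yield additionally needs "from __future__ import generators".

```python
class Node(object):
    """Hypothetical stand-in for a wrapped libxml2 node (not the real binding)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = None   # first child, following the libxml2 naming
        self.next = None       # next sibling
        prev = None
        for child in children:
            # Link the children into a first-child / next-sibling chain.
            if prev is None:
                self.children = child
            else:
                prev.next = child
            prev = child

    def __iter__(self):
        # Depth-first walk in document order: this node, then each child subtree.
        yield self
        child = self.children
        while child is not None:
            for node in child:   # recurses through the child's own __iter__
                yield node
            child = child.next


tree = Node("root", [Node("a", [Node("a1"), Node("a2")]), Node("b")])
names = [node.name for node in tree]
```

With this in place, "for node in tree" visits root, a, a1, a2, b in that order.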

> 
> > The result of replacing __getattr__() with property() is a 2x speedup
> > when iterating over tree using node.next and node.children
> 
>   Well, well ... the problem is that the libxml2 module so far was completely
> independent of the version of Python used, i.e. it would compile and
> run as-is with Python 1.5 which is still deployed on a lot of machines ...
>   There are 2 ways to integrate your speedup:
>     - either try to make the code generic, i.e. workable on 1.5
>       setups
>     - or generate different .py modules for the new and the old
>       version (will your code work with 2.1 ? where is the limit
>       precisely ?)
> 
>   the former would be the best, the latter is acceptable too; the install
> phase might need some tweaking but that's reasonable.

I was planning along the lines of the second option (separate .py modules
for new and old versions), as this will bring the best performance and also
more freedom in the hand-made part.

> > An additional 12% speedup can be gained by not having a separate xmlCore
> > class (it is used only by xmlNode) but including the hand-made part
> > directly in definition of class xmlNode.
> 
>   That would make the maintenance harder. The generation is automated,
> and I would like to keep it that way really.
> 
> > with current libxml with added __iter__() it produces:
> > 
> > >>> nodecount(ot)
> > nodes: 74957
> > time : 27.16
> > nodes/sec 2760.39
> > 
> > 
> > with my modifications (inc removing xmlCore) the result is
> > 
> > >>> nodecount(ot)
> > nodes: 74957
> > time : 12.87
> > nodes/sec 5824.71
> 
>   Okay that looks significant, but might be a side effect of the 
> way your iterator is implemented, a C coded build of the node list
> might make that vanish, right ?

Probably yes. The iterator was just used as a test here to show that
attribute access using the new feature is 2x faster than using
__getattr__. If I had implemented the iterator using the get_xxx calls
there would have been no difference (except the __doc__ strings on
attributes ;)
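
To make the comparison concrete, here is a rough sketch of the two attribute-access styles. All class names are hypothetical stand-ins rather than the actual libxml.py code; only _o, next, the get_xxx naming and the doc-string point come from this thread.

```python
class RawNode(object):
    """Hypothetical stand-in for the underlying C-level node object."""

    def __init__(self, name, next=None):
        self.name = name
        self.next = next


class GetattrNode(object):
    """Old style: every wrapped attribute goes through __getattr__."""

    def __init__(self, raw):
        self._o = raw

    def __getattr__(self, attr):
        # Called only after normal lookup fails, then dispatches on the name.
        if attr == "next":
            ret = self._o.next
            return None if ret is None else GetattrNode(ret)
        if attr == "name":
            return self._o.name
        raise AttributeError(attr)


class PropertyNode(object):
    """New style (2.2+): each attribute is a property bound to a get_xxx method."""

    def __init__(self, raw):
        self._o = raw

    def get_next(self):
        ret = self._o.next
        return None if ret is None else PropertyNode(ret)

    def get_name(self):
        return self._o.name

    # property(fget, fset, fdel, doc): the attribute also carries a doc-string.
    next = property(get_next, None, None, "next sibling node")
    name = property(get_name, None, None, "node name")


def count_siblings(node):
    """Walk the .next chain and count the nodes, starting at `node`."""
    count = 0
    while node is not None:
        count += 1
        node = node.next
    return count


raw = RawNode("a", RawNode("b", RawNode("c")))
old_count = count_siblings(GetattrNode(raw))
new_count = count_siblings(PropertyNode(raw))
next_doc = PropertyNode.next.__doc__
```

Both wrappers give the same traversal; the difference is only in how the lookup of node.next is dispatched, which is what the timing above measures.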

> > I guess an order-of-magnitude speedup could be gained by moving the
> > whole libxml2.py to a C module. Are there any plans of doing so ?
> 
>   well there are a number of factors:
>     - having the .py code with comments helps learning about the module

having doc-strings in right places could do the same ;)

And I think that while modifying the generator to make all-C xmlNode et al.,
it should still make the all-python classes for those having problems
and also for testing.

>     - can exactly the same API be provided by pure C code, I assume so
>       but I have no idea on how to do it

It seems that with type/class unification in python 2.x it should be
possible.

I've planned to do something incremental that retains xmlNode._o as
PyCObject so that most (all ? :) non-C extension code will run.

This could also take care of freeing the underlying C structures, even
though making it robust could need weak references (another python 2.x
feature).
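
One possible reading of how weak references could help, as a sketch only: a registry that maps each C node to its live Python wrapper without keeping that wrapper alive, so code that frees a C structure could look up and invalidate any wrapper still pointing at it. NodeWrapper, wrap and the integer "pointer" are all hypothetical and not part of libxml2.py.

```python
import gc
import weakref


class NodeWrapper(object):
    """Hypothetical Python-side wrapper; `pointer` stands in for the C node."""

    def __init__(self, pointer):
        self._o = pointer


# Maps a C-node identity to its wrapper; entries vanish when the wrapper dies.
_live_wrappers = weakref.WeakValueDictionary()


def wrap(pointer):
    """Return the existing wrapper for `pointer`, or create and register one."""
    wrapper = _live_wrappers.get(pointer)
    if wrapper is None:
        wrapper = NodeWrapper(pointer)
        _live_wrappers[pointer] = wrapper
    return wrapper


first = wrap(0x1000)
same_wrapper = wrap(0x1000) is first   # the registry hands back the same object
del first                              # drop the only strong reference
gc.collect()
remaining = len(_live_wrappers)        # the registry did not keep it alive
```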

>     - what would be the real speedup ?

Only testing will show. My gut feeling tells me that most of the time in
traversing is taken up in __init__'ing and __del__'ing of nodes.
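
For that kind of testing, a timing helper in the spirit of the nodecount() output quoted above could look roughly like this. The original nodecount() is not shown in this thread, so this is a guess at its shape; it takes any iterable of nodes and uses time.perf_counter, which only exists in current Python.

```python
import time


def nodecount(nodes):
    """Count the items produced by iterating `nodes` and time the traversal."""
    start = time.perf_counter()
    count = 0
    for _ in nodes:
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small inputs.
    rate = count / elapsed if elapsed > 0 else float("inf")
    print("nodes: %d" % count)
    print("time : %.2f" % elapsed)
    print("nodes/sec %.2f" % rate)
    return count, elapsed, rate


# range() is only a placeholder for a real node tree here.
count, elapsed, rate = nodecount(range(1000))
```

Running it once on wrapped nodes and once on a traversal that skips wrapper creation would show how much of the time goes into __init__ and __del__.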

>     - how long would it take, the libxml2 module is in a very large part
>       automatically generated, so generating C code needs a change in the
>       generator.

That will probably be the fastest way to do it anyway ...

>     - I really want to keep the generation automatic. I don't have time
>       to change things in 2 places at a time, and I like the reliability
>       of the generated code.

Sure.

>   In a nutshell, it looks like a large effort, and I don't have the time
> for this ATM. You can have a look at it, maybe it's easy, maybe not, but
> I can't start the work myself. If you provide example code of converted classes
> maybe adapting the generator could be worked on quickly, but I have no
> guarantee at this point.

I'll try to make sample classes and/or tweaked generator for evaluation.

-- 
Hannu Krosing <hannu tm ee>


