Re: [xml] faster replacement libxml.py for python 2.2
- From: Hannu Krosing <hannu tm ee>
- To: veillard redhat com
- Cc: xml gnome org
- Subject: Re: [xml] faster replacement libxml.py for python 2.2
- Date: 05 Jan 2003 06:38:04 +0500
Daniel Veillard wrote on Sat, 04.01.2003 at 18:12:
> On Sat, Jan 04, 2003 at 05:08:06PM +0500, Hannu Krosing wrote:
> >
> > I have modified libxml.py (the handmade part of libxml2.py) to make
> > use of some new features of python 2.2 , namely
> >
> > 1) replaced __getattr__() with property()
> >
> > 2) added an __iter__() method that returns an iterator over the subtree
> > of a node
>
> is 2) dependent on 2.x features?
Yes. Iterators (as well as generators and weakrefs) are new (but
very useful) features.
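
For illustration, here is a minimal sketch of what such an __iter__() method could look like. The Node class is a hypothetical stand-in, not the real xmlNode; it assumes only that a node has a `children` link to its first child and a `next` link to its next sibling, as mentioned in this thread. On Python 2.2 itself the generator syntax would additionally need `from __future__ import generators`.

```python
class Node(object):
    """Hypothetical stand-in for xmlNode: only 'children' and 'next' links."""

    def __init__(self, name):
        self.name = name
        self.children = None  # first child, or None
        self.next = None      # next sibling, or None

    def append(self, child):
        """Attach child as the last child of this node."""
        if self.children is None:
            self.children = child
        else:
            last = self.children
            while last.next is not None:
                last = last.next
            last.next = child
        return child

    def __iter__(self):
        """Yield this node, then every descendant, in document order."""
        yield self
        child = self.children
        while child is not None:
            for node in child:  # recurse into the child's subtree
                yield node
            child = child.next


root = Node("root")
a = root.append(Node("a"))
a.append(Node("a1"))
root.append(Node("b"))
names = [n.name for n in root]
```

With this in place, a plain `for node in root:` walks the whole subtree without the caller following `children` and `next` by hand.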
>
> > The result of replacing __getattr__() with property() is a 2x speedup
> > when iterating over a tree using node.next and node.children
>
> Well, well ... the problem is that the libxml2 module so far was completely
> independent of the version of Python used, i.e. it would compile and
> run as-is with Python 1.5 which is still deployed on a lot of machines ...
> There are 2 ways to integrate your speedup:
> - either try to make the code generic, i.e. workable on 1.5
> setups
> - or generate different .py modules for the new and the old
> version (will your code work with 2.1 ? where is the limit
> precisely ?)
>
> the former would be the best, the latter is acceptable too; the install
> phase might need some tweaking but that's reasonable.
I was planning along the lines of 2) as this will bring the best
performance and also more freedom in the hand-made part.
> > An additional 12% speedup can be gained by not having a separate xmlCore
> > class (it is used only by xmlNode) but including the hand-made part
> > directly in definition of class xmlNode.
>
> That would make the maintenance harder. The generation is automated,
> and I would like to keep it that way really.
>
> > with current libxml with added __iter__() it produces:
> >
> > >>> nodecount(ot)
> > nodes: 74957
> > time : 27.16
> > nodes/sec 2760.39
> >
> >
> > with my modifications (inc removing xmlCore) the result is
> >
> > >>> nodecount(ot)
> > nodes: 74957
> > time : 12.87
> > nodes/sec 5824.71
>
> Okay, that looks significant, but it might be a side effect of the
> way your iterator is implemented; a C-coded build of the node list
> might make that vanish, right?
Probably yes. The iterator was just used as a test here to show that
attribute access using the new feature is 2x faster than using
__getattr__. If I had implemented the iterator using the get_xxx calls
there would have been no difference (except the __doc__ strings on
attributes ;)
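
To make the comparison concrete, here is a minimal sketch of the two attribute-access styles. Everything in it is illustrative: `c_get_name` is a hypothetical stand-in for a C-level get_xxx accessor, and the two classes are not the real libxml.py code.

```python
def c_get_name(o):
    """Hypothetical stand-in for a C-level accessor such as a get_xxx call."""
    return o["name"]


class GetattrNode:
    """Old style: every unknown attribute goes through __getattr__."""

    def __init__(self, o):
        self._o = o

    def __getattr__(self, attr):
        # Called only after normal lookup fails, then dispatches by string.
        if attr == "name":
            return c_get_name(self._o)
        raise AttributeError(attr)


class PropertyNode(object):
    """New style (Python 2.2+): the attribute is a property descriptor."""

    def __init__(self, o):
        self._o = o

    def get_name(self):
        return c_get_name(self._o)

    # The fourth argument is the doc string attached to the attribute.
    name = property(get_name, None, None, "The node name")


raw = {"name": "para"}
old_node = GetattrNode(raw)
new_node = PropertyNode(raw)
```

Both `old_node.name` and `new_node.name` return the same value. The property version skips the failed lookup and string comparison on each access, and it also carries a doc string that can be read as `PropertyNode.name.__doc__`.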
> > I guess an order-of-magnitude speedup could be gained by moving the
> > whole libxml2.py to a C module. Are there any plans of doing so ?
>
> well there are a number of factors:
> - having the .py code with comments helps learning about the module
Having doc-strings in the right places could do the same ;)
And I think that while modifying the generator to make an all-C xmlNode et al.,
it should still generate the all-Python classes for those having problems
and also for testing.
> - can exactly the same API be provided by pure C code, I assume so
> but I have no idea on how to do it
It seems that with type/class unification in python 2.x it should be
possible.
I've planned to do something incremental that retains xmlNode._o as a
PyCObject so that most (all? :) non-C extension code will still run.
This could also take care of freeing the underlying C structures, although
making that robust may require weak references (another Python 2.x
feature).
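
As a rough sketch of how weak references might help here (one possible reading, not a worked-out design): a node wrapper can hold a weak reference to the wrapper that owns the C tree, so it neither keeps the document alive nor touches memory that has been freed. The Doc and NodeRef classes and the `free()` method below are hypothetical names used only for this illustration.

```python
import weakref


class Doc(object):
    """Hypothetical document wrapper that owns the underlying C tree."""

    def __init__(self):
        self.freed = False

    def free(self):
        # Stand-in for releasing the C structures.
        self.freed = True


class NodeRef(object):
    """Hypothetical node wrapper that must not outlive its document."""

    def __init__(self, doc, name):
        self._doc = weakref.ref(doc)  # does not keep the document alive
        self._name = name

    def get_name(self):
        doc = self._doc()
        if doc is None or doc.freed:
            raise ReferenceError("document already freed")
        return self._name

    name = property(get_name)


doc = Doc()
node = NodeRef(doc, "para")
first = node.name
doc.free()
try:
    node.name
    stale_detected = False
except ReferenceError:
    stale_detected = True
```

After `doc.free()`, accessing `node.name` raises an exception instead of reading freed memory, which is the kind of robustness a real implementation would need.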
> - what would be the real speedup ?
Only testing will show. My gut feeling tells me that most of the time in
traversing is taken up in __init__'ing and __del__'ing of nodes.
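
The nodecount() helper used for the numbers quoted above is not shown in this thread. A minimal sketch of an equivalent measurement, assuming only that the tree can be iterated over, might look like this:

```python
import time


def nodecount(tree):
    """Count the nodes yielded by iterating over tree and report throughput."""
    start = time.time()
    count = 0
    for _node in tree:
        count += 1
    elapsed = time.time() - start
    # Guard against a zero interval on very small inputs.
    rate = count / elapsed if elapsed > 0 else float("inf")
    print("nodes: %d" % count)
    print("time : %.2f" % elapsed)
    print("nodes/sec %.2f" % rate)
    return count, elapsed


# Any iterable works for a dry run; a real test would pass a parsed tree.
count, elapsed = nodecount(range(1000))
```

Running the same function against the old and new wrapper classes on the same document would separate the cost of attribute access from the cost of creating and destroying the node wrappers.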
> - how long would it take? The libxml2 module is in a very large part
> automatically generated, so generating C code needs a change in the
> generator.
That'll be probably the fastest way to do it anyway ...
> - I really want to keep the generation automatic. I don't have time
> to change things in 2 places at a time, and I like the reliability
> of the generated code.
Sure.
> In a nutshell, it looks like a large effort, and I don't have the time
> for this ATM. You can have a look at it, maybe it's easy, maybe not, but
> I can't start the work myself. If you provide example code of converted classes,
> maybe adapting the generator could be worked on quickly, but I have no
> guarantee at this point.
I'll try to make sample classes and/or tweaked generator for evaluation.
--
Hannu Krosing <hannu tm ee>