Re: [xml] faster replacement libxml.py for python 2.2



On Sat, Jan 04, 2003 at 05:08:06PM +0500, Hannu Krosing wrote:

I have modified libxml.py (the handmade part of libxml2.py) to make
use of some new features of Python 2.2, namely

1) replaced __getattr__() with property()

2) added an __iter__() method that returns an iterator over the subtree
of a node

  Is 2) dependent on 2.x features?
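
For readers unfamiliar with point 2), here is a minimal sketch of what a subtree iterator can look like. It is not the code from the modified libxml.py: `Node` and `add_child` are hypothetical stand-ins, and only the attribute names `children` and `next` mirror the ones mentioned in this thread. It is written in current Python syntax; the `__iter__` protocol itself first appeared in Python 2.2, where generators additionally required `from __future__ import generators`.

```python
class Node:
    """Hypothetical stand-in for a wrapped tree node (not the real xmlNode)."""

    def __init__(self, name):
        self.name = name
        self.children = None  # first child, or None
        self.next = None      # next sibling, or None

    def add_child(self, child):
        """Append child at the end of this node's chain of children."""
        if self.children is None:
            self.children = child
        else:
            last = self.children
            while last.next is not None:
                last = last.next
            last.next = child
        return child

    def __iter__(self):
        """Yield this node, then every descendant, in document order."""
        yield self
        child = self.children
        while child is not None:
            for node in child:  # recurse into the child's own subtree
                yield node
            child = child.next


# Build a tiny tree: root -> (a -> a1), b
root = Node("root")
a = root.add_child(Node("a"))
a.add_child(Node("a1"))
root.add_child(Node("b"))

names = [n.name for n in root]
```

With this shape, `for n in root:` visits the whole subtree without the caller following `children` and `next` by hand.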

The result of replacing __getattr__() with property() is a 2x speedup
when iterating over the tree using node.next and node.children
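
The difference between the two styles can be sketched as follows. This is an illustration only, not the actual libxml.py code: `Raw`, `raw_next`, `GetattrNode`, `PropertyNode` and `count_siblings` are hypothetical names, with `raw_next` standing in for a C-level accessor. The sketch shows the mechanics and makes no attempt to reproduce the timing figures quoted in this thread.

```python
class Raw:
    """Hypothetical stand-in for the C-level node object."""

    def __init__(self, next_raw=None):
        self.next_raw = next_raw


def raw_next(raw):
    """Hypothetical stand-in for a C accessor returning the next sibling."""
    return raw.next_raw


class GetattrNode:
    """Old style: attributes are resolved by name inside __getattr__."""

    def __init__(self, raw):
        self._o = raw

    def __getattr__(self, attr):
        # Only called after normal lookup fails, then dispatches on the name.
        if attr == "next":
            ret = raw_next(self._o)
            if ret is None:
                return None
            return GetattrNode(ret)
        raise AttributeError(attr)


class PropertyNode(object):
    """New style: the attribute is a property found by normal class lookup."""

    def __init__(self, raw):
        self._o = raw

    def get_next(self):
        ret = raw_next(self._o)
        if ret is None:
            return None
        return PropertyNode(ret)

    next = property(get_next, None, None, "next sibling, or None")


def count_siblings(node):
    """Walk node.next until the chain ends and count the nodes seen."""
    n = 0
    while node is not None:
        n += 1
        node = node.next
    return n


chain = Raw(Raw(Raw()))  # three raw nodes linked as siblings
old_count = count_siblings(GetattrNode(chain))
new_count = count_siblings(PropertyNode(chain))
```

Both classes expose the same `node.next` interface, so calling code does not change; only the lookup path differs.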

  Well, well ... the problem is that the libxml2 module so far was completely
independent of the version of Python used, i.e. it would compile and
run as-is with Python 1.5, which is still deployed on a lot of machines ...
  There are 2 ways to integrate your speedup:
    - either try to make the code generic, i.e. workable on 1.5
      setups
    - or generate different .py modules for the new and the old
      version (will your code work with 2.1 ? where is the limit
      precisely ?)

  The former would be best; the latter is acceptable too. The install
phase might need some tweaking, but that's reasonable.

An additional 12% speedup can be gained by not having a separate xmlCore
class (it is used only by xmlNode) but including the hand-made part
directly in definition of class xmlNode.

  That would make the maintenance harder. The generation is automated,
and I would really like to keep it that way.

With the current libxml plus the added __iter__(), it produces:

nodecount(ot)
nodes: 74957
time : 27.16
nodes/sec 2760.39


With my modifications (including removing xmlCore), the result is:

nodecount(ot)
nodes: 74957
time : 12.87
nodes/sec 5824.71

  Okay, that looks significant, but it might be a side effect of the
way your iterator is implemented; a C-coded build of the node list
might make that vanish, right?

I have some questions about the code though -

1) why does the current code have duplicate defs of doc(self)?

   Simple oversight, fixed.

2) what is the alternate name getContent for get_content used for ?
   do some tests depend on it ?

   I think I kept both because some early tests needed them. Since it's part
of the published API, I honestly see no reason to remove it now.

3) should nodeWrap(o) return an xmlAttr for type "dtd" ?
  I changed my code to return an xmlDtd, is it the right thing to do ?

    Right, a cut'n'paste bug. Thanks, fixed :-)
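
As an illustration of the fix discussed in 3), a type-based wrapper dispatch might look like the sketch below. The classes are empty stand-ins for the real wrapper classes, `node_type` is a hypothetical replacement for the C-level type lookup, and the dict-based raw objects exist only for this example; the real nodeWrap() handles more node types than are shown here.

```python
class xmlNode:
    """Stand-in for the generic node wrapper class."""

    def __init__(self, _obj=None):
        self._o = _obj


class xmlAttr(xmlNode):
    """Stand-in for the attribute wrapper class."""


class xmlDtd(xmlNode):
    """Stand-in for the DTD wrapper class."""


def node_type(o):
    """Hypothetical stand-in for asking the C layer for the node type name."""
    return o["type"]


def nodeWrap(o):
    """Wrap a raw object in the class that matches its node type."""
    name = node_type(o)
    if name == "attribute":
        return xmlAttr(_obj=o)
    if name == "dtd":
        # The reported bug was this branch returning xmlAttr instead.
        return xmlDtd(_obj=o)
    return xmlNode(_obj=o)  # fall back to the generic wrapper


wrapped_dtd = nodeWrap({"type": "dtd"})
wrapped_attr = nodeWrap({"type": "attribute"})
wrapped_other = nodeWrap({"type": "element"})
```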

I guess an order-of-magnitude speedup could be gained by moving the
whole libxml2.py to a C module. Are there any plans to do so?

  Well, there are a number of factors:
    - having the .py code with comments helps learning about the module
    - can exactly the same API be provided by pure C code? I assume so,
      but I have no idea how to do it
    - what would be the real speedup ?
    - how long would it take? The libxml2 module is in very large part
      automatically generated, so generating C code needs a change in the
      generator.
    - I really want to keep the generation automatic. I don't have time
      to change things in 2 places at a time, and I like the reliability
      of the generated code.

  In a nutshell, it looks like a large effort, and I don't have the time
for this ATM. You can have a look at it; maybe it's easy, maybe not, but
I can't start the work myself. If you provide example code of converted
classes, maybe adapting the generator could be worked on quickly, but I can
give no guarantee at this point.

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


