Re: [xml] Walking tree without recursion

From: Stefan Behnel <stefan_ml behnel de>
To: xml gnome org
Subject: Re: [xml] Walking tree without recursion
Date: Fri, 25 Jun 2010 08:12:52 +0200

Michael Ludwig, 23.06.2010 23:29:

Oliver Kindernay schrieb am 23.06.2010 um 18:39 (+0200):

I am using libxml2 HTML 4.0 parser to parse HTML and XHTML web pages.
I want to found specific tags (i.e a), so I have to walk through the
tree of parsed document. And I don't want to use recursion like in
this example http://xmlsoft.org/examples/tree1.c. Is there some
mechanism in libxml which provides parsed nodes in some queue?


Sounds like you should be using a high-level approach such as XPath
or XSLT. Forgoing the benefits provided by these technologies is like
deliberately using flintstone to make fire.

Not necessarily. lxml.etree (Pythonic Python bindings for libxml2) has apair of macros for an iterative tree traversal loop. When I introduced it,it gave me a 30% speed-up compared to my original recursive traversal code,and it was almost 10% faster than plain XPath at the time. See thebench_lxml_xpath() and bench_lxml_getiterator() functions here:


http://codespeak.net/lxml/performance.html#a-longer-example

The code is near the end of this file (look for a long comment startingwith "depth first tree walker"):


http://codespeak.net/svn/lxml/trunk/src/lxml/etree_defs.h

These macros are the main reason why tree iteration is so blazingly fast inlxml.etree. Just look at these numbers:


http://codespeak.net/lxml/performance.html#tree-traversal

When searching for a specific tag (and when XML-ID is not an option), awell forged loop can be a lot faster than a generic XPath implementation.


Stefan

Follow-Ups:
- Re: [xml] Walking tree without recursion
  - From: Oliver Kindernay

References:
- [xml] Walking tree without recursion
  - From: Oliver Kindernay
- Re: [xml] Walking tree without recursion
  - From: Michael Ludwig

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]