Re: [xml] Walking tree without recursion



Michael Ludwig, 23.06.2010 23:29:
Oliver Kindernay schrieb am 23.06.2010 um 18:39 (+0200):
I am using libxml2 HTML 4.0 parser to parse HTML and XHTML web pages.
I want to found specific tags (i.e a), so I have to walk through the
tree of parsed document. And I don't want to use recursion like in
this example http://xmlsoft.org/examples/tree1.c. Is there some
mechanism in libxml which provides parsed nodes in some queue?

Sounds like you should be using a high-level approach such as XPath
or XSLT. Forgoing the benefits provided by these technologies is like
deliberately using flintstone to make fire.

Not necessarily. lxml.etree (Pythonic Python bindings for libxml2) has a pair of macros for an iterative tree traversal loop. When I introduced it, it gave me a 30% speed-up compared to my original recursive traversal code, and it was almost 10% faster than plain XPath at the time. See the bench_lxml_xpath() and bench_lxml_getiterator() functions here:

http://codespeak.net/lxml/performance.html#a-longer-example

The code is near the end of this file (look for a long comment starting with "depth first tree walker"):

http://codespeak.net/svn/lxml/trunk/src/lxml/etree_defs.h

These macros are the main reason why tree iteration is so blazingly fast in lxml.etree. Just look at these numbers:

http://codespeak.net/lxml/performance.html#tree-traversal

When searching for a specific tag (and when XML-ID is not an option), a well forged loop can be a lot faster than a generic XPath implementation.

Stefan



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]