Re: [xml] Walking tree without recursion



Thanks for this. I've studied some algorhitms and depth-first search
looks as it's what I want.

2010/6/25 Stefan Behnel <stefan_ml behnel de>:
Michael Ludwig, 23.06.2010 23:29:

Oliver Kindernay schrieb am 23.06.2010 um 18:39 (+0200):

I am using libxml2 HTML 4.0 parser to parse HTML and XHTML web pages.
I want to found specific tags (i.e a), so I have to walk through the
tree of parsed document. And I don't want to use recursion like in
this example http://xmlsoft.org/examples/tree1.c. Is there some
mechanism in libxml which provides parsed nodes in some queue?

Sounds like you should be using a high-level approach such as XPath
or XSLT. Forgoing the benefits provided by these technologies is like
deliberately using flintstone to make fire.

Not necessarily. lxml.etree (Pythonic Python bindings for libxml2) has a
pair of macros for an iterative tree traversal loop. When I introduced it,
it gave me a 30% speed-up compared to my original recursive traversal code,
and it was almost 10% faster than plain XPath at the time. See the
bench_lxml_xpath() and bench_lxml_getiterator() functions here:

http://codespeak.net/lxml/performance.html#a-longer-example

The code is near the end of this file (look for a long comment starting with
"depth first tree walker"):

http://codespeak.net/svn/lxml/trunk/src/lxml/etree_defs.h

These macros are the main reason why tree iteration is so blazingly fast in
lxml.etree. Just look at these numbers:

http://codespeak.net/lxml/performance.html#tree-traversal

When searching for a specific tag (and when XML-ID is not an option), a well
forged loop can be a lot faster than a generic XPath implementation.

Stefan
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml gnome org
http://mail.gnome.org/mailman/listinfo/xml




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]