Re: [xml] xpath evaluation timeout

From: Liam R E Quin <liam holoweb net>
To: Zhigang Chen <zhigangc gmail com>
Cc: xml gnome org
Subject: Re: [xml] xpath evaluation timeout
Date: Sat, 20 Oct 2012 14:17:09 -0400

On Thu, 2012-10-18 at 19:25 -0700, Zhigang Chen wrote:

Thanks Liam

We are building a platform to which code containing XPath expressions
is submitted by external users. Manual optimization of those
expressions is infeasible. Do you know of any tools that can automate
it?


Setting aside the security considerations, such as XPath's ability to
access external files... if the XML data is constant, I'd use XQuery
with an index. In C or C++ you could use dbxml or Sedna; if Java is
OK, BaseX or Qizx; or you could pay for a commercial XQuery
implementation.

The difference in throughput is partly because the indexed
implementation doesn't need to process the XML each time, and partly
because the class of optimizations that can be done is larger.
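
To make the index point concrete, here is a minimal sketch in Python,
using only the standard library's ElementTree rather than any of the
XQuery engines named above. The sample document, the "id" attribute,
and the function names are all made up for illustration. It parses
once, builds a lookup table once, and then compares a full scan with
an indexed lookup.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, only for illustration.
XML = """<library>
  <book id="b1"><title>Alpha</title></book>
  <book id="b2"><title>Beta</title></book>
  <book id="b3"><title>Gamma</title></book>
</library>"""


def build_id_index(root):
    """Walk the tree once and map each id attribute value to its element."""
    index = {}
    for elem in root.iter():
        key = elem.get("id")
        if key is not None:
            index[key] = elem
    return index


def find_by_scan(root, wanted):
    """Unindexed lookup: visits elements in document order until one matches."""
    for elem in root.iter():
        if elem.get("id") == wanted:
            return elem
    return None


root = ET.fromstring(XML)        # parsed once, not once per query
id_index = build_id_index(root)  # built once, reused by every lookup

scanned = find_by_scan(root, "b3")  # cost grows with document size
indexed = id_index.get("b3")        # a single dictionary lookup
```

Both lookups return the same element; the difference is that the scan
repeats its work for every query, while the index pays the cost once.
A real engine's indexes are of course far more general than this.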

XQuery is a superset of XPath.

Since I seem to say this quite often on this list I should say, I don't
have anything against libxml, it's awesome work and I use it myself,
too. But I use other tools as well, when I think they make more sense.
So take a look at the whole picture.

If you stay with libxml you could maybe work on the optimizer. As Daniel
mentioned, there was recently a patch that helped with performance. It
might also be possible to store the parsed data structures, or to write
an "xpath server" that reads xpath queries and runs them without
reloading documents. But libxml's optimizer doesn't build indexes on the
document, so there will still be some limits on performance.
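
A rough sketch of the "xpath server" idea, again in Python with the
standard library's ElementTree (which supports only a limited subset
of XPath) standing in for libxml. The class name XPathServer, its
methods, and the sample document are hypothetical; a real server
would also need a network or pipe front end, which is left out here.

```python
import xml.etree.ElementTree as ET


class XPathServer:
    """Keeps parsed documents in memory so each query skips the parse step."""

    def __init__(self):
        self._docs = {}       # hypothetical document name -> parsed root
        self.parse_count = 0  # how many times parsing actually happened

    def load(self, name, xml_text):
        """Parse xml_text once; later calls with the same name reuse the tree."""
        if name not in self._docs:
            self._docs[name] = ET.fromstring(xml_text)
            self.parse_count += 1
        return self._docs[name]

    def query(self, name, path):
        """Run a path expression against an already-loaded document."""
        return self._docs[name].findall(path)


CATALOG = "<catalog><item n='1'/><item n='2'/><other/></catalog>"

server = XPathServer()
server.load("catalog", CATALOG)
server.load("catalog", CATALOG)  # second load is served from the cache

first = server.query("catalog", "./item")
second = server.query("catalog", "./item[@n='2']")
```

Every query after the first load runs against the in-memory tree, so
the parse cost is paid once per document rather than once per query.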

If you have a new XML document with each XPath expression, XQuery
engines might be less of a help. That said, if the document is over a
megabyte or so, XQuery implementations that build an index on the fly
as the document is read will win out with some sorts of query and
maybe lose with others (because of the extra work in building the
index). It's the same with SQL: it's possible to write queries that
take hours to run on even a small database, and you can do the same
with XPath. So the timeout approach is probably part of a solution in
any case.
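
One way to picture the timeout approach is a cooperative deadline
check inside the evaluation loop. The sketch below is Python with a
plain tree walk standing in for an XPath evaluator; it is not
libxml's API, and QueryTimeout and find_with_deadline are names
invented for the example.

```python
import time
import xml.etree.ElementTree as ET


class QueryTimeout(Exception):
    """Raised when a query runs past its time budget."""


def find_with_deadline(root, tag, budget_seconds):
    """Collect elements named tag, giving up once the budget is spent."""
    deadline = time.monotonic() + budget_seconds
    matches = []
    for elem in root.iter():
        # Check the clock at every step so a runaway query is cut short.
        if time.monotonic() > deadline:
            raise QueryTimeout(f"exceeded {budget_seconds} s budget")
        if elem.tag == tag:
            matches.append(elem)
    return matches


doc = ET.fromstring("<r>" + "<a><b/></a>" * 1000 + "</r>")

# A generous budget lets the walk finish normally.
hits = find_with_deadline(doc, "b", budget_seconds=5.0)

# A budget that has already expired aborts on the first step.
try:
    find_with_deadline(doc, "b", budget_seconds=-1.0)
    timed_out = False
except QueryTimeout:
    timed_out = True
```

This only works where the evaluator's inner loop can be made to check
the clock. For an engine that offers no such hook, running the query
in a separate process that can be killed is the usual alternative.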

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml
