Re: [xml] How to speed up xpath eval with a lot of OR expression



Hi Michael,

Thanks for your comments.

I had arrived at much the same approach as you suggested, and it is
working fine.

Regards.
-chang

On Fri, Jan 11, 2013 at 8:35 AM, Michael Ludwig <milu71 gmx de> wrote:
Probably too late, but anyway …

Chang Im wrote on 28.11.2012 at 16:50 (-0800):

We are generating an XPath expression consisting of a lot of "or"
clauses to select child nodes based on their name.

  e.g.  /config/group[name = "n1" or name = "n3" or name = "n7" ....]

In one instance there are about 17500 "name = value" pairs in the
predicate.

This is at least 17480 too many, for my taste.

Evaluating the above XPath with xmlXPathEvalExpression() takes a long
time; gprof data:

I'm not surprised. The query would take a long time to read for a
human. This is like machine-generated SQL, with all its drawbacks
and performance problems.

I would appreciate any suggestion on how to improve this xpath query
performance.

I saw a reference to XQuery (indexed) and am wondering whether that is
really what I need.

I assume you're processing a huge document. In memory, it'll need about
ten times its size on disk. XQuery itself doesn't solve this problem, but
it usually runs inside databases that provide indexes for speedy lookup,
and those indexes are what would help you with processing.

You could build up a hash table of all the 17500 strings you need to
match in memory. Then iterate over the document using the XML Reader
interface. On each /config/group, check whether its ./name matches
one of your strings. Then do what you want to do with the node, and
on to the next one.
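
A minimal sketch of the idea above, under some assumptions: it uses
Python's standard-library xml.etree.ElementTree.iterparse as a stand-in
for the libxml2 XML Reader interface, a Python set as the hash table, and
a hypothetical helper name select_groups. The element names config, group
and name come from the example query; everything else is illustrative.

```python
import io
import xml.etree.ElementTree as ET


def select_groups(source, wanted_names):
    """Yield each /config/group element that has a <name> child whose
    text is one of wanted_names (hypothetical helper, for illustration)."""
    wanted = set(wanted_names)  # hash-based set: constant-time lookup on average
    path = []                   # tags of the currently open elements, root first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            continue
        # "end" event: elem and all of its children have been read
        if path == ["config", "group"]:
            if any(c.tag == "name" and c.text in wanted for c in elem):
                yield elem      # caller must finish with elem before advancing
            elem.clear()        # drop the children to keep memory use low
        path.pop()


# Small in-memory document standing in for the huge file.
doc = (b"<config>"
       b"<group><name>n1</name></group>"
       b"<group><name>n2</name></group>"
       b"<group><name>n3</name></group>"
       b"</config>")
hits = [g.findtext("name")
        for g in select_groups(io.BytesIO(doc), ["n1", "n3", "n7"])]
print(hits)
```

Each group costs one set lookup per name child, so the work grows with the
size of the document rather than with document size times the number of
names. Note that each group is cleared once the caller asks for the next
one, so whatever is needed from a match has to be extracted before then.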

I don't see a use case for XPath here, which has its strength in
flexibility and expressivity, which aren't features required by your
query.

Michael
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml gnome org
https://mail.gnome.org/mailman/listinfo/xml


