Re: [xml] XPath with SAX



On Thu, Mar 13, 2008 at 10:56:38PM +0100, Frantisek ZACEK wrote:
Hi,

I need to handle XPath in an application to select some specific nodes
from an XML tree.

The problem is that the files involved might be quite large (>1GB), so
DOM parsing is not an option.

  Then you made a mistake allowing XPath to be used to query the files.
Basically someone made a design decision without understanding what it
meant. Yes, that's painful. There is only one way to avoid this kind
of problem: learning before designing and implementing.

Now, libxml2 handles XPath with DOM, and from what I read in the
archives of this very mailing list, XPath support requires knowing the
whole tree.
Why is that? I did read the W3C Recommendation for XPath, and I still
don't get it. It should be possible to support XPath with a SAX
parser. I mean, why not? Or at least an XPath implementation that
only needs a schema definition...


  Asking the question seems to indicate that you either:
    - think only of the XPath queries *you* are interested in
    - don't yet understand the full expressiveness of XPath
In the first case, you are asking the wrong question; in the second, I
suggest you re-read the XPath spec. BTW, XPath 1.0 is unrelated to schemas,
but if you are interested in the topic I suggest you read the paper by
Layaïda and Co at PLDI07
   http://wam.inrialpes.fr/people/layaida/research/

Still, it is true that all the uses of XPath I have found seem to rely
on DOM, which is inconceivable for very large files.

  Again, I think you are missing much of the expressive power of XPath
when making this assertion, e.g.:
  //foo[last()]/preceding-sibling::bar
Evaluating an XPath expression is in general not possible in a single pass.
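
To make the difficulty concrete: a bar is selected by the expression above
only if a foo sibling follows it, and that is not known until later
siblings (possibly all of them, up to the parent's end tag) have been
read. The sketch below evaluates the expression by hand on an in-memory
tree. It uses Python's standard xml.etree.ElementTree rather than libxml2,
purely as an illustration; the helper name bars_before_last_foo and the
sample document are made up for this example.

```python
import xml.etree.ElementTree as ET


def bars_before_last_foo(root):
    """Hand evaluation of //foo[last()]/preceding-sibling::bar.

    For every element, look at its complete child list, locate the last
    <foo> child, and collect the <bar> children that come before it.
    Results are grouped by parent element.
    """
    selected = []
    for parent in root.iter():
        children = list(parent)  # needs ALL children of this parent
        foo_positions = [i for i, c in enumerate(children) if c.tag == "foo"]
        if not foo_positions:
            continue
        last_foo = foo_positions[-1]
        selected.extend(c for c in children[:last_foo] if c.tag == "bar")
    return selected


# Hypothetical sample input: bar 3 comes after the last foo, so it is
# not selected; bars 1 and 2 are.
doc = ET.fromstring(
    "<r><bar id='1'/><foo/><bar id='2'/><foo/><bar id='3'/></r>"
)
ids = [b.get("id") for b in bars_before_last_foo(doc)]
```

A streaming evaluator would have to hold every bar in memory until a foo
sibling shows up or the parent closes, so the buffer is bounded only by
the size of the document.
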
For single-pass lookups restricted to a small subset of XPath, libxml2
provides the pattern module http://xmlsoft.org/html/libxml-pattern.html
and the xmlReader can also evaluate XPath on a subset of
the tree http://xmlsoft.org/xmlreader.html#Mixing
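
The idea of mixing a streaming reader with queries on small subtrees can
be sketched as follows. This is not the libxml2 API: it is a minimal
analogue using Python's standard xml.etree.ElementTree.iterparse, which
only supports a limited path syntax. The function name stream_records,
the element names, and the assumption that records are direct children
of the root element are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, record_tag, subpath):
    """Yield the text of nodes matching `subpath` inside each record.

    Assumption: every `record_tag` element is a direct child of the
    root, so clearing the root after each record keeps memory bounded
    by the size of one record.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # The record subtree is complete here, so a path query
            # limited to this subtree can be evaluated safely.
            for hit in elem.findall(subpath):
                yield hit.text
            root.clear()  # drop the subtrees that were already processed


# Hypothetical sample input standing in for a much larger file.
data = (
    b"<log><entry><msg>a</msg></entry>"
    b"<entry><skip/><msg>b</msg><msg>c</msg></entry></log>"
)
messages = list(stream_records(io.BytesIO(data), "entry", "msg"))
```

This only works when each query can be answered from within one record;
an expression that refers to nodes outside the current subtree still
needs the whole tree.
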

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


