Re: [xml] GSoC using libxml for indexing XML document



On Fri, Jun 03, 2011 at 05:34:03PM +0200, Tomáš Pospíšil wrote:
> Hi Daniel and all hackers,
>
> I'm a GSoC student creating a new XML index in PostgreSQL that uses libxml for handling XML documents. My idea
> for the index is to use node offsets and a Patricia trie to map structural information onto our
> internal tree-index representation. So how can I get the offset of a node?
>
> P.S. I have already searched the list archives and understand that libxml is not designed for this kind of XML
> handling, but my use case is quite different from typical usage. Daniel suggested xmlByteConsumed and
> xmlTextReaderByteConsumed, but I see that they are not exactly what I need.

  Well, xmlByteConsumed is somewhat about relating nodes to offsets;
of course things are often more complex due to the fact that:
  - not all XML documents are made of one continuous stream (an entity in
    XML speak), e.g. when entities or XInclude are used.
  - libxml2 may convert the encoding on the fly, and the encoder needs
    to work on batches of data to provide adequate performance, so at
    the parser level it's usually very hard to get a precise offset
    from the source.
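To make the encoding point concrete: libxml2 works on UTF-8 internally, so when the source is, say, ISO-8859-1, a position in the converted buffer is not the same as a byte offset in the original file. The following is a minimal C sketch under that assumption only. The two helper functions are hypothetical and not part of libxml2, and they handle just ISO-8859-1 input, where every byte 0x80-0xFF becomes two UTF-8 bytes.

```c
#include <stddef.h>

/* Hypothetical helper (not part of libxml2): given ISO-8859-1 source
 * bytes, return the offset that source position `src_off` has after
 * the buffer is converted to UTF-8. */
size_t latin1_offset_to_utf8(const unsigned char *src, size_t src_off)
{
    size_t utf8_off = 0;
    for (size_t i = 0; i < src_off; i++) {
        /* Bytes 0x00-0x7F stay one byte in UTF-8; 0x80-0xFF become two. */
        utf8_off += (src[i] < 0x80) ? 1 : 2;
    }
    return utf8_off;
}

/* Hypothetical reverse mapping: given an offset into the converted
 * UTF-8 buffer, return the matching offset in the ISO-8859-1 source,
 * or (size_t)-1 if it points into the middle of a two-byte sequence
 * or past the end of the converted data. */
size_t utf8_offset_to_latin1(const unsigned char *src, size_t src_len,
                             size_t utf8_off)
{
    size_t pos = 0; /* running offset in the converted UTF-8 buffer */
    for (size_t i = 0; i < src_len; i++) {
        if (pos == utf8_off)
            return i;
        pos += (src[i] < 0x80) ? 1 : 2;
        if (pos > utf8_off)
            return (size_t)-1; /* landed inside a multi-byte sequence */
    }
    return (pos == utf8_off) ? src_len : (size_t)-1;
}
```

For example, in the ISO-8859-1 source `<a>` followed by `é` (byte 0xE9) and then `</a>`, the closing tag starts at source offset 4 but at UTF-8 offset 5. For UTF-8 input the mapping is the identity, which is the cheap case; with multi-byte or stateful encodings, or input split across entities and XIncluded files, a single linear mapping like this is no longer enough.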

  If it doesn't do what you want, well, you didn't define precisely
what you wanted either; "offset of nodes" can be interpreted in many
ways ...

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


