Re: [xml] mixed parsing using libxml




> On Wed, Nov 19, 2003 at 12:58:12PM +0000, Philip Couch wrote:
> > It seems to me that one solution would be to use a mixture of SAX and
> > DOM. Is it straightforward to use SAX to extract the part of the XML
> > document that I am interested in, and then build a DOM from this? And if
> > so, do you have any pointers to reference material?
> >
> > Thanks very much in advance for any suggestions that you may have!
>
>   Look at the xmlReader interface, especially the last part:
>     http://xmlsoft.org/xmlreader.html#Mixing
>
> Daniel


Another way of solving the problem using "extended DOM":

The "Buckau Reference node (BRef)"  -TAADAAAAA  :D

The BRef node is a node type that the tree builder puts in place of a text
node when some user-provided criterion is met (typically element content
length > n). Perhaps it would be even better to add a text node like
"BRef [n]" and add the actual BRef node as its sibling, instead of simply
replacing the expected text node.
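The substitution rule could look roughly like this. It is a minimal sketch
in plain C, not libxml2 code: the `Node` type, the `bref_criterion`
callback, `longer_than()` and `make_content_node()` are all hypothetical
names. The callback stands in for the user-provided criterion, here the
typical "content length > n" test. Only plain replacement is shown; the
placeholder variant would additionally emit a "BRef [n]" text node in front
of the BRef sibling.

```c
#include <stddef.h>

/* Hypothetical node type for illustration only; not libxml2's xmlNode. */
typedef enum { NODE_TEXT, NODE_BREF } NodeKind;

typedef struct {
    NodeKind kind;
    const char *text;  /* NODE_TEXT: content, pointing into the input buffer */
    long offset;       /* NODE_BREF: byte offset of the content in the file */
    size_t length;     /* both kinds: content length in bytes */
} Node;

/* User-provided criterion: return nonzero to turn the content of
 * `element_name` into a BRef instead of a text node. */
typedef int (*bref_criterion)(const char *element_name, size_t length,
                              void *user_data);

/* The typical criterion: content longer than n bytes, with n passed in
 * through user_data. */
static int longer_than(const char *element_name, size_t length,
                       void *user_data)
{
    (void) element_name;  /* this criterion ignores the element name */
    return length > *(const size_t *) user_data;
}

/* Called by the (hypothetical) tree builder for each run of character data:
 * `data` points at the content in memory, `offset` is where it starts in
 * the file. */
static Node make_content_node(const char *element_name, const char *data,
                              long offset, size_t length,
                              bref_criterion criterion, void *user_data)
{
    Node n;
    n.length = length;
    if (criterion(element_name, length, user_data)) {
        n.kind = NODE_BREF;
        n.text = NULL;      /* the content itself is not kept in the tree */
        n.offset = offset;
    } else {
        n.kind = NODE_TEXT;
        n.text = data;
        n.offset = -1;      /* not needed for an inline text node */
    }
    return n;
}
```

Passing the criterion as a callback keeps the "user provided" part out of
the builder: a different rule (say, BRef only for one element name) is just
another function with the same signature.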

In its simplest form, the BRef node would contain the offset into the XML
file and the data length. Most likely, when you are dealing with this kind
of document you would also want to use the zlib decompression provided by
libxml. An optimized implementation for that case would require a little
more work: store the zlib state with the BRef node, so that you do not have
to "fast-forward" through the file with zlib every time you want to read
the contents. This is of course mostly relevant if you store more than one
large element in your document and want to read element n > 1.
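The simplest form of BRef can be read back on demand with nothing more than
fseek() and fread(). A minimal sketch, assuming an uncompressed file; `BRef`
and `bref_read()` are hypothetical names. The bytes come back exactly as
serialized, so any entity or character references (or CDATA markers) inside
the element would still need decoding. For a gzip-compressed file, zlib's
gzseek() could replace fseek(), but in read mode it emulates the seek by
decompressing up to the target offset (rewinding to the start for backward
seeks), which is the "fast-forward" cost that storing the zlib state is
meant to avoid.

```c
#include <stdio.h>
#include <stddef.h>

/* Simplest BRef: where the content starts in the uncompressed XML file and
 * how many bytes it spans. Hypothetical type, not part of libxml2. */
typedef struct {
    long offset;
    size_t length;
} BRef;

/* Read the bytes referenced by `ref` from `fp` into `buf` (capacity
 * `bufsize`) and NUL-terminate them. Returns the number of bytes read, or
 * -1 if the buffer is too small or the seek/read fails. */
static long bref_read(FILE *fp, const BRef *ref, char *buf, size_t bufsize)
{
    if (ref->length >= bufsize)
        return -1;                      /* no room for data plus '\0' */
    if (fseek(fp, ref->offset, SEEK_SET) != 0)
        return -1;
    size_t got = fread(buf, 1, ref->length, fp);
    if (got != ref->length)
        return -1;                      /* short read: file changed or EOF */
    buf[got] = '\0';
    return (long) got;
}
```

The large content is touched only when bref_read() is called, so the DOM
itself stays small no matter how big the referenced element is.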

I will likely implement this myself in a week or so for use in a current
project (the zlib state optimization only if time permits, since I will
only have one "blob" per document).

What are your thoughts on all of the above? Is there something similar
already available that I can use instead of implementing this myself?


On a related topic: does anyone know of a schema compiler (preferably with
C/C++ output) that produces code to "skip" large elements during parsing
for "semi-manual" retrieval?


Patrik Buckau



