Re: [xml] How to extend the Reader interface with a dup() function

On Sun, Nov 02, 2003 at 10:54:01PM +0100, Nanning Buitenhuis wrote:

I want to extend the Reader interface with a dup() function.
That is, make a new reader with the same state as the current
reader, so a trial run over the data is possible.

Possible, impossible, or pointers?

  That would require special code in xmlreader.c. It seems really hard
anyway, for example if you feed the data from a socket. So in general
it sounds too hard to make reliable, and it's better to reuse
the Reader for the same stream again.


That would make the calling code overly complex.
In certain circumstances (XSL-FO parsing) I need to scan ahead and
then continue from the original point onwards. The documents are
-potentially- far too big to be stored in core, and rescanning to
my current position would be ugly.

I would not mind at all if the dup() would only work for files and
return NULL for other input streams. Later, perhaps, I could extend
it to other streams.

What you want is not really dup() but seek() and tell() (to move back
and forth in the stream).

Basically, what should happen is this:

tell() returns a special structure which allows you to seek back to
this position. For a file, it contains the internal structure
of the reader plus the position in the file.

For other streams, a memory buffer is created which gets filled
with everything read after the tell() call.

seek() will then resume reading.
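
A minimal sketch of the tell()/seek() idea for the file case, in Python
rather than against xmlreader.c. Everything here is hypothetical: the
names SeekableReader and Checkpoint, and the "lines_read" counter that
stands in for the real reader's internal state. It only shows the shape:
tell() snapshots the state plus the file offset, seek() restores both.

```python
import copy
import io
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Hypothetical result of tell(): file offset plus a snapshot of reader state."""
    offset: int
    state: dict


class SeekableReader:
    """Toy pull reader over a binary stream; each read() returns one line."""

    def __init__(self, stream):
        self.stream = stream
        # Stand-in for the parser's internal state (depth, namespaces, ...).
        self.state = {"lines_read": 0}

    def read(self):
        line = self.stream.readline()
        if not line:
            return None  # end of input
        self.state["lines_read"] += 1
        return line.rstrip(b"\n")

    def tell(self):
        if not self.stream.seekable():
            return None  # only seekable inputs (files) are supported here
        return Checkpoint(self.stream.tell(), copy.deepcopy(self.state))

    def seek(self, checkpoint):
        # Restore both the input position and the saved reader state.
        self.stream.seek(checkpoint.offset)
        self.state = copy.deepcopy(checkpoint.state)


reader = SeekableReader(io.BytesIO(b"<a>\n<b/>\n</a>\n"))
reader.read()                               # consume "<a>"
mark = reader.tell()                        # remember this point
lookahead = [reader.read(), reader.read()]  # trial run over the data
reader.seek(mark)                           # continue from the original point
```

Returning None from tell() for non-seekable input matches the "only
work for files" restriction discussed earlier in the thread.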

So basically, not something very hard to implement but
when you say "documents are -potentially- far too big",
then you will run out of memory when the document is
not on the local disk.

Maybe the "memory buffer" should create a file on disk when
it gets larger than 100KB?
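
One way to sketch such a buffer, assuming Python: the standard library's
tempfile.SpooledTemporaryFile keeps data in memory and moves it to a
temporary file once it grows past max_size. The ReplayBuffer class and
SPILL_THRESHOLD name below are hypothetical; the class records whatever
is pulled from a non-seekable source so that it can be replayed later.

```python
import io
import tempfile

SPILL_THRESHOLD = 100 * 1024  # the 100KB figure suggested above


class ReplayBuffer:
    """Records bytes pulled from a non-seekable source so they can be replayed."""

    def __init__(self, source, threshold=SPILL_THRESHOLD):
        self.source = source
        # Held in memory until it grows past `threshold`, then moved to disk.
        self.spool = tempfile.SpooledTemporaryFile(max_size=threshold)
        self.end = 0  # number of bytes recorded so far

    def read(self, size):
        # Serve already-recorded bytes first, then pull fresh ones from the source.
        data = self.spool.read(size)
        if len(data) < size:
            fresh = self.source.read(size - len(data))
            self.spool.write(fresh)  # the spool is at its end here, so this appends
            self.end += len(fresh)
            data += fresh
        return data

    def tell(self):
        return self.spool.tell()

    def seek(self, position):
        if not 0 <= position <= self.end:
            raise ValueError("position was never recorded")
        self.spool.seek(position)
```

Recording would have to start at the tell() call; bytes consumed before
the buffer was attached are not recoverable with this approach.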

Or is there already an implementation of XML cursors as OSS?

Or did you have something else in mind when you said
  "it's better to reuse the Reader for the same stream again"?

Well, is it possible to scan tree fragments? If not,
you could work around this with a ... uh ... "simple" trick:

Create a stream object from which the XML reader can read.
The object should first return a fake XML header
(probably a copy of the <?xml declaration of the document which
you're currently reading, plus all the DTDs).

Then it returns the element at which you started to
scan ahead, up to the point where you hit the matching closing
element, and after that it returns EOF.

Now, you have a complete document which is only a fraction
of the original tree. Of course, this only works
if you scan ahead until the matching closing element.
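
A rough sketch of such a stream object, assuming Python and a source
positioned at the "<" of the element where the scan starts. The class
name FragmentStream is hypothetical, and the tag counting is deliberately
naive: comments, CDATA sections, processing instructions and ">" inside
attribute values are not handled.

```python
import io
import xml.etree.ElementTree as ET


class FragmentStream:
    """File-like object: a fake prolog, then one complete element, then EOF."""

    def __init__(self, source, prolog):
        self.source = source              # binary stream at the start tag's "<"
        self.pending = bytearray(prolog)  # bytes ready to hand to the parser
        self.tag = None                   # tag currently being collected, if any
        self.depth = 0                    # number of open elements
        self.done = False

    def _pull(self):
        """Move one byte from the source into pending, tracking element depth."""
        byte = self.source.read(1)
        if not byte:
            self.done = True
            return
        self.pending += byte
        if byte == b"<":
            self.tag = bytearray(byte)
        elif self.tag is not None:
            self.tag += byte
            if byte == b">":
                if self.tag.startswith(b"</"):
                    self.depth -= 1       # closing tag
                elif not self.tag.endswith(b"/>"):
                    self.depth += 1       # opening tag (empty elements change nothing)
                self.tag = None
                if self.depth == 0:
                    self.done = True      # matching closing element reached

    def read(self, size=-1):
        while not self.done and (size < 0 or len(self.pending) < size):
            self._pull()
        if size < 0:
            size = len(self.pending)
        data = bytes(self.pending[:size])
        del self.pending[:size]
        return data                       # b"" once the fragment is exhausted


doc = b'<?xml version="1.0"?>\n<root><a><b/>text</a><c/></root>'
source = io.BytesIO(doc)
source.seek(doc.index(b"<a>"))            # pretend the reader stopped at <a>
fragment = ET.parse(FragmentStream(source, b'<?xml version="1.0"?>\n')).getroot()
```

Here ElementTree stands in for the XML reader and sees a complete small
document whose root is <a>. The source is consumed up to the closing
tag, so continuing from the original point afterwards still needs some
way to reposition it.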

Aaron "Optimizer" Digulla a.k.a. Philmann Dark
"It's not the universe that's limited, it's our imagination.
Follow me and I'll show you something beyond the limits."
