Re: [xml] How to extend the Reader interface with a dup() function



I need to scan ahead and then continue from the original point onwards.
The documents are -potentially- far too big to be stored in core.

What you want is not really dup() but seek() and tell() (to move back
and forth in the stream).

Basically, what should happen is this:

tell() returns a special structure which allows seeking back to
this position. For a file, it contains the internal structure
of the reader plus the position in the file.

For other streams, a memory buffer is created which gets filled.

seek() will then resume reading.

Exactly.
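
A rough sketch of the tell()/seek() idea for a stream that cannot be rewound, written in Python only to show the logic. It is not libxml2 code: the class name RewindableStream and its methods are made up for illustration, and it tracks only one mark and only the byte position, not the parser state that a real Reader would also have to save.

```python
import io


class RewindableStream:
    """Wrap a forward-only byte stream so that reading can return to a mark.

    Hypothetical helper: records every byte read after tell() in memory
    and replays those bytes after seek().
    """

    def __init__(self, raw):
        self._raw = raw          # underlying forward-only stream
        self._buf = bytearray()  # bytes recorded since the mark
        self._mark = None        # absolute offset of the mark, or None
        self._pos = 0            # absolute offset of the next byte to return

    def tell(self):
        # Drop bytes that were already consumed, then start recording here.
        if self._mark is not None:
            del self._buf[: self._pos - self._mark]
        self._mark = self._pos
        return self._pos

    def seek(self, pos):
        # Only positions covered by the recorded bytes can be reached.
        if self._mark is None or not (
            self._mark <= pos <= self._mark + len(self._buf)
        ):
            raise ValueError("position is not covered by the replay buffer")
        self._pos = pos

    def read(self, n):
        out = b""
        if self._mark is not None:
            # Serve recorded bytes first.
            start = self._pos - self._mark
            out = bytes(self._buf[start:start + n])
        if len(out) < n:
            # Then continue from the underlying stream, recording if marked.
            fresh = self._raw.read(n - len(out))
            if self._mark is not None:
                self._buf += fresh
            out += fresh
        self._pos += len(out)
        return out
```

The buffer grows with every byte read after the mark, which is exactly the memory problem discussed next.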

So basically, not something very hard to implement, but
when you say "documents are -potentially- far too big",
then you will run out of memory when the document is
not on the local disk.

Maybe the "memory buffer" should create a file on disk when
it gets larger than 100KB?

Or even refuse to do it for the first version.
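
If the buffer is kept, the spill-to-disk behaviour could look roughly like this. The sketch uses Python's tempfile.SpooledTemporaryFile, which holds data in memory until max_size is exceeded and then moves it to a temporary file. The function name record_lookahead and the chunk size are assumptions made for the example; the 100KB threshold is the figure from the question above.

```python
import tempfile

SPILL_THRESHOLD = 100 * 1024  # 100KB, as suggested above


def record_lookahead(stream, nbytes, chunk_size=8192):
    """Copy up to nbytes from stream into a buffer that may spill to disk.

    Hypothetical helper: returns a file-like object positioned at offset 0,
    ready to be replayed.
    """
    buf = tempfile.SpooledTemporaryFile(max_size=SPILL_THRESHOLD)
    remaining = nbytes
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # underlying stream ended early
        buf.write(chunk)
        remaining -= len(chunk)
    buf.seek(0)  # rewind so the caller can replay the recorded bytes
    return buf
```

The caller reads from the returned object the same way whether the data stayed in memory or went to disk.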

Or is there already an implementation of XML cursors as OSS?

Gnumeric uses something of that kind, I'll ask.

Or did you have something else in mind when you said
  "it's better to reuse the Reader for the same stream again."
?

Well, is it possible to scan tree fragments? If not,
you could work around this with a ... uh ... "simple" trick:

Create a stream object from which the XML reader can read.
The object should first return a fake XML header
(probably a copy of the <?xml of the document which
you're just reading plus all the DTDs).

Then it returns the element at which you started to
scan ahead, up to the point where you hit the closing element,
and then returns EOF.

Now, you have a complete document which is only a fraction
of the original tree. Of course, this only works
if you scan ahead until the matching closing element.
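
The trick above can be sketched like this, again in Python for illustration only. FragmentStream is a made-up name, and the sketch assumes the caller already knows where the fragment starts and how many bytes it spans. In practice, finding the matching closing element is the hard part.

```python
import io
import xml.etree.ElementTree as ET


class FragmentStream:
    """File-like object: a fake prolog, then `length` bytes of `source`, then EOF.

    Hypothetical helper; `source` must already be positioned at the start
    of the element to be scanned.
    """

    def __init__(self, prolog, source, length):
        self._prolog = io.BytesIO(prolog)  # fake XML header (and DTDs)
        self._source = source
        self._remaining = length           # fragment bytes still to hand out

    def read(self, n=-1):
        # First hand out the fake prolog.
        head = self._prolog.read(n)
        if head:
            return head
        # Then the fragment itself; report EOF once it is used up.
        if self._remaining <= 0:
            return b""
        if n is None or n < 0:
            want = self._remaining
        else:
            want = min(n, self._remaining)
        data = self._source.read(want)
        self._remaining -= len(data)
        return data


# Usage: parse only the first <item> of a larger document.
doc = b'<?xml version="1.0"?><root><item>1</item><item>2</item></root>'
start = doc.index(b"<item>")
src = io.BytesIO(doc)
src.seek(start)
fragment = FragmentStream(b'<?xml version="1.0"?>', src, len(b"<item>1</item>"))
root = ET.parse(fragment).getroot()
```

The parser sees a complete, well-formed document whose root is the fragment's element.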

Yes, but I like your first solution better: It just does what needs
to be done, no tricks or other complexity.

Regards,
  NaN.


