Re: [xml] xmlReader: Possibility for cloning an xmlTextReader (or multi-pass reading)



On 30.03.2013 16:35, Daniel Veillard wrote:
On Sat, Mar 30, 2013 at 08:02:38AM +0100, Martin B. wrote:
...

It turns out, however, that the subtree where the large data resides
has to be read out of order: I have to collect some (small
amount of) data before the rest. (And the problem is exactly that
it is this subtree that contains the large volume of data, so
loading only this subtree into memory doesn't make much sense
either.)

The easiest thing would be to just "clone" / "copy" my current
reader, read ahead and then return to the original instance to
continue reading there.

There doesn't appear to be any way, however, to "copy" the state of an
xmlTextReader.

   The problem is that XML parsing is really defined as a sequential
operation. You can't really go backward or start only from a given
'index'. For cloning from a given point and continuing, the problem
is the I/O model. The parser can read from a file descriptor or even
from a custom I/O layer built from a set of callback functions. The only
way to do this would be to keep all the input data processed from that
point until it gets consumed by the cloned parser. In most cases, though,
the size of the data fed to the parser is nearly an order of magnitude
less than the memory used by the equivalent tree (it depends a lot on
what your tree looks like!), so that may still be a gain.
   But by the very nature of parsing, the clone will still have to go
through all the data from the cloning point, and the core of the issue
is that you can't always clone an I/O path (a pipe or a socket, for
example, cannot be rewound).
   IMHO, if you're processing from a file, just reparse: parsing
can be extremely fast if you don't need to allocate a tree or data
as you go.


Thanks for the explanation, although I fear I can't fully follow all the details.

I guess I can accept that it just doesn't work with the implementation there is at the moment, and obviously, if you stream data in from a "read once" source, it gets messy.

But as far as I can grasp this, an xmlTextReader is just an "object" encapsulating access to an XML document, and in the case of a local file it is a set of attributes that make up the parser state plus a byte offset into the file. Copying such an "object" should be pretty trivial in principle, shouldn't it? (You write above that you "can't always clone an I/O path", but for reading from a named file it should be trivial, shouldn't it?)

If I can't re-read part of a file, I could also re-read the whole
file, ...

Is there maybe a simple way to remember for a xmlTextReader where it
is in the current document, ...

Hum, no, ...
You will iterate on Read() though. Assuming you don't do other
progress operations, just count the calls; then, when going through the
second time, run a loop with the same number of Read() calls and you
should be at the same place, as long as the input didn't change!


Just counting the reads sounds like an excellent idea! Thanks!

cheers,
Martin

