[xml] DOM vs xmlReader



I am investigating xmlReader because I need to parse a large XML file that cannot be held in memory, and I have a few questions about it.

Another constraint in my environment is that I need to use IO-based reading (as opposed to file- or memory-based reading).
Because of this, I am currently using the push parser mechanism.

Below is my current pseudocode for parsing with DOM. Note that I can only read 1024 bytes from the stream in a single read operation.

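IOObject mStream;
xmlParserCtxtPtr ctxt;
char pChar[1024];
int errCode = XML_ERR_OK;

// Read the first 4 bytes so the parser can detect the encoding
int res = mStream.ReadBytes(pChar, 4);
if (res > 0)
{
    ctxt = xmlCreatePushParserCtxt(NULL, NULL, pChar, res, NULL);
    if (ctxt != NULL)
    {
        do
        {
            // Never ask for more than the buffer holds (1024 bytes per read)
            res = mStream.ReadBytes(pChar, sizeof(pChar));
            if (res > 0)
                errCode = xmlParseChunk(ctxt, pChar, res, 0);
        } while (res > 0 && errCode == XML_ERR_OK);

        if (errCode == XML_ERR_OK)
        {
            // Signal end of input; no further data is passed
            errCode = xmlParseChunk(ctxt, NULL, 0, 1);
            // Do something, set output (the tree is in ctxt->myDoc,
            // which has to be released separately with xmlFreeDoc)
        }
        xmlFreeParserCtxt(ctxt);
    }
}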


Proposed xmlReader code:

IOObject mStream;
char pChar[1024];
int ret;
xmlTextReaderPtr reader = xmlReaderForIO(<params (callback functions for IO)>);
// xmlTextReaderRead returns 1 for a node, 0 at end of input, -1 on error
while ((ret = xmlTextReaderRead(reader)) == 1)
{
    // Do something with this node
}
xmlFreeTextReader(reader);

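
For illustration, the IO callbacks mentioned in the proposed code might look roughly like the sketch below. It assumes the callback shapes libxml2 documents for xmlInputReadCallback and xmlInputCloseCallback; DemoStream, demo_read, demo_close and MAX_READ are hypothetical names, and the memory-backed DemoStream only stands in for the real IOObject so that the sketch compiles without libxml2.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for IOObject: a stream backed by a memory buffer. */
typedef struct {
    const char *data;   /* bytes still to be delivered */
    size_t size;        /* total number of bytes in data */
    size_t pos;         /* number of bytes already delivered */
} DemoStream;

enum { MAX_READ = 1024 };   /* per-read limit described in the post */

/* Same shape as xmlInputReadCallback: returns the number of bytes copied
 * into buffer, 0 at end of stream, -1 on error. */
static int demo_read(void *context, char *buffer, int len)
{
    DemoStream *s = (DemoStream *)context;
    size_t n;

    if (s == NULL || buffer == NULL || len < 0)
        return -1;

    n = s->size - s->pos;               /* bytes left in the stream */
    if (n > (size_t)len)
        n = (size_t)len;                /* never overrun the caller's buffer */
    if (n > MAX_READ)
        n = MAX_READ;                   /* honour the 1024-byte read limit */

    memcpy(buffer, s->data + s->pos, n);
    s->pos += n;
    return (int)n;
}

/* Same shape as xmlInputCloseCallback: nothing to release in this sketch. */
static int demo_close(void *context)
{
    (void)context;
    return 0;
}
```

With libxml2 available, these would be passed as xmlReaderForIO(demo_read, demo_close, &stream, NULL, NULL, 0), where stream is a DemoStream; returning fewer bytes than requested is fine, since the reader simply calls the callback again.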
My question is: given this scenario, does xmlReader still save me memory compared to DOM (in terms of storage allocated for parsing the XML), given that bytes from the stream would still need to be cached to decipher node information?



