Re: [xml] (no subject)



On Tue, May 13, 2003 at 04:40:53PM -0400, Josiah Johnston wrote:
I am implementing an XML file standard for microscope data. ( 
http://openmicroscopy.org/ ) One issue I've been struggling with is 
extraction of pixel data. Pixel data is very large (500 MB is not 
unusual) and can be embedded in the XML document. I do not want this to 
be loaded into RAM.

I am looking for a method available using some component of libxml to 
extract this massive data without loading it into RAM. I've been 
playing with embedding the pixel data in a CDATA section and using a 
little C program to preprocess the file, piping the pixel data to a 
file, and replacing it with a file path. Using <![CDATA[...]]> as a 
file position marker is not an optimal solution.

Does anyone have suggestions for dealing with this? I'd rather use an 
existing XML tool and stick to standard methods of dealing with this 
than develop my own method of addressing this problem. Unfortunately, I 
haven't been able to find any threads that deal with gigantic XML 
nodes. All the memory issues I've seen address gigantic documents.

  Well, if your data is split into decently sized XML nodes, so that the
document as a whole may be large but any single node fits in memory, then
use the xmlReader streaming API, either from C or Python; see the doc at:
     http://xmlsoft.org/xmlreader.html
For the record, I parsed and validated a 4.5GB XML instance with libxml2-2.5.7.
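
To make the streaming idea concrete, here is a minimal sketch of the
pattern: visit one node at a time and drop it once handled, so memory use
follows the largest node rather than the whole document. Note that this is
not the libxml2 xmlReader API itself; Python's standard-library
xml.etree.ElementTree.iterparse is swapped in as a stand-in so the example
runs without extra packages. The function name count_elements and the
sample element names are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(source, tag):
    """Count <tag> elements in `source` without keeping the whole tree."""
    count = 0
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a handle on it
    # so that already-processed children can be pruned as we go.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            # Discard everything parsed so far under the root.
            root.clear()
    return count


if __name__ == "__main__":
    # Hypothetical sample document; a real one would be a large file on disk.
    sample = io.BytesIO(b"<images><image id='a'/><image id='b'/><note/></images>")
    print(count_elements(sample, "image"))
```

The same loop shape applies with xmlReader: advance to the next node, look
at it, and move on without building the full tree.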

  If, on the other hand, your basic data block is a single 500MB chunk of data
just encapsulated in a single CDATA section, then libxml2 won't be able to
process it without loading the full node. In that case I doubt XML is a decent
approach anyway, at least not without splitting the data into decently sized
blocks with appropriate markup.
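
As a rough sketch of what "decently sized blocks with appropriate markup"
could look like, the example below writes binary data as a series of
base64-encoded elements and then streams them back out to a file-like
object one block at a time. The element names Pixels and Chunk, the 48 KiB
block size, and both function names are assumptions made for illustration;
they are not part of any existing schema. The reader again uses Python's
standard-library iterparse rather than libxml2.

```python
import base64
import io
import xml.etree.ElementTree as ET

CHUNK_BYTES = 48 * 1024  # raw bytes per block (an arbitrary, assumed size)


def write_chunked(raw, xml_out, chunk_bytes=CHUNK_BYTES):
    """Copy binary stream `raw` into `xml_out` as base64 <Chunk> elements."""
    xml_out.write(b"<Pixels>")
    while True:
        block = raw.read(chunk_bytes)
        if not block:
            break
        xml_out.write(b"<Chunk>" + base64.b64encode(block) + b"</Chunk>")
    xml_out.write(b"</Pixels>")


def extract_pixels(xml_in, raw_out):
    """Stream <Chunk> elements from `xml_in`; write decoded bytes to `raw_out`.

    Returns the number of decoded bytes written.
    """
    total = 0
    context = ET.iterparse(xml_in, events=("start", "end"))
    _, root = next(context)  # start event of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "Chunk":
            data = base64.b64decode(elem.text or "")
            raw_out.write(data)
            total += len(data)
            root.clear()  # drop the chunk just written
    return total


if __name__ == "__main__":
    pixels = bytes(range(256)) * 10  # stand-in for real pixel data
    xml_buf = io.BytesIO()
    write_chunked(io.BytesIO(pixels), xml_buf, chunk_bytes=100)
    xml_buf.seek(0)
    out = io.BytesIO()
    extract_pixels(xml_buf, out)
    print(out.getvalue() == pixels)
```

With this layout only one block needs to be in memory at any moment, at the
cost of the usual base64 size overhead of about one third.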

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


