Re: [xml] Support for really large XML documents



On Wed, May 23, 2012 at 05:55:01PM +0200, Vit Zikmund wrote:
Greetings libxml gurus!
We are using XMLSec library built on top of libxml2 to process some large 
XML files, however it doesn't seem to work for files >2GB, which is 
unfortunately what we need.

I'd like to ask whether the library is supposed to support processing files 
that large (if it is, this might be a bug).

  libxml2 certainly parses files larger than 2GB. I have tested with
files larger than 4GB to make sure we had no 32-bit limitations on
input.

It seems there's a limitation in the struct _xmlOutputBuffer, which stores 
the number of written bytes in a signed int, so the maximum is 2GB.
Here it is: 
http://git.gnome.org/browse/libxml2/tree/include/libxml/xmlIO.h#n141
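
The limit described above can be illustrated with a minimal sketch. This is not libxml2 code: the helper names are hypothetical, and it assumes a platform where int is 32 bits, so that INT_MAX is 2147483647 (one byte short of 2GB).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper, not part of libxml2: returns 1 if adding `len`
 * bytes to a signed int byte counter would go past INT_MAX, else 0.
 * Both arguments are assumed to be non-negative. */
static int int_counter_would_overflow(int written, int len)
{
    assert(written >= 0 && len >= 0);
    /* Rearranged so the check itself never overflows. */
    return len > INT_MAX - written;
}

/* The same accounting with a 64-bit counter has room far beyond 2GB. */
static uint64_t count_bytes_64(uint64_t written, uint64_t len)
{
    return written + len;
}
```

With a 32-bit int, the first helper starts reporting overflow once the running total would pass 2147483647 bytes, while the 64-bit version keeps counting.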

  Then I would guess the _xmlOutputBuffer was created to output to
memory, which is the worst situation. Usually an xmlOutputBuffer has
a set of I/O routines associated with it, and those are called to flush
the output data progressively, so we should never accumulate 2GB of
output in memory!
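
The progressive flushing described above can be sketched in plain C. This is an illustration of the idea only, not libxml2's implementation: the stream_out type, its functions, CHUNK_SIZE and count_sink are all hypothetical names, and the callback shape (context, data, length) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 4096  /* illustrative size of the in-memory staging area */

/* Hypothetical write callback: returns a negative value on error. */
typedef int (*write_cb)(void *ctx, const char *data, int len);

typedef struct {
    char     buf[CHUNK_SIZE];
    size_t   used;     /* bytes currently staged in buf */
    uint64_t written;  /* total bytes handed to the callback so far */
    write_cb cb;
    void    *ctx;
} stream_out;

static void stream_out_init(stream_out *o, write_cb cb, void *ctx)
{
    assert(o != NULL && cb != NULL);
    o->used = 0;
    o->written = 0;
    o->cb = cb;
    o->ctx = ctx;
}

/* Hand the staged bytes to the callback and empty the staging area. */
static int stream_out_flush(stream_out *o)
{
    if (o->used > 0) {
        if (o->cb(o->ctx, o->buf, (int)o->used) < 0)
            return -1;
        o->written += o->used;
        o->used = 0;
    }
    return 0;
}

/* Stage data, flushing whenever the staging area fills up, so memory
 * use stays bounded by CHUNK_SIZE however much is written in total. */
static int stream_out_write(stream_out *o, const char *data, size_t len)
{
    while (len > 0) {
        size_t room = CHUNK_SIZE - o->used;
        size_t n = len < room ? len : room;
        memcpy(o->buf + o->used, data, n);
        o->used += n;
        data += n;
        len -= n;
        if (o->used == CHUNK_SIZE && stream_out_flush(o) < 0)
            return -1;
    }
    return 0;
}

/* Example sink: discards the data and only counts the bytes it receives. */
static int count_sink(void *ctx, const char *data, int len)
{
    (void)data;
    *(uint64_t *)ctx += (uint64_t)len;
    return len;
}
```

Because each chunk leaves through the callback as soon as the staging area fills, the amount held in memory never exceeds CHUNK_SIZE, and only the running total needs a counter wide enough for more than 2GB.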

We'd really like it if the library could support 64-bit sizes, and I see that 
the nearby struct _xmlParserInputBuffer does. It uses unsigned long, 
which is 64-bit on the x86_64 architecture we are building for.
It would really help us if someone here knew what else would need to 
be fixed for the whole thing to work, and whether it's going to be a patch 
or a full-scale project.

  First make sure that you are not dumping to a memory buffer. Then,
if the problem persists, we will try to fix things. So how was the
xmlOutputBuffer allocated?

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


