Re: [xml] Support for really large XML documents
- From: Daniel Veillard <veillard redhat com>
- To: Vit Zikmund <vit_zikmund cz ibm com>
- Cc: xml gnome org
- Subject: Re: [xml] Support for really large XML documents
- Date: Thu, 24 May 2012 11:36:47 +0800
On Wed, May 23, 2012 at 05:55:01PM +0200, Vit Zikmund wrote:
Greetings libxml gurus!
We are using the XMLSec library, built on top of libxml2, to process some large
XML files; however, it doesn't seem to work for files >2GB, which is
unfortunately what we need.
I'd like to ask whether the library should support processing files that large
(if it should, this might be a bug).
libxml2 certainly parses files larger than 2GB; I have tested with
files larger than 4GB to make sure we had no 32-bit limitations on
input.
It seems there's a limitation in the struct _xmlOutputBuffer, that stores
written bytes in a signed int - therefore the max limit is 2GB.
Here it is:
http://git.gnome.org/browse/libxml2/tree/include/libxml/xmlIO.h#n141
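
To make the arithmetic behind that limit concrete, here is a minimal C sketch. It does not use libxml2 at all; the helper names are hypothetical, and it assumes a platform where int is 32 bits wide (true for x86_64 Linux), so the largest count a signed int can hold is 2^31 - 1 bytes, just under 2GB.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: the largest byte count a signed int counter can hold.
 * With a 32-bit int this is 2147483647, i.e. just under 2 GiB. */
static long long max_signed_int_count(void)
{
    return (long long)INT_MAX;
}

/* Hypothetical helper: returns 1 if adding `chunk` bytes to a signed int
 * counter currently at `written` would exceed INT_MAX, 0 otherwise.
 * The check is done by subtraction so the test itself never overflows. */
static int int_counter_would_overflow(int written, int chunk)
{
    assert(written >= 0 && chunk >= 0);
    return written > INT_MAX - chunk;
}
```

A counter that is checked this way can at least report the condition instead of silently wrapping; lifting the limit itself needs a wider type.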
Then I would guess the _xmlOutputBuffer was created to output to
memory, which is the worst situation. Usually an xmlOutputBuffer
has a set of I/O routines associated with it, and those are called to
flush the output data progressively, so we should never accumulate 2GB
of output in memory!
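
As an illustration of that pattern, here is a small self-contained C model. It is a sketch only: model_out, model_write, model_flush and count_sink are hypothetical names, not libxml2's structures or functions, and the 4096-byte capacity is an arbitrary assumption. In libxml2 itself the callbacks would be attached with something like xmlOutputBufferCreateIO. The point is that a buffer with a write callback holds only a bounded amount in memory, while the running total can live in a 64-bit counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an output buffer with an attached write routine.
 * This is NOT libxml2's xmlOutputBuffer; it only mirrors the idea. */
typedef int (*write_cb)(void *ctx, const char *data, int len);

#define MODEL_CAP 4096              /* arbitrary in-memory capacity */

typedef struct {
    char     buf[MODEL_CAP];
    size_t   used;                  /* bytes currently held in memory */
    uint64_t written;               /* total bytes handed to the callback */
    write_cb write;
    void    *ctx;
} model_out;

static void model_init(model_out *o, write_cb w, void *ctx)
{
    assert(o != NULL && w != NULL);
    o->used = 0;
    o->written = 0;
    o->write = w;
    o->ctx = ctx;
}

/* Hand the buffered bytes to the callback; a short write is an error here. */
static int model_flush(model_out *o)
{
    if (o->used == 0)
        return 0;
    int n = o->write(o->ctx, o->buf, (int)o->used);
    if (n != (int)o->used)
        return -1;
    o->written += (uint64_t)n;
    o->used = 0;
    return 0;
}

/* Append data, flushing whenever the in-memory buffer is full. */
static int model_write(model_out *o, const char *data, size_t len)
{
    while (len > 0) {
        size_t room = MODEL_CAP - o->used;
        if (room == 0) {
            if (model_flush(o) != 0)
                return -1;
            room = MODEL_CAP;
        }
        size_t n = len < room ? len : room;
        memcpy(o->buf + o->used, data, n);
        o->used += n;
        data += n;
        len -= n;
    }
    return 0;
}

/* Example sink: counts the bytes it receives instead of storing them. */
static int count_sink(void *ctx, const char *data, int len)
{
    (void)data;
    *(uint64_t *)ctx += (uint64_t)len;
    return len;
}
```

With a sink like this, memory use stays at MODEL_CAP bytes no matter how much is written; a buffer with no write routine would have to grow with the output instead.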
We'd really like it if the library could support 64-bit sizes, and I see that
the nearby struct _xmlParserInputBuffer does. It uses unsigned long,
which is 64 bits on the x86_64 architecture we are building for.
It would really help us if someone here knew what else would need to
be fixed for the whole thing to work, and whether it's going to be a patch
or a full-scale project.
Make sure first that you are not dumping to a memory buffer; then,
if the problem persists, we will try to fix things. So how was the
xmlOutputBuffer allocated?
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel veillard com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/