Re: [xml] Support for really large XML documents

Hi Daniel, thanks for your reply!

Well, you are right about the buffer writing to memory: the author of the XMLSec library confirmed that he has to keep the whole document there because of c14n. It also seems to be a fundamental part of the process, so there is no easy fix on his side.
http://www.aleksey.com/pipermail/xmlsec/2012/009411.html

Since you say the data should be evacuated progressively and the buffer should never grow that big, I have to ask again whether we understand each other here. I spent all of yesterday debugging the process and I can see the write callbacks consistently writing chunks of about 4 KB, so no big buffering occurs per se. What seems odd is that all those small writes are tracked by the buffer struct I mentioned before, which has an 'int' counter that IMHO overflows once the output exceeds 2 GB, no matter whether the destination is a file or memory.
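
Just to illustrate the arithmetic, here is a standalone sketch (not libxml2 code) of what happens when ~4 KB chunks are summed into a signed int:

    #include <stdio.h>
    #include <limits.h>

    /* Standalone sketch, not libxml2 code: sum ~4 KB writes into a signed
     * int, the same type as the counter in question. Signed overflow is
     * formally undefined behaviour; on common platforms it wraps, so the
     * total shows up as a negative number once it passes 2 GiB. */
    int main(void) {
        int written = 0;             /* 32-bit signed counter */
        long long total = 0;         /* what a 64-bit counter would report */
        for (long long i = 0; i < (1LL << 19) + 1; i++) {  /* just past 2 GiB */
            written += 4096;
            total += 4096;
        }
        printf("64-bit total: %lld bytes, int counter: %d (INT_MAX = %d)\n",
               total, written, INT_MAX);
        return 0;
    }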

Please check here; this is where the overflow happens:
http://git.gnome.org/browse/libxml2/tree/xmlIO.c#n3445
Correct me if I'm wrong, but the value of out->written seems to be a running total of all the small writes that make up the final output. Nothing actually goes wrong when this counter overflows to a negative value, until you later call xmlOutputBufferClose(), which on success returns the counter; the value is now negative even though it isn't an error code. This is definitely at least interesting, don't you think? ;)

Thanks!
Vit

PS: When I add a simple hack that guards the counter so it never becomes negative, the XML verification starts working, though the counter's value becomes useless. This also suggests the rest of the logic is correct for large files, which is good news.
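
The guard amounts to a saturating add, roughly something like this (a sketch of the idea, not the exact patch):

    #include <limits.h>

    /* Sketch of the idea, not the exact patch: clamp the running total at
     * INT_MAX instead of letting it wrap. The count is meaningless past
     * 2 GiB, but it stays non-negative, so callers that only test for a
     * negative return keep working. */
    static void add_written(int *written, int nbchars) {
        if (nbchars > INT_MAX - *written)
            *written = INT_MAX;      /* saturate instead of overflowing */
        else
            *written += nbchars;
    }
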
BTW, the error manifests here in the XMLSec code:
http://git.gnome.org/browse/xmlsec/tree/src/c14n.c#n277
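
The check there boils down to something like the following (my simplification of how the return value gets used; the real code at the link is more involved):

    #include <libxml/xmlIO.h>

    /* Simplified illustration of how the negative return is read as a
     * failure; the actual xmlsec code at the link above is more involved. */
    static int finish_output(xmlOutputBufferPtr buf) {
        int ret = xmlOutputBufferClose(buf);  /* returns the 'written' total on success */
        if (ret < 0) {
            /* after the 2 GB overflow we end up here even though every
             * write callback actually succeeded */
            return -1;
        }
        return 0;
    }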

Daniel Veillard <veillard redhat com> wrote on 05/24/2012 05:36:47 AM:
>

> On Wed, May 23, 2012 at 05:55:01PM +0200, Vit Zikmund wrote:
> > Greetings libxml gurus!
> > We are using the XMLSec library, built on top of libxml2, to process some
> > large XML files; however, it doesn't seem to work for files >2GB, which is
> > unfortunately what we need.
> >
> > I'd like to ask whether the library is supposed to support processing files
> > that large (otherwise, this might be a bug).
>
>   libxml2 certainly parses files larger than 2GB; I have tested with
> files larger than 4GB to make sure we had no 32-bit limitations on
> input.

>
> > It seems there's a limitation in the struct _xmlOutputBuffer, that stores
> > written bytes in a signed int - therefore the max limit is 2GB.
> > Here it is:
> > http://git.gnome.org/browse/libxml2/tree/include/libxml/xmlIO.h#n141
>
>   Then I would guess the _xmlOutputBuffer was created to output into
> memory, which is the worst situation. Usually an xmlOutputBuffer
> has a set of I/O routines associated with it, and those are called to
> evacuate the output data progressively; we should never accumulate 2G
> of output in memory!
>
> > We'd really like the library to support 64-bit sizes, and I see that the
> > nearby struct _xmlParserInputBuffer already does: it uses unsigned long,
> > which is 64-bit on the x86_64 architecture we are building for.
> > It would really help us if someone here knew what else needs to be fixed
> > for the whole thing to work, and whether it would be a patch or a
> > full-scale project.
>
>   Make sure first that you are not dumping to a memory buffer; then,
> if the problem persists, we will try to fix things. So how was the
> xmlOutputBuffer allocated?
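
For completeness, the progressive evacuation described above corresponds to allocating the output buffer with I/O callbacks rather than an in-memory buffer; a minimal sketch (the callback names and the file destination are hypothetical, not what xmlsec actually uses) could look like this:

    #include <stdio.h>
    #include <libxml/xmlIO.h>

    /* Hypothetical write callback: hand every chunk straight to a FILE*,
     * so nothing accumulates in memory beyond libxml2's small buffers. */
    static int write_to_file(void *ctx, const char *data, int len) {
        return (int) fwrite(data, 1, (size_t) len, (FILE *) ctx);
    }

    static int close_file(void *ctx) {
        return (fclose((FILE *) ctx) == 0) ? 0 : -1;
    }

    /* Sketch only: create an output buffer backed by the callbacks above. */
    static xmlOutputBufferPtr open_output(const char *path) {
        FILE *fp = fopen(path, "wb");
        if (fp == NULL)
            return NULL;
        return xmlOutputBufferCreateIO(write_to_file, close_file, fp, NULL);
    }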

