Re: [xml] libxml2 memory consumption
- From: Daniel Veillard <veillard redhat com>
- To: "Henke, Markus" <Markus_Henke ordat com>
- Cc: "'Petr Tomasek'" <tomasek etf cuni cz>, "'xml gnome org'" <xml gnome org>
- Subject: Re: [xml] libxml2 memory consumption
- Date: Fri, 19 Apr 2002 09:33:41 -0400
On Fri, Apr 19, 2002 at 03:12:12PM +0200, Henke, Markus wrote:
-----Original Message-----
From: Petr Tomasek [mailto:tomasek etf cuni cz]
Sent: Friday, April 19, 2002 2:59 PM
To: Henke, Markus
Subject: Re: [xml] libxml2 memory consumption
On Thu, Apr 18, 2002 at 01:27:31PM +0200, Henke, Markus wrote:
Is this the "normal" relation of document size/
memory consumption or is something wrong with
Yes it is. You need several variables to be stored for each node.
Well, that's clear. But a ratio of 1:12?
Depends on the ratio of markup vs. data in your XML.
Doesn't that mean that parsing a document using the DOM-like API
is impracticable for a document size > ~ 20MB (on an "average"
machine)?
I'd hoped that there is a way to reduce memory consumption...
Use the SAX API. DOM uses a lot of memory; it's a known fact.
Or discard the parts of the tree you don't need as you build them.
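
To illustrate the two suggestions above, here is a minimal sketch. It uses Python's standard library (xml.sax and xml.etree.ElementTree) as a stand-in and is not libxml2's C API; the names ElementCounter, count_with_sax, and count_with_pruning are hypothetical. The first function streams the document through SAX callbacks and never builds a tree; the second builds the tree incrementally and empties each element once it has been processed.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET


class ElementCounter(xml.sax.ContentHandler):
    """Counts start tags per element name; keeps no tree in memory."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def count_with_sax(xml_bytes):
    """Stream the document through a SAX handler and return the tag counts."""
    handler = ElementCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


def count_with_pruning(xml_bytes, tag):
    """Build the tree incrementally, emptying each matching element once seen."""
    seen = 0
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == tag:
            seen += 1
            # Drops the element's children, text, and attributes. The emptied
            # element itself stays attached to its parent.
            elem.clear()
    return seen


# The sample mirrors the 10000 x <something/> case discussed in this thread.
sample = b"<root>" + b"<something/>" * 10000 + b"</root>"
sax_counts = count_with_sax(sample)
pruned = count_with_pruning(sample, "something")
```

The SAX variant holds only the counters, so its memory use does not grow with document size. The pruning variant still keeps one emptied node per element, which is much smaller than a fully populated subtree but not free.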
BTW, Daniel, if I understand it correctly, each string (e.g. an element name)
is stored every time it occurs. (I mean, let's say you have an XML document with
10000 occurrences of the <something/> element, so you have the string "something"
in memory 10000 times.) Maybe we could use hash tables while parsing the
document and keep literally identical strings in one location?
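
The hash-table idea proposed above can be sketched as follows. This is only an illustration in Python, not libxml2 code; StringPool and its intern method are hypothetical names. Each distinct string is stored once, and every later occurrence gets a reference to that single stored copy.

```python
class StringPool:
    """Hypothetical name table: returns one shared object per distinct string."""

    def __init__(self):
        self._table = {}

    def intern(self, text):
        # setdefault stores text on first sight and returns the stored copy
        # on every later call with an equal string.
        return self._table.setdefault(text, text)

    def __len__(self):
        return len(self._table)


pool = StringPool()
# Build each name at run time so the copies start out as separate objects.
names = ["".join(["some", "thing"]) for _ in range(10000)]
shared = [pool.intern(n) for n in names]
distinct_after = len({id(s) for s in shared})
```

After interning, all 10000 entries in shared refer to the same object, and the pool holds a single string. The cost is a hash lookup per name during parsing, plus the rule that shared strings must not be modified or freed individually.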
that's 10000 x (9 bytes + your libc allocator data)
i.e. 90 KBytes + ???
And it comes with a serious price. It would also break the ABI/API
and make the code harder to understand and more expensive to run.
I made some initial testing and it wasn't looking like it was worth
the effort at that time.
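
The estimate above can be worked through as a small calculation. The per-allocation overhead is the unknown ("???") in the message, so the 16 bytes used here is purely an assumed, hypothetical figure; duplicate_name_cost is likewise a hypothetical helper.

```python
def duplicate_name_cost(count, name, allocator_overhead):
    """Bytes spent storing `name` once per element, plus per-allocation overhead."""
    per_copy = len(name.encode("utf-8")) + allocator_overhead
    return count * per_copy


# Hypothetical value; the real figure depends on the libc allocator in use.
ASSUMED_OVERHEAD = 16

# 10000 copies of "something" (9 bytes each), as in the message.
payload_only = duplicate_name_cost(10000, "something", 0)
with_overhead = duplicate_name_cost(10000, "something", ASSUMED_OVERHEAD)
```

With zero overhead this reproduces the 90 KBytes figure; any real allocator adds some multiple of 10000 on top of that.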
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/