Re: [xml] libxml2 memory consumption



On Fri, Apr 19, 2002 at 03:12:12PM +0200, Henke, Markus wrote:


-----Original Message-----
From: Petr Tomasek [mailto:tomasek etf cuni cz]
Sent: Friday, April 19, 2002 2:59 PM
To: Henke, Markus
Subject: Re: [xml] libxml2 memory consumption


On Thu, Apr 18, 2002 at 01:27:31PM +0200, Henke, Markus wrote:
Is this the "normal" relation of document size/
memory consumption or is something wrong with

Yes it is. You need several variables to be stored for each node.

Well, that's clear. But a ratio of 1:12?

  Depends on the ratio of markup vs. data in your XML.

Doesn't that mean that parsing a document with the DOM-like API
is impractical for a document size > ~ 20MB (on an "average"
machine)?
I'd hoped that there is a way to reduce memory consumption...

  Use the SAX API. DOM uses a lot of memory; it's a known fact.
Or discard the parts of the tree you don't need as you build them.
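
The two options above can be sketched as follows. This is only an illustration in Python's standard library (xml.sax and xml.etree.ElementTree), not libxml2's own C API; the names ElementCounter, count_with_sax, count_with_pruning and SAMPLE are made up for the example.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET


class ElementCounter(xml.sax.ContentHandler):
    """SAX handler: counts elements by name without ever building a tree."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per start tag; nothing is retained except the counter.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_with_sax(data: bytes) -> dict:
    """Option 1: event-driven parsing, memory use independent of document size."""
    handler = ElementCounter()
    xml.sax.parseString(data, handler)
    return handler.counts


def count_with_pruning(data: bytes, tag: str) -> int:
    """Option 2: build the tree incrementally, dropping subtrees once processed."""
    seen = 0
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == tag:
            seen += 1
            # Drop this element's children, text, and attributes.
            elem.clear()
    return seen


# Hypothetical sample document: 1000 small elements under one root.
SAMPLE = b"<root>" + b"<something>text</something>" * 1000 + b"</root>"
```

Note that in the second function the emptied elements themselves remain attached to the root, so very large documents would also need to detach them from their parent; the sketch leaves that out.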

BTW, Daniel, if I understand correctly, each string (e.g. an element
name) is stored every time it occurs. (I mean, let's say you have an XML
document with 10000 <something/> elements, so you have the string
"something" in memory 10000 times.) Maybe we could use hash tables while
parsing the document and keep literally identical strings in one location?

  that's 10000 x (9 bytes + your libc allocator data)
  i.e. 90 KBytes + ???

And it comes at a serious price: it would break the ABI/API, make the
code harder to understand, and make it more expensive to run.
I did some initial testing, and it didn't look like it was worth
the effort at the time.
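
For readers who want to see the idea under discussion, here is a rough sketch of hash-table string sharing together with the 10000 x 9 bytes estimate. It is written in Python and does not reflect how libxml2 stores names; NameTable and the variable names are hypothetical, and per-allocation allocator overhead (the "???" above) is not modelled.

```python
class NameTable:
    """Hash table that keeps one shared copy of each distinct string."""

    def __init__(self):
        self._names = {}

    def intern(self, name: str) -> str:
        # Return the stored copy if the string was seen before,
        # otherwise store this one and return it.
        return self._names.setdefault(name, name)

    def __len__(self):
        return len(self._names)


# 10000 element names, each built at run time as a separate string,
# roughly as a parser would allocate one copy per element.
names = ["".join(["some", "thing"]) for _ in range(10000)]

# Without sharing: every occurrence pays for its own 9 bytes of payload.
bytes_unshared = sum(len(n.encode("utf-8")) for n in names)

# With sharing: every occurrence points at the single stored copy.
table = NameTable()
shared = [table.intern(n) for n in names]
bytes_shared = sum(len(n.encode("utf-8")) for n in set(shared))

print(bytes_unshared, bytes_shared, len(table))
```

The saving is real but modest for a case like this, and every lookup costs a hash computation, which is the trade-off described above.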

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


