RE: [xml] libxml2 memory consumption




-----Original Message-----
From: Daniel Veillard [mailto:veillard redhat com]
Sent: Friday, April 19, 2002 3:34 PM
To: Henke, Markus
Cc: 'Petr Tomasek'; 'xml gnome org'
Subject: Re: [xml] libxml2 memory consumption


On Fri, Apr 19, 2002 at 03:12:12PM +0200, Henke, Markus wrote:


-----Original Message-----
From: Petr Tomasek [mailto:tomasek etf cuni cz]
Sent: Friday, April 19, 2002 2:59 PM
To: Henke, Markus
Subject: Re: [xml] libxml2 memory consumption


On Thu, Apr 18, 2002 at 01:27:31PM +0200, Henke, Markus wrote:
Is this the "normal" relation of document size/
memory consumption or is something wrong with

Yes it is. You need several variables to be stored for each node.

Well, that's clear. But a ratio of 1:12?

  Depends on the ratio of markup vs. data in your XML.

It was the libxml2 documentation (multiplied)  8)
 
Doesn't it mean that parsing a document using the DOM-like API
is impracticable for a document size > ~ 20MB (on an "average"
machine)?
I'd hoped that there is a way to reduce memory consumption...

  Use the SAX API. DOM uses a lot of memory, it's a known fact.
Or discard the parts of the tree you don't need as you build them.
 
Yep, that's an option in any case. For the moment the DOM API
will be OK, I don't expect (realistically) documents with a
size > 100KB.
The test was just to see where the comfort ends... :)


BTW, Daniel, if I understand it correctly, each string (e.g. an
element name) is stored every time it occurs. (I mean, say you have
an XML document with 10000 <something/> elements, so you have the
string "something" in memory 10000 times.) Maybe we could use hash
tables while parsing the document and keep literally identical
strings in one location?
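
The hash-table idea above can be sketched roughly as follows. This is only an illustration of string interning in plain C, not libxml2 code: the names InternTable, intern() and intern_free(), the bucket count and the djb2 hash are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_BUCKETS 1024  /* arbitrary size chosen for this sketch */

/* One stored string, chained with others that hash to the same bucket. */
typedef struct InternEntry {
    struct InternEntry *next;
    char *str;
} InternEntry;

typedef struct {
    InternEntry *buckets[INTERN_BUCKETS];
    size_t unique;  /* number of distinct strings currently stored */
} InternTable;

/* djb2 string hash (hypothetical choice; any reasonable hash would do). */
unsigned long intern_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the single shared copy of s, storing it the first time it is seen.
 * Returns NULL if an allocation fails. */
const char *intern(InternTable *t, const char *s)
{
    unsigned long idx;
    InternEntry *e;
    size_t len;

    assert(t != NULL && s != NULL);
    idx = intern_hash(s) % INTERN_BUCKETS;

    /* Already stored? Hand back the existing copy. */
    for (e = t->buckets[idx]; e != NULL; e = e->next)
        if (strcmp(e->str, s) == 0)
            return e->str;

    /* First occurrence: allocate one copy and link it into the bucket. */
    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    len = strlen(s) + 1;
    e->str = malloc(len);
    if (e->str == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->str, s, len);
    e->next = t->buckets[idx];
    t->buckets[idx] = e;
    t->unique++;
    return e->str;
}

/* Release every stored string and reset the table. */
void intern_free(InternTable *t)
{
    size_t i;
    for (i = 0; i < INTERN_BUCKETS; i++) {
        InternEntry *e = t->buckets[i];
        while (e != NULL) {
            InternEntry *next = e->next;
            free(e->str);
            free(e);
            e = next;
        }
        t->buckets[i] = NULL;
    }
    t->unique = 0;
}
```

With a zero-initialised table (InternTable t = {0};), calling intern(&t, "something") 10000 times returns the same pointer each time and stores the string once. The catch is that a node can no longer free or modify its name on its own, since the copy is shared.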

  that's 10000 x (9 bytes + your libc allocator data)
  i.e. 90 KBytes + ???
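
The arithmetic in that estimate can be written out as a small sketch. The function names are made up for illustration, and the per-allocation overhead is a parameter because the real figure depends on the libc allocator (the "???" above). The count follows the estimate in using the 9 characters of "something"; a terminating NUL would add one more byte per copy.

```c
#include <stddef.h>
#include <string.h>

/* Bytes used when every occurrence of name gets its own allocation.
 * overhead is an assumed per-allocation bookkeeping cost in bytes. */
size_t duplicated_name_bytes(size_t count, const char *name, size_t overhead)
{
    return count * (strlen(name) + overhead);
}

/* Bytes used when all occurrences share one stored copy of name. */
size_t shared_name_bytes(const char *name, size_t overhead)
{
    return strlen(name) + overhead;
}
```

For example, duplicated_name_bytes(10000, "something", 0) gives the 90000 bytes quoted above; passing a hypothetical overhead of 16 bytes per allocation gives 250000 bytes, which is still small next to the size of the node structures themselves.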

And it comes at a serious price. It would also break the ABI/API
and make the code harder to understand and more expensive to run.
I made some initial testing and it wasn't looking like it was worth
the effort at that time.

Daniel


Thanx, Markus


