On Sun, Mar 11, 2001 at 12:09:43PM +0000, Matt Sergeant wrote:
I tried to parse a 54M XML file today (the GCIDE Webster's dictionary) to
see what would happen. When it started to go into swap I had to kill it...

MSXML uses flyweights [1] to reduce the size of their DOM. They have some
documentation on MSDN about how an MSXML DOM is generally about 0.9 to 1.1
times the size of the original XML. I believe they also get a performance
boost from this.

Any thoughts on using the flyweight design pattern in libxml?

[1] A well-known design pattern, see

  Yep, memory requirements are one of the limitations of libxml.
Libxml was initially designed as a document editing toolkit,
i.e. able to save back everything, and was not optimized for
large data sets.
Changing those parts would result in binary-incompatible changes,
probably in libxml3, when I manage to schedule such a beast.
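For readers unfamiliar with the pattern, here is a minimal C sketch of the flyweight idea applied to a DOM. It assumes the shared state is element and attribute names: each distinct name is stored once in a pool, and nodes would point at that shared copy instead of holding a private strdup(). The names name_pool, pool_intern and pool_free are hypothetical, not part of libxml's API, and this is not a description of how MSXML actually lays out its DOM.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flyweight pool for element/attribute names (not a libxml API).
 * Each distinct name is stored once; DOM nodes would keep a pointer to the
 * shared copy instead of allocating their own. */

#define POOL_BUCKETS 1024

typedef struct pool_entry {
    struct pool_entry *next;  /* next entry in the same hash bucket */
    char name[];              /* flexible array member holds the string */
} pool_entry;

typedef struct {
    pool_entry *buckets[POOL_BUCKETS];
    size_t count;             /* number of distinct names stored */
} name_pool;

/* djb2 string hash */
static unsigned long hash_name(const char *s) {
    unsigned long h = 5381;
    unsigned char c;
    while ((c = (unsigned char)*s++) != 0)
        h = h * 33 + c;
    return h;
}

/* Return the shared copy of s, adding it to the pool on first use.
 * Returns NULL if allocation fails. */
const char *pool_intern(name_pool *pool, const char *s) {
    assert(pool != NULL && s != NULL);
    size_t b = hash_name(s) % POOL_BUCKETS;
    for (pool_entry *e = pool->buckets[b]; e != NULL; e = e->next)
        if (strcmp(e->name, s) == 0)
            return e->name;   /* already present: reuse the shared copy */
    size_t len = strlen(s);
    pool_entry *e = malloc(sizeof *e + len + 1);
    if (e == NULL)
        return NULL;
    memcpy(e->name, s, len + 1);
    e->next = pool->buckets[b];
    pool->buckets[b] = e;
    pool->count++;
    return e->name;
}

/* Free every entry; pointers returned by pool_intern become invalid. */
void pool_free(name_pool *pool) {
    for (size_t i = 0; i < POOL_BUCKETS; i++) {
        pool_entry *e = pool->buckets[i];
        while (e != NULL) {
            pool_entry *next = e->next;
            free(e);
            e = next;
        }
        pool->buckets[i] = NULL;
    }
    pool->count = 0;
}
```

In a dictionary-style document where the same few tag names repeat hundreds of thousands of times, each node then pays for one pointer per name rather than a separate heap string, and name comparison can become a pointer comparison. Retrofitting this into existing node structures is exactly the kind of layout change that breaks binary compatibility.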


