Performance reading/writing .gnumeric files
- From: Jody Goldberg <jody gnome org>
- To: gnumeric-list <gnumeric-list gnome org>
- Cc: Daniel Veillard <veillard redhat com>
- Subject: Performance reading/writing .gnumeric files
- Date: Sat, 19 Jan 2002 11:34:57 -0500
On Sat, Jan 19, 2002 at 07:21:30AM +0000, Nick Lamb wrote:
BUT this is import only and Gnumeric still bloats when you SAVE anything,
We're still using DOM for export which is even worse than DOM for
import because you need to use approx 4x the memory to store the
DOM tree, then double that to dump the tree into a buffer which is
then stored. That is why we will be moving to a printf based
solution for xml export.
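
As a rough sketch of what a printf style exporter looks like (Python here for brevity, not Gnumeric's actual C code; `write_cells` and the `Col`/`Row` attribute names are made up for illustration), each cell is formatted and written straight to the output stream, so no tree is ever held in memory:

```python
import io
from xml.sax.saxutils import escape, quoteattr

def write_cells(out, cells):
    """Stream cells to `out` one at a time; `cells` yields (col, row, text)."""
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n<Cells>\n')
    for col, row, text in cells:
        # Format and emit each element immediately instead of building a node.
        out.write('<Cell Col=%s Row=%s>%s</Cell>\n'
                  % (quoteattr(str(col)), quoteattr(str(row)), escape(text)))
    out.write('</Cells>\n')

# Hypothetical usage: write two cells into an in-memory buffer.
buf = io.StringIO()
write_cells(buf, [(0, 0, '=foo()'), (1, 0, 'a < b')])
xml_text = buf.getvalue()
```

Memory use stays roughly constant per cell, at the cost of having to handle escaping by hand.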
plus why does the non-SAX loader bloat Gnumeric permanently? I'm sure
Jody said that Gnumeric was leak free, didn't he? Also, the SAX loader
is smaller but it's no faster, still takes 4 minutes to load 5000 lines.
Thanks to Morten's diligence we are leak free on all code paths that
we've tested. In this case I'd bet that gnumeric has freed the
memory, but glibc has not shrunk the process.
No matter, the XLS file created from these .gnumeric files is corrupt,
crashing both Gnumeric and MS Excel. So we have to try something else...
Gnumeric (and/or libole2) is trashing these files during export. Please
have a look in Bugzilla and see if there is already a bug for that.
Hmm, this is serious. My worry is that we are hitting a known bug
in libole2. It does not handle files larger than about 7 meg
correctly. I'll have a look.
The resulting XLS files load quickly and efficiently into Gnumeric,
don't bloat it up and are generally much nicer. Maybe we should adopt
this ".XLS" format as the native format of Gnumeric ? ;)
The difference is almost certainly binary vs xml loading. Loading
an int from a file then bit bashing it is going to be faster than
reading & parsing WrapText="true".
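
To make the comparison concrete, here is a small sketch (Python, with an invented flag layout; `WRAP_TEXT` and the 16-bit record are assumptions for illustration, not the real XLS or Gnumeric format):

```python
import struct

WRAP_TEXT = 0x08  # hypothetical bit for the wrap-text flag

def wrap_from_binary(record):
    # Binary: unpack a little-endian 16-bit int and mask out one bit.
    (flags,) = struct.unpack_from('<H', record, 0)
    return bool(flags & WRAP_TEXT)

def wrap_from_xml_attr(attr_text):
    # XML: split off the attribute name, strip the quotes, compare strings.
    name, _, value = attr_text.partition('=')
    return name.strip() == 'WrapText' and value.strip().strip('"') == 'true'

binary_result = wrap_from_binary(struct.pack('<H', 0x0008))
xml_result = wrap_from_xml_attr('WrapText="true"')
```

Both paths yield the same boolean, but the text path scans and compares characters where the binary path does one load and one mask.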
gprof and eazel's prof are not helping pinpoint where the time is
going. We can easily manage 5000x20 cells in gnumeric. The only
hint I've got so far was from a slight change in the xml format I
tested. We currently store things as
<Cell attr="" attr="" ...><Content>=foo()</Content></Cell>
That Content node is irrelevant. When I loaded up one of JP's test
files and resaved it as
<Cell attr="" attr="" ...>=foo()</Cell>
The file shrank from 2.5 Meg of uncompressed xml to 1.2 meg of xml.
Loading was then somewhat faster, 25 seconds vs 29 seconds for a full
cold start and exit. With luck the SAX importer will buy us at
least that level of improvement again. From there we'll need to
figure out where the bottleneck is to go further. A comparably
sized xls would load in 4-5 seconds on the same machine.
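
For reference, a minimal sketch of SAX style loading of the flattened cell layout (Python's `xml.sax` rather than the libxml C API Gnumeric uses; `CellHandler`, `load_cells`, the `Cells` wrapper, and the `Col`/`Row` attribute names are hypothetical). Cells are collected as the parser streams through the file, with no DOM tree built:

```python
import xml.sax

class CellHandler(xml.sax.ContentHandler):
    """Collect (attributes, text) for each <Cell> as it streams past."""
    def __init__(self):
        super().__init__()
        self.cells = []
        self._attrs = None   # attributes of the <Cell> currently open, if any
        self._text = []

    def startElement(self, name, attrs):
        if name == 'Cell':
            self._attrs = dict(attrs.items())
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._attrs is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == 'Cell':
            self.cells.append((self._attrs, ''.join(self._text)))
            self._attrs = None

def load_cells(xml_bytes):
    handler = CellHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.cells

sample = (b'<Cells><Cell Col="0" Row="0">=foo()</Cell>'
          b'<Cell Col="1" Row="0">42</Cell></Cells>')
cells = load_cells(sample)
```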