Observations



On occasion I load large tab-delimited ASCII files (> 5 MB) into Excel
or Gnumeric, and I have noticed the following.

1) Gnumeric is SLOOOW compared to Excel on the same machine (dual boot).
2) If a column is selected (click on the header) and filled (cut &
pasted formula), the fill goes all the way to row 65535.  The
spreadsheet then slows dramatically, as it seems to always want to
calculate all 65000-odd rows, even though only the first few thousand
rows may have real data.  Excel appears (?) to use "virtual" rows after
the real data to avoid this problem - hope this makes sense!  This can
be worked around by selecting all rows after the real data and deleting
them - a kludge, I think.
3) Related to 1 & 2, Gnumeric uses a lot of memory, to the point where I
had a system crash when all 256M of memory and 512M of swap filled while
trying to save a > 5 MB tab-delimited ASCII file as a ".gnumeric" type -
again, Excel just worked as expected (slow, but no crash, that is!).
4) Gnumeric's import feature stumbles over embedded control characters
such as NUL (\x00) and \x01 - Excel just loads them and goes on.  I use
a simple Perl script to clean the data now, but a more flexible import
routine with a filter function may benefit some people (or the ability
to select the script and filter through it as a plug-in? - now there's
a neat idea!).  I have no control over how the data is created and just
get handed a file from an unknown source, often without even knowing
what program created it, so every task is a potential "adventure".
5) Being able to select "Treat two delimiters as one" (something like
that) in Gnumeric is not as flexible or useful as being able to "treat
multiple delimiters as one" in Excel.
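
To make point 5 concrete, here is a minimal Python sketch of what
"treat multiple delimiters as one" means for a tab-delimited line.  It
is not Gnumeric or Excel code; the function names are made up for
illustration, and the tab delimiter is just the default assumed here.

```python
import re


def collapse_delimiters(line, delimiter="\t"):
    """Replace any run of consecutive delimiters with a single one.

    Hypothetical helper for illustration; not part of any spreadsheet API.
    """
    pattern = re.escape(delimiter) + "+"
    return re.sub(pattern, delimiter, line)


def split_fields(line, delimiter="\t"):
    """Split one line into fields, treating delimiter runs as one."""
    stripped = line.rstrip("\r\n")  # drop the line ending before splitting
    return collapse_delimiters(stripped, delimiter).split(delimiter)


print(split_fields("a\t\t\tb\t\tc\n"))  # ['a', 'b', 'c']
```

Note that collapsing delimiters this way also removes genuinely empty
fields, so it only suits files where a run of delimiters is padding.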

These are observations, and perhaps the most important point for me is
that at least now (as of the last few versions) I can use Gnumeric to
view such files instead of having to boot Windows!  Seriously, I think
that the ability to import through a custom filter (such as selectable
filters, including the ability to select homebrew Perl scripts working
from stdin to stdout) would be a real benefit for specialised import
problems.  Once a filter is written, a regular user could then just
follow the instructions.
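
As a rough idea of the kind of stdin/stdout filter meant here, the
following is a minimal Python sketch (a stand-in, not the Perl script
mentioned above).  It assumes that every control byte except tab, line
feed and carriage return should simply be dropped; the names
clean_bytes, filter_stream and main are hypothetical.

```python
import sys

# Control bytes 0x00-0x1F plus 0x7F, minus the ones a tab-delimited
# text file legitimately needs (tab, line feed, carriage return).
# Which bytes to drop is an assumption made for this sketch.
_KEEP = {0x09, 0x0A, 0x0D}
CONTROL_BYTES = bytes(
    b for b in list(range(0x20)) + [0x7F] if b not in _KEEP
)


def clean_bytes(data):
    """Return data with the unwanted control bytes deleted."""
    return data.translate(None, CONTROL_BYTES)


def filter_stream(src, dst, chunk_size=65536):
    """Copy src to dst in chunks, cleaning each chunk on the way."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(clean_bytes(chunk))


def main():
    # Work on the raw byte streams so odd bytes cannot trigger
    # text decoding errors.
    filter_stream(sys.stdin.buffer, sys.stdout.buffer)
```

With a call to main() added at the bottom and the file saved under some
name (say clean_import.py, a made-up name), it could be run as
"python3 clean_import.py < raw.txt > clean.txt" before loading
clean.txt into the spreadsheet.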

Keep it up!
BillK







