Fwd: transpose on bigger datasets?




---------- Forwarded message ----------
From: Jim Tarvid <tarvid ls net>
Date: Wed, Oct 29, 2014 at 11:15 AM
Subject: Re: transpose on bigger datasets?
To: Morten Welinder <mortenw gnome org>


About 4 or 5 years ago I processed three years of monthly price and quantity data for 60 or so products and services from ~300 clinics. I found Python (and, in one case, PHP) useful for organizing CSV (and tab-delimited) files for use in Gnumeric, PSPP and R. Trying to do all of this with one tool might have been interesting.

Python with NumPy, xlrd and the csv module worked for producing subsets of the data.
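Since the thread is about transposing, here is a minimal Python sketch in the spirit of that approach, transposing a CSV file with only the standard csv module instead of inside the spreadsheet. It assumes the whole file fits in memory; ragged rows are padded with empty strings. The function names are hypothetical.

```python
import csv
from itertools import zip_longest
from typing import List, Sequence, TextIO


def transpose_rows(rows: Sequence[Sequence[str]]) -> List[List[str]]:
    """Return the transpose of a list of rows; short rows are padded with ""."""
    # zip_longest(*rows) yields one tuple per original column.
    return [list(col) for col in zip_longest(*rows, fillvalue="")]


def transpose_csv(src: TextIO, dst: TextIO) -> None:
    """Read CSV from src and write its transpose to dst (whole file held in memory)."""
    rows = list(csv.reader(src))
    csv.writer(dst).writerows(transpose_rows(rows))
```

To use it on files, open both with `open(path, newline="")` (the mode the csv module documentation recommends) and pass the file objects to `transpose_csv`. For data that does not fit in memory, a streaming or chunked approach would be needed instead.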

On Wed, Oct 29, 2014 at 10:35 AM, Morten Welinder <mortenw gnome org> wrote:
> Why a max at all?

There are issues with the language used for expressions.  When the column
number gets high enough, there will be ambiguities such as whether "TRUE"
is a column name or a constant; "LOG2" might be a function or it might be a
cell name; etc.  I am not sure all of these are well understood.
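To make the ambiguity concrete: A1-style column names are bijective base-26 numbers written with the letters A to Z (A=1, Z=26, AA=27). The hypothetical helpers below (not Gnumeric code) convert between the two forms. Under that scheme "TRUE" is simply column 364239, and "LOG2" reads as column LOG (column 8509) in row 2, so any sheet with at least 8509 columns has a cell by that name.

```python
def letters_to_col(name: str) -> int:
    """Convert A1-style column letters to a 1-based column number (A=1, Z=26, AA=27)."""
    if not name:
        raise ValueError("empty column name")
    n = 0
    for ch in name.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column name: {name!r}")
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def col_to_letters(n: int) -> str:
    """Convert a 1-based column number back to letters (inverse of letters_to_col)."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while n > 0:
        n, r = divmod(n - 1, 26)
        letters = chr(ord("A") + r) + letters
    return letters


print(letters_to_col("TRUE"))  # 364239
print(letters_to_col("LOG"))   # 8509
```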

There is no similar problem with row numbers, but we do need to keep area
sizes (rows * columns) from overflowing the "int" type, typically a signed
32-bit type with a maximum of 2^31 - 1.  Increasing the number of rows is not
a problem in itself, but right now it would have to come with a lower limit on
the number of columns.
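A rough sketch of the row/column tradeoff, assuming the only constraint is that the cell count rows * columns must fit in a signed 32-bit int. The helpers `area_fits_int32` and `max_cols_for_rows` are hypothetical, not Gnumeric functions, and Gnumeric's actual limits may involve further checks.

```python
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit C int


def area_fits_int32(rows: int, cols: int) -> bool:
    """True if a rows x cols area has a cell count representable in a signed 32-bit int."""
    return rows * cols <= INT32_MAX


def max_cols_for_rows(rows: int) -> int:
    """Largest column count whose area with `rows` rows still fits in INT32_MAX."""
    return INT32_MAX // rows


rows = 16_000_000  # roughly the 1.6x10^7 rows mentioned in the thread
print(area_fits_int32(rows, 256))  # False
print(max_cols_for_rows(rows))     # 134
```

Under this assumption, every extra factor of rows must be paid for by dividing the column limit by the same factor, which is the tradeoff described above.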

I don't think it is impossible to lift these restrictions.  It is unknown,
however, whether the existing data structures will exhibit unpleasant
performance (complexity) behaviour for large sheets.  And messing with the
low-level data structures used for spreadsheets is very difficult.
Correctness is not the hardest part; the performance tradeoffs are.

Memory usage is also going to be an issue.  You may assume something
like 200 bytes per non-blank cell.  10^8 * 10^8 * 200 is a mighty big number!
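The arithmetic behind that estimate, as a small Python sketch. The 200-bytes-per-cell figure is Morten's rough assumption, not a measurement, and `fill_ratio` (the fraction of cells that are non-blank) is a hypothetical parameter added for illustration.

```python
BYTES_PER_CELL = 200  # rough per-cell estimate from the thread, not a measured value


def sheet_memory_bytes(rows: int, cols: int, fill_ratio: float = 1.0) -> float:
    """Estimated memory for a rows x cols sheet where fill_ratio of the cells are non-blank."""
    return rows * cols * fill_ratio * BYTES_PER_CELL


full = sheet_memory_bytes(10**8, 10**8)
print(f"{full:.1e} bytes")        # 2.0e+18 bytes
print(f"{full / 10**18:.0f} EB")  # 2 EB (decimal exabytes)
```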

Morten


On Wed, Oct 29, 2014 at 9:22 AM, Berntsson, Martin
<martin berntsson eon se> wrote:
> Dear Morten and all,
>
> Thanks for the file and the advice.
>
> Thoughts about Gnumeric:
> Why are the maximum numbers of rows and columns different?
> Why a max at all? Why not just the limit set by machine memory?
> I need a lot of both, and from a technical perspective there are
> numerous applications today that produce many variables (columns), like 10^8.
>
> Perhaps that could be a suggestion for the next version? I think there is a general need for it,
> and for me the 1.6x10^7 rows are about to run out.
> I don’t think there is any other software out there that can be used for this.
>
> Thanks anyhow for great software!
> (probably the best for bigger datasets!)
>
>
> Med vänliga hälsningar / With kind regards / Freundliche Grüße,
>
> Martin Berntsson  DS.
>
> _______________________________________________
> gnumeric-list mailing list
> gnumeric-list gnome org
> https://mail.gnome.org/mailman/listinfo/gnumeric-list



--

Kindness Works!
Jim Tarvid
12897A Grays Pointe Road, Fairfax, Va 22033-2143
703-657-0099 Condo
703-825-8463 Cabin
703-624-5289 Cell
http://ls.net




