Some Questions and Thanks!



First off, I would like to personally thank John Nash and Jean Bréfort for helping me get Gnumeric compiled. The help was much appreciated, and after a couple of days of halfhearted attempts I was finally able to get everything compiled this morning. Finally seeing Gnumeric load up from my SVN build was a great feeling. I've never dealt with a project that had so many dependencies, and it has most definitely been an enlightening experience.

I have a couple of questions.

I am working on a project while I finish up here at university, and we need to open large data sets. I'm sure this has been asked before, since from what I can tell many Gnumeric users are researchers and statisticians who also use R, but my data sets are on the order of 8-16 GB. As a small test, I tried loading a file with 25 columns and 5,000,000 lines of random integer data. It opened in a couple of seconds but was truncated at the 65,536-row limit. All the columns are there.

I am merely looking to create a very quick viewer (with the ability to page through files fast, filter them, and split windows), and I was hoping for some pointers. I noticed that Gnumeric loads the whole file into memory even though it can only display 65,536 rows. After poking around a little in the code, it looks as though that limit is set by a preprocessor definition, but as you can imagine, I can't load the whole file into memory when data sets approach 16 GB.
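For a viewer that only shows one screenful at a time, rows can be streamed and filtered one by one instead of loading the whole file. A minimal sketch in Python for brevity, using the standard `csv` module; the function name `stream_rows` and its predicate interface are hypothetical, not Gnumeric code:

```python
import csv
from itertools import islice
from typing import Callable, Iterator, List


def stream_rows(path: str,
                predicate: Callable[[List[str]], bool],
                limit: int) -> Iterator[List[str]]:
    """Hypothetical helper: yield at most `limit` rows that satisfy `predicate`.

    The file is read lazily, one row at a time, so memory use depends on
    `limit`, not on the size of the file.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        # Generator expression: rows are parsed and tested only as they are consumed.
        matches = (row for row in reader if predicate(row))
        yield from islice(matches, limit)
```

For example, `list(stream_rows("data.csv", lambda r: int(r[0]) > 100, 50))` would return the first 50 matching rows and stop reading the file there.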

The original plan was to write a basic GTK+ application that could quickly page through gigabytes of data. I know little about GTK+, so I figured I would ask some people who are experienced with it. Would my approach be best accomplished using the built-in Grid widget? I already know that I will need to write my own CSV library to handle files of this size (with the ability to page, seek, read, display, etc.). I was originally looking into hacking GTKsheet, but it seemed that I might be better off simply starting the project from scratch (or at the very least building my own CSV library).
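One common way to get fast paging and seeking in a file too large for memory is a sparse line-offset index: a single pass records the byte offset of every N-th line, and any page is then reached with one `seek` plus at most N-1 line skips. A minimal sketch in Python, assuming UTF-8 text and no quoted fields containing embedded newlines (one record per physical line); the class `CsvPager` and the `stride` parameter are hypothetical:

```python
import csv
import io
from typing import List


class CsvPager:
    """Hypothetical pager backed by a sparse index of line byte offsets.

    Assumes one CSV record per physical line (no newlines inside quoted
    fields) and UTF-8 encoding.
    """

    def __init__(self, path: str, stride: int = 1000) -> None:
        self.path = path
        self.stride = stride
        self.checkpoints: List[int] = []  # byte offsets of lines 0, stride, 2*stride, ...
        self.num_lines = 0
        with open(path, "rb") as f:
            offset = 0
            for line in f:  # binary mode: len(line) is the exact byte length
                if self.num_lines % stride == 0:
                    self.checkpoints.append(offset)
                offset += len(line)
                self.num_lines += 1

    def page(self, first: int, count: int) -> List[List[str]]:
        """Return up to `count` parsed rows starting at line `first`."""
        if first < 0 or first >= self.num_lines or count <= 0:
            return []
        checkpoint = first // self.stride
        with open(self.path, "rb") as f:
            f.seek(self.checkpoints[checkpoint])
            # Skip forward from the nearest checkpoint to the requested line.
            for _ in range(first - checkpoint * self.stride):
                f.readline()
            n = min(count, self.num_lines - first)
            raw = [f.readline() for _ in range(n)]
        # Parse only the lines on this page.
        return list(csv.reader(io.StringIO(b"".join(raw).decode("utf-8"))))
```

The `stride` trades index size against skip cost: with a stride of 1,000, a file of a billion lines needs about a million stored offsets, and reaching any page reads at most 999 extra lines.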

Do you have any suggestions?

Much appreciated,
JB

