guppi/goose vs. R, Fiasco, etc.



The design goals of goose and guppi are fairly different from those of
R and similar programs (like Fiasco, the GNU SPSS clone described at
http://www.gnu.org/software/fiasco).

Guppi is intended to be primarily a plotting and visualization
program, a sort of Gnuplot-on-steroids.  It isn't meant to be a
full-featured statistical analysis package, like the others mentioned
above.  However, I do want it to include a useful (and extensible)
set of statistical functions that can be quickly and easily accessed
by beginners.  Things like descriptive statistics and regressions
should be just a few mouse-clicks away.  Programs like R and Fiasco
are great, but they are "heavyweight" statistical programs with
difficult learning curves.  Guppi will concentrate on "lightweight"
statistics, and will (hopefully) be a nice tool that lets people with
less lofty statistical requirements do lots of non-trivial things
quickly and easily.

We're also planning to use some sort of a plug-in architecture for
statistics with guppi, so that you can add functionality that we left
out.

Goose exists for two reasons:

(1) For the sake of modularity, it made sense to put all of guppi's
    statistical processing functionality into a separate library.

(2) In the course of my "real work", I write lots of C++ code that
    needs to have statistical stuff built into it.  My performance
    requirements are pretty high, so I can't get away with calling
    external apps.  Because I've never been systematic about this
    stuff in the past, I always seem to be re-writing the same code.
    Right now, I'm moving statistical code that I've written in the
    past into goose and will be making use of it in my "real work" in
    the future.

The main design goal of goose is to provide a very clean and simple
C++ interface to a set of good statistical routines.  IMHO, lots of
the statistical code that is floating around out there suffers from
"Fortran-itis".  There is nothing wrong with that, but this is C++,
not Fortran.  I don't mind taking a small performance hit if it makes
the library a lot easier to use... and if you set things up carefully,
you don't even take that much of a performance hit.
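
To make the "Fortran-itis" contrast concrete, here is a minimal sketch
of the two styles side by side.  The names descstat and Sample are
made up for illustration and are not taken from goose; the running
update in Sample is Welford's method, which is just one way to keep the
object-style interface cheap.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Hypothetical illustration only; none of these names come from goose.

// "Fortran-itis" style: raw array plus explicit length, results returned
// through output parameters, and an integer status code to check.
int descstat(const double* x, int n, double* mean, double* sdev)
{
    if (n < 2) return -1;  // not enough data for a sample std. deviation
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += x[i];
    *mean = sum / n;
    double ss = 0.0;
    for (int i = 0; i < n; ++i) ss += (x[i] - *mean) * (x[i] - *mean);
    *sdev = std::sqrt(ss / (n - 1));
    return 0;
}

// C++ style: the object owns its state and answers questions directly.
// Each add() does a constant amount of work (Welford's running update),
// so mean() and sdev() never have to re-scan the data.
class Sample {
public:
    void add(double x)
    {
        ++n_;
        const double delta = x - mean_;
        mean_ += delta / n_;
        m2_ += delta * (x - mean_);  // running sum of squared deviations
    }

    std::size_t size() const { return n_; }

    double mean() const
    {
        if (n_ == 0) throw std::domain_error("mean of an empty sample");
        return mean_;
    }

    // Sample standard deviation (divides by n - 1).
    double sdev() const
    {
        if (n_ < 2) throw std::domain_error("sdev needs at least two values");
        return std::sqrt(m2_ / (n_ - 1));
    }

private:
    std::size_t n_ = 0;
    double mean_ = 0.0;
    double m2_ = 0.0;
};
```

With the second style, calling code reads as s.add(x) followed by
s.mean() or s.sdev(), with no lengths, output pointers, or status codes
to juggle.  The small accessors can be inlined by the compiler, which
is the sort of careful setup that keeps the performance hit small.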

I certainly don't want to "reinvent the wheel for statistical
computing", but I think that my wheel needs are sufficiently different
that it makes sense to have a new library.  And anyway, in the spirit
of Free Software, I'm hoping to keep the wheel reinvention to a
minimum by stealing lots of code for goose from the R and Fiasco
projects.  The main barrier right now is a license issue: goose is
LGPLed, and I'd kind of like to keep it that way.  Both R and Fiasco
are GPLed.  So I'm going to have to contact the appropriate people and
see if we can't work something out... but I haven't gotten around to
that yet.

-Jon Trowbridge
 trow@emccta.com


