Re: [guppi-list] Re: use of GSL in guppi

From: Jon Trowbridge <trow emccta com>
To: guppi-list gnome org
Subject: Re: [guppi-list] Re: use of GSL in guppi
Date: Tue, 15 Sep 1998 19:27:23 -0500

On Tue, Sep 15, 1998 at 11:45:59PM +0200, Asger Alstrup Nielsen wrote:
> Yes, it does look like an improvement.  Unfortunately, my gcc 2.7.2.something
> won't compile it properly so I can't test it, and hack on it.  I'm upgrading my
> system to the latest Debian tomorrow, and will install egcs 1.1.  Hopefully,
> that will help. 

Egcs 1.1 will most certainly do it for you.  

> This will probably be my development platform in the future, but we'll of
> course keep things portable to Unix (the biggest problem should be CR/LF
> stuff, but I'll work this out.)

Luckily that should only be an issue in the data import stuff, if at all.

> While we are at it, you can do this for Goose now:  The code that you have done
> can be LGPL as it used to be, and the other stuff can be GPL until things resolve.
> This way, people would be able to take out the GPL stuff and get a clean LGPL
> library without waiting for a license settlement/rewrite.  The danger is that
> the process drags out (maybe because the R people are slow), more people
> contribute code to Goose under the GPL, and suddenly we are in a situation
> where the licensing is very hard to change.  I'd like to prevent that.

Well, if there is a sudden flood of GPLed contributions to Goose, I'll
have to take this issue up with the contributors.  But up until now,
that hasn't been a problem. :-)

I sent e-mail to the R people yesterday.  If I don't hear anything
from them by this weekend, I'm going to hit the library, track down
the original sources for these algorithms, and reimplement them.  Not
exactly my favorite way to spend a Saturday, but I don't want this
whole affair dragging out.  One way or another, I'd like to get the
LGPLed 0.0.3 released sometime next week.


> Enough of this boring license stuff for now.  Let's talk about Goose.
> 
> Do you have any ideas for what you would especially like to be
> implemented/improved in Goose now by somebody like me?  Before I and my
> statistical co-worker start on the non-parametric statistics stuff, we still
> have to make some design decisions for our own program, and in the meantime (it
> could be a few weeks, since this depends on other people as well), it would be
> a good idea for us to hack on Goose to get to know the code better, and maybe
> get the chance to change it a bit to meet our needs a little better.  I have
> had a look at it, and I think I would like to change some data structures in a
> few ways that shouldn't impact performance. How fixed do you
> consider these?

We're on version 0.0.2.  Nothing is fixed. :-)  Seriously, I guess it
depends on which data structures we are talking about...

> A really important thing is to add exception handling to everything to improve
> error handling (which, as you say, is practically nonexistent at this point in
> time.)  I think this has high priority in order to shape all new code:  It
> should do error handling once the dust of introduction has settled.

Since good exception support hasn't really been available under g++ on
Intel, I've never really used exceptions and am therefore a total
exceptions idiot.  So if exception-based error handling is added to
Goose, I most certainly *shouldn't* be the one to do it.

But it sounds like a good idea.  My big approach to error handling up
until now has been:
  (1) Silently fail, and reset the data structure in question.  (For
      example, when applying log_transform() to a DataSet that
      contains non-positive values.)
  (2) Return a NAN.  (For example, when asking for the mean() of an
      empty DataSet.)  But who wants to check every return value w/
      isnan()?  Not me.
  (3) Crash.  (No examples leap to mind, but I'm sure that they are in there.)
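
For comparison, here is a minimal sketch of what exception-based handling of cases (1) and (2) might look like.  The DataSet below is a hypothetical stand-in, not the Goose class: the member names mean(), log_transform(), add(), size() and at() and the choice of std::domain_error are assumptions made for illustration only.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for Goose's DataSet; only what the example needs.
class DataSet {
public:
    void add(double x) { data_.push_back(x); }
    std::size_t size() const { return data_.size(); }
    double at(std::size_t i) const { return data_.at(i); }

    // Throws instead of returning a NAN when there is nothing to average.
    double mean() const {
        if (data_.empty())
            throw std::domain_error("mean() of an empty DataSet");
        double sum = 0.0;
        for (std::size_t i = 0; i < data_.size(); ++i)
            sum += data_[i];
        return sum / data_.size();
    }

    // Checks every value before changing any, so a failure leaves the
    // data exactly as it was instead of silently resetting it.
    void log_transform() {
        for (std::size_t i = 0; i < data_.size(); ++i)
            if (data_[i] <= 0.0)
                throw std::domain_error("log_transform() needs positive values");
        for (std::size_t i = 0; i < data_.size(); ++i)
            data_[i] = std::log(data_[i]);
    }

private:
    std::vector<double> data_;
};
```

With something like this, the caller wraps a whole computation in one try/catch instead of testing each return value with isnan().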
 

> One other thing that is important is the addition of a "remove item" in the
> DataSet class.  On the surface, it should be pretty similar to add, just the
> other way around.  If this is added, the DataSet class is a fully dynamic class
> that allows interactive editing.

Should be easy enough to add.  I originally designed the DataSet to be
a bit opaque and "read-only", rather than just being a glorified
array.  But commands to remove an item or a range of items should be
painless to add and run in basically constant time.  (Except you do
have to sweep the whole array again if one of the items you remove ==
min_ or max_...)
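
A rough sketch of the removal idea, under stated assumptions: the class below is a hypothetical stand-in for DataSet, and it gets constant-time removal by moving the last item into the hole, which does not preserve order.  (An order-preserving version would use vector::erase() and be linear.)  Only min_ and max_ come from the description above; remove(), rescan() and the rest are made up for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for DataSet that caches its extremes in min_/max_.
class DataSet {
public:
    DataSet() : min_(0.0), max_(0.0) {}

    void add(double x) {
        if (data_.empty()) {
            min_ = max_ = x;
        } else {
            min_ = std::min(min_, x);
            max_ = std::max(max_, x);
        }
        data_.push_back(x);
    }

    // Removes the item at index i by overwriting it with the last item.
    // Constant time, except when the removed value was an extreme.
    void remove(std::size_t i) {
        if (i >= data_.size())
            throw std::out_of_range("remove(): index past the end");
        const double gone = data_[i];
        data_[i] = data_.back();
        data_.pop_back();
        if (!data_.empty() && (gone == min_ || gone == max_))
            rescan();  // the one case that needs a full sweep
    }

    std::size_t size() const { return data_.size(); }

    double min() const { require_data(); return min_; }
    double max() const { require_data(); return max_; }

private:
    void require_data() const {
        if (data_.empty())
            throw std::logic_error("min()/max() of an empty DataSet");
    }

    // Recomputes both cached extremes from scratch.
    void rescan() {
        min_ = *std::min_element(data_.begin(), data_.end());
        max_ = *std::max_element(data_.begin(), data_.end());
    }

    std::vector<double> data_;
    double min_;
    double max_;
};
```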

> Another thing that would be useful for us would be to be able to sort
> corresponding to a DataSet.  The idea is to let one DataSet be the sorting key,
> and have many other DataSets depend on this. Each of these DataSets would
> implicitly be ordered the same, and when the key DataSet is sorted, the other
> datasets are shuffled around similarly in order to keep them synchronized. 
> I.e. to simulate a vector-DataSet.

Yeah, I've thought about this.  The way I had been planning to do this
was to introduce a Permutation object and add (at least) two functions
to DataSet:
  (1) apply_permutation() would take a permutation and re-order the
      DataSet accordingly.
  (2) permutation_sort() would sort the DataSet and then return the
      Permutation object corresponding to the transformation mapping
      the original (unsorted) data to the sorted state.  Then if you
      had N DataSets simulating a vector-DataSet, you could sort it
      keyed on the first one like this:
        vector<DataSet> vector_ds;
        ... initialize vector_ds ...
        Permutation perm = vector_ds[0].permutation_sort();
        for(size_t i=1; i<vector_ds.size(); ++i)
          vector_ds[i].apply_permutation(perm);
      This is nice for (at least) three reasons:
        (a) It gives us complete flexibility in how to represent
            "vector-DataSets", rather than forcing people to use a
            specific array-like thing.
        (b) This gives us an easy way to arbitrarily reorder
            "vector-DataSets", instead of just being limited to sorts.
        (c) We should avoid a lot of the gratuitous buffer-copies
            required by the usual approach to sorting vectors of
            n-tuples.

Implementing this should be pretty painless, but I just haven't gotten
around to it yet.
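
To make the idea concrete, here is one possible sketch.  Everything in it is an assumption rather than actual Goose code: Permutation is represented as a plain vector of indices (perm[i] is the old position of the item that ends up in position i), DataSet is a bare wrapper around a vector of doubles, and sort_by_first() is a hypothetical helper.  It uses a C++11 lambda for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical representation: perm[i] is the index in the old ordering
// of the item that lands in position i of the new ordering.
typedef std::vector<std::size_t> Permutation;

// Hypothetical stand-in for DataSet.
class DataSet {
public:
    DataSet() {}
    explicit DataSet(const std::vector<double>& v) : data_(v) {}

    const std::vector<double>& values() const { return data_; }

    // Re-orders the data according to perm, using one scratch buffer.
    void apply_permutation(const Permutation& perm) {
        if (perm.size() != data_.size())
            throw std::invalid_argument("permutation has the wrong size");
        std::vector<double> out(data_.size());
        for (std::size_t i = 0; i < perm.size(); ++i)
            out[i] = data_.at(perm[i]);  // at() rejects bad indices
        data_.swap(out);
    }

    // Sorts the data and returns the permutation that did it.
    Permutation permutation_sort() {
        Permutation perm(data_.size());
        std::iota(perm.begin(), perm.end(), std::size_t(0));
        // Sort the indices by the values they point at; stable_sort
        // keeps tied items in their original relative order.
        std::stable_sort(perm.begin(), perm.end(),
                         [this](std::size_t a, std::size_t b) {
                             return data_[a] < data_[b];
                         });
        apply_permutation(perm);
        return perm;
    }

private:
    std::vector<double> data_;
};

// Sorts a "vector-DataSet" keyed on its first component.
void sort_by_first(std::vector<DataSet>& vector_ds) {
    if (vector_ds.empty())
        return;
    const Permutation perm = vector_ds[0].permutation_sort();
    for (std::size_t i = 1; i < vector_ds.size(); ++i)
        vector_ds[i].apply_permutation(perm);
}
```

Note that this version still makes one scratch copy per DataSet in apply_permutation(); an in-place version that follows the permutation's cycles would avoid that at the cost of more bookkeeping.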

> I'm ready to have a look at these things (and others) according to your
> comments.

Look away.  I've been coding in a bit of a vacuum, and having other
people involved and looking at the code always helps.

-JT