Re: [guppi-list] Censored data in Goose



> A while ago, Asger was talking about how to handle censored data in a
> Goose DataSet.  We now return to that thread, already in progress...
 
> > 1) Add a boolean flag to every value.
> 
> > 2) Add an integer index value to every value.
> 
> > 3) Use my own datastructure.
>
> I agree that this is a drastic step.  It might be the best way to
> go... I'm not sure.  What I'd like to see is a solution that satisfies
> what you might call the "Stroustrup Doctrine" --- adding the feature
> should impose essentially no overhead on people who choose not to use
> that feature.
> 
> So maybe one could do something like (2), but only have the index be
> there for DataSets that have censored data.  In other cases, the index
> would just be implicit, and wouldn't be allocated.

Yes, that was what I was thinking.
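
To make that concrete, here is a rough sketch of how an index could exist
only once a censored value shows up.  The class name IndexedData, the
add_censored() method and the meaning of the codes (0 for observed, 1 for
censored) are my assumptions for illustration; none of this is existing
Goose code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a data set, not Goose's actual DataSet.
class IndexedData {
public:
  // Add an ordinary (observed) value.
  void add(double x) {
    values_.push_back(x);
    // Only maintain the index if it has already been created.
    if (!codes_.empty()) codes_.push_back(0);
  }

  // Add a censored value; creates the index on first use.
  void add_censored(double x) {
    // All values added so far were observed, so back-fill with 0.
    if (codes_.empty()) codes_.assign(values_.size(), 0);
    values_.push_back(x);
    codes_.push_back(1);
  }

  // True once at least one censored value has been added.
  bool has_index() const { return !codes_.empty(); }

  bool is_censored(std::size_t i) const {
    assert(i < values_.size());
    // No index means nothing is censored: the index is implicit.
    return !codes_.empty() && codes_[i] != 0;
  }

  std::size_t size() const { return values_.size(); }

private:
  std::vector<double> values_;
  std::vector<int> codes_;  // stays empty until a censored value arrives
};
```

A set that never sees a censored value pays one emptiness test per add()
and no extra storage, which is about as close to "no overhead" as this
variant gets.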
 
> On the other hand, maybe the most logical thing to do would just be to
> make DataSet NAN-aware.  Have it do the right thing if you add() a
> NAN, add a valid() query.  Things might get a bit gross as we'd have
> to pepper the code with calls to valid() and isnan(), but at least we
> wouldn't be hopelessly cluttering DataSet's design.

This is a cleverer variant of 1).  It avoids the extra bool, but the
problem is that it only works for floating-point data, and we still impose
a run-time penalty on everyone, since the values have to be checked for NAN.
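
A small sketch of the NAN variant shows where the cost comes from.  The
class NanAwareData and its members are hypothetical names for illustration
only, and I am assuming the values are stored as double.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical NAN-aware container, not Goose's actual DataSet.
class NanAwareData {
public:
  // A missing value is stored as NAN; no separate flag is kept.
  void add(double x) { values_.push_back(x); }

  bool valid(std::size_t i) const {
    assert(i < values_.size());
    return !std::isnan(values_[i]);
  }

  // Counts the non-NAN entries by scanning all values.
  std::size_t valid_count() const {
    std::size_t n = 0;
    for (double v : values_) {
      if (!std::isnan(v)) ++n;
    }
    return n;
  }

  // Mean of the valid entries; NAN if there are none.
  double mean() const {
    double sum = 0.0;
    std::size_t n = 0;
    for (double v : values_) {
      if (!std::isnan(v)) {  // this test runs for every value
        sum += v;
        ++n;
      }
    }
    if (n == 0) return std::numeric_limits<double>::quiet_NaN();
    return sum / static_cast<double>(n);
  }

private:
  std::vector<double> values_;
};
```

The isnan() test sits in the inner loop whether or not the set contains any
missing values, and there is no equivalent marker value for integer data.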

> What do you think?

Having considered things a little more, I think we should leave censoring
out of the DataSet class entirely.  Instead, the solution is to subclass
DataSet and build another set of access methods that deal with missing
values in a new class, CensorDataSet.

In this class, alongside the methods for getting the size, arrays and other
properties of the censored data, there will be matching methods for getting
the same things for the non-censored data.

This way, DataSet will remain as it is, and the new CensorDataSet will just
extend the existing class.  It will be able to do this efficiently enough for
my use, while still preserving the constant-time statistics.
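
Roughly, I picture something like the sketch below.  The DataSet here is a
tiny stand-in I made up so that the example is self-contained; the real
Goose class has a different and much larger interface.  In particular, the
virtual add() and all of the uncensored_*() names are assumptions for
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-in for DataSet (assumed interface, not the real one).
class DataSet {
public:
  virtual ~DataSet() = default;

  virtual void add(double x) {
    values_.push_back(x);
    sum_ += x;  // running sum keeps sum() constant time
  }

  std::size_t size() const { return values_.size(); }
  double sum() const { return sum_; }
  const std::vector<double>& data() const { return values_; }

private:
  std::vector<double> values_;
  double sum_ = 0.0;
};

// Extends DataSet with a parallel view of the non-censored values.
class CensorDataSet : public DataSet {
public:
  // An ordinary value goes into both the full set and the view.
  void add(double x) override {
    DataSet::add(x);
    censored_.push_back(false);
    uncensored_.push_back(x);
    uncensored_sum_ += x;
  }

  // A censored value is only recorded in the full set.
  void add_censored(double x) {
    DataSet::add(x);
    censored_.push_back(true);
  }

  bool is_censored(std::size_t i) const {
    assert(i < censored_.size());
    return censored_[i];
  }

  std::size_t uncensored_size() const { return uncensored_.size(); }
  double uncensored_sum() const { return uncensored_sum_; }
  const std::vector<double>& uncensored_data() const { return uncensored_; }

  double uncensored_mean() const {
    assert(!uncensored_.empty());
    return uncensored_sum_ / static_cast<double>(uncensored_.size());
  }

private:
  std::vector<bool> censored_;      // one flag per value in the full set
  std::vector<double> uncensored_;  // copy of the non-censored values
  double uncensored_sum_ = 0.0;     // running sum, so the mean is O(1)
};
```

The price is a second copy of the non-censored values plus one flag per
value, but only users of CensorDataSet pay it.  If add() cannot be made
virtual in the real class, the subclass would need some other way to keep
its flags in step with the base storage.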

What do you think about this?

Greets,

Asger


