Missing data and statistics



Hi!

I don't know of any established convention for sorting
a set that contains missing data.  I propose that when
you sort such a set, the missing data are put first.
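
To make the proposed ordering concrete, here is a minimal sketch.
It assumes a missing element is modelled as an empty std::optional
(C++17), which is not how the library stores holes; the names Cell,
missing_first_less and sorted_missing_first are made up for
illustration only.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Hypothetical representation: std::nullopt marks a missing element.
using Cell = std::optional<double>;

// Strict weak ordering that puts missing elements before present ones;
// present elements compare by value (NaN values are assumed absent).
inline bool missing_first_less(const Cell& a, const Cell& b) {
    if (!a.has_value()) return b.has_value();  // missing < present only
    if (!b.has_value()) return false;          // present never precedes missing
    return *a < *b;
}

// Returns a sorted copy: all holes first, then values in ascending order.
inline std::vector<Cell> sorted_missing_first(std::vector<Cell> cells) {
    std::stable_sort(cells.begin(), cells.end(), missing_first_less);
    return cells;
}
```

Putting the holes first means the present values form one contiguous
tail of the sorted sequence, so order statistics only need an offset
equal to the number of holes.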

As it is now, all the statistics inherited by a
holed dataset are exact mirrors of the underlying
ones.  I.e., when you ask for the mean of a dataset
with holes, you get exactly the same answer as from
a dataset where all the holes have been deleted.
The notable exception is the "size" method.  This
returns the size of the set including the missing
elements.  I can change this, but I don't think
that would be a good idea.  Instead, I can add
a "count" method that will return the number of
non-missing elements.  
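
The difference between the two could look roughly like this.  It is
only a sketch under assumptions: HoledRealSketch is a hypothetical
stand-in, not the real class, and it stores holes as empty
std::optional values.  The mean returns an empty optional when there
are no non-missing elements, which is a choice of this sketch.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a holed set of reals.
class HoledRealSketch {
public:
    explicit HoledRealSketch(std::vector<std::optional<double>> cells)
        : cells_(std::move(cells)) {}

    // Number of slots, including the missing ones.
    std::size_t size() const { return cells_.size(); }

    // Number of non-missing elements.
    std::size_t count() const {
        std::size_t n = 0;
        for (const auto& c : cells_) {
            if (c.has_value()) ++n;
        }
        return n;
    }

    // Mean over the non-missing elements only, i.e. the same answer as
    // for a set where all the holes have been deleted.
    std::optional<double> mean() const {
        const std::size_t n = count();
        if (n == 0) return std::nullopt;
        double sum = 0.0;
        for (const auto& c : cells_) {
            if (c.has_value()) sum += *c;
        }
        return sum / static_cast<double>(n);
    }

private:
    std::vector<std::optional<double>> cells_;
};
```

For the cells {1, hole, 3, hole}, size() is 4, count() is 2, and the
mean is 2.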

Thoughts?

Furthermore, I plan to add a new access method for
the holed data sets:

template<class Set>
class HoleSet {
  ...

  // One pointer per element; a null pointer marks a missing element.
  vector<typename Set::type const *> data() const;
};

that returns a vector of pointers to the original
type.  Missing elements will be represented by a null
pointer (the value 0).  Each pointer points to a const
element, so this method is only for inspection.

I.e., for a HoledRealSet, we would return a
vector<double const *>.

This method will be linear time, or worse depending
on the efficiency of the push_back method in vectors.
I.e. we will construct the vector on demand.
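
A rough sketch of how such an on-demand construction could look for
the real-valued case.  Everything here is an assumption made for
illustration: HoledRealView is a hypothetical class, and it keeps the
values in one vector with a parallel presence mask of the same length,
which may not match the actual storage.  Calling reserve() up front
means no push_back has to reallocate, so the loop stays linear.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical storage: values plus a parallel mask of the same length.
class HoledRealView {
public:
    HoledRealView(std::vector<double> values, std::vector<bool> present)
        : values_(std::move(values)), present_(std::move(present)) {}

    // Builds the pointer vector on demand.  A null pointer marks a
    // missing element.  The pointers are valid only while this object
    // is alive and unmodified.
    std::vector<double const*> data() const {
        std::vector<double const*> out;
        out.reserve(values_.size());  // one allocation, no regrowth
        for (std::size_t i = 0; i < values_.size(); ++i) {
            out.push_back(present_[i] ? &values_[i] : nullptr);
        }
        return out;
    }

private:
    std::vector<double> values_;
    std::vector<bool> present_;
};
```

A caller checks each pointer before dereferencing it; since the
pointee is const, the values cannot be changed through the result.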

Greets,

Asger


