Missing data and statistics
- From: "Asger K. Alstrup Nielsen" <alstrup diku dk>
- To: guppi-list gnome org
- Subject: Missing data and statistics
- Date: Thu, 11 Feb 1999 13:35:51 +0100 (MET)
Hi!
I don't know of any conventions on how to handle this.
I propose that when you sort a set with missing data,
the missing data are put first.
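
A minimal sketch of the "missing first" ordering, assuming
a missing value is modelled as an empty std::optional
(C++17). The MaybeReal alias and sort_missing_first are
hypothetical names for illustration, not part of the
existing code:

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Hypothetical representation: an empty optional stands for a missing value.
using MaybeReal = std::optional<double>;

// Sort so that all missing values come first, followed by the present
// values in ascending order.
inline void sort_missing_first(std::vector<MaybeReal>& values) {
    std::sort(values.begin(), values.end(),
              [](const MaybeReal& a, const MaybeReal& b) {
                  if (!a.has_value()) return b.has_value();  // missing < present
                  if (!b.has_value()) return false;          // present is not < missing
                  return *a < *b;                            // both present
              });
}
```

Two missing values compare as equivalent here, so their
relative order after sorting is unspecified.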
As it is now, all the statistics inherited by a
holed dataset are exact mirrors of the underlying
ones. I.e., when you ask for the mean of a dataset
with holes, you get exactly the same answer as you
would from a dataset where all the holes have been
deleted.
The notable exception is the "size" method. This
returns the size of the set including the missing
elements. I can change this, but I don't think
that would be a good idea. Instead, I can add
a "count" method that will return the number of
non-missing elements.
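
To make the size/count distinction concrete, here is a
rough sketch. HoledReals is a hypothetical stand-in, not
the real holed set class, and it assumes holes are stored
as empty std::optional values (C++17):

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a holed real set; not the actual class.
class HoledReals {
public:
    explicit HoledReals(std::vector<std::optional<double>> values)
        : values_(std::move(values)) {}

    // Number of slots, including the missing ones.
    std::size_t size() const { return values_.size(); }

    // Number of non-missing elements.
    std::size_t count() const {
        std::size_t n = 0;
        for (const auto& v : values_) {
            if (v) ++n;
        }
        return n;
    }

    // Mean over the non-missing elements only, i.e. the same answer as
    // for a set with the holes deleted. Empty if nothing is present.
    std::optional<double> mean() const {
        double sum = 0.0;
        std::size_t n = 0;
        for (const auto& v : values_) {
            if (v) { sum += *v; ++n; }
        }
        if (n == 0) return std::nullopt;
        return sum / static_cast<double>(n);
    }

private:
    std::vector<std::optional<double>> values_;
};
```

For the values 1, (missing), 3 this gives size() == 3,
count() == 2 and a mean of 2.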
Thoughts?
Furthermore, I plan to add a new access method for
the holed data sets:
template<class Set>
class HoleSet {
...
vector<typename Set::type const *> data() const;
};
that returns a vector of pointers to the original
type. The missing elements will be null pointers
(value 0). Each pointer points to a const element,
so this method is only for inspection.
I.e., for a HoledRealSet, we would return a
vector<double const *>.
This method will be linear time, or worse depending
on the efficiency of the push_back method in vectors.
I.e. we will construct the vector on demand.
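
A rough sketch of what building the vector on demand
could look like. HoleSetSketch, its value storage and its
"missing" mask are all assumptions made for illustration;
the real HoleSet wraps a Set and may be laid out quite
differently:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: the storage and the "missing" mask are assumptions,
// not the actual HoleSet layout.
template <class T>
class HoleSetSketch {
public:
    HoleSetSketch(std::vector<T> values, std::vector<bool> missing)
        : values_(std::move(values)), missing_(std::move(missing)) {
        assert(values_.size() == missing_.size());
    }

    // Build the pointer view on demand. Missing slots become null pointers;
    // the pointees are const, so the view is for inspection only.
    std::vector<T const*> data() const {
        std::vector<T const*> out;
        out.reserve(values_.size());  // reserve up front: no reallocation below
        for (std::size_t i = 0; i < values_.size(); ++i) {
            out.push_back(missing_[i] ? nullptr : &values_[i]);
        }
        return out;
    }

private:
    std::vector<T> values_;
    std::vector<bool> missing_;
};
```

With the reserve call, each push_back is constant time, so
the whole construction is linear. The returned pointers
are only valid while the set itself is alive and
unmodified.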
Greets,
Asger