Re: [guppi-list] Re: goose datatype system



> This gets into an issue that has concerned me.  Namely, which
> operations should be part of the container and which should stand
> alone?
>
> We just can't let DataSet (and the future DateSet, CategoricalSet,
> OrdinalSet, etc.) keep growing and growing.  If I've erred on the side
> of inclusion with the DataSet, perhaps we should start out by erring
> on the side of exclusion.

Agreed.

It also makes sense to try to make some of these routines generic.
As an example, take the "min" operation.  (Assume that we do not need
it to run in constant time.)  Then there is no reason to implement it
separately for every container; the obvious thing is to use the STL
min_element algorithm.
Similarly, we should implement other common operations generically.
This can be done without stress-testing the compilers too much, because
we only parameterize a single function template on the container type,
and compilers are supposed to be able to handle that.
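A minimal sketch of the generic approach, assuming the elements support `operator<`: one function template built on the standard `std::min_element` algorithm covers every container that exposes `begin()`/`end()`. The name `container_min` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// One generic "min" for every container whose elements support operator<.
// It runs in linear time; nothing is cached between calls.
// Precondition: the container is not empty.
template <typename Container>
typename Container::value_type container_min(const Container& c) {
    assert(!c.empty());
    return *std::min_element(c.begin(), c.end());
}
```

The same template applies unchanged to, say, a `std::vector<double>` or a `std::list<int>`, so no container class needs its own `min` method.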

> Now on the Polymorphism issue, I've changed my mind (slightly) on one
> issue.  I think that it would make sense for us to have all of our
> various containers derive from some base class... a GooseSet, if you
> will.  

Why do you want this?  To get better typing at compile time, or because
you want to perform some generic operations on GooseSets that require
them to share a common base class?

Oh, you give the answer further below:

> In fact, the GooseSet interface would be so small as to be almost
> useless.  The only reason it would be around would be to allow for
> heterogeneous "containers of containers", things like
> vector<GooseSet*>.

Ok, I agree.
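To make the "container of containers" idea concrete, here is a sketch of a deliberately tiny base class that holds only a size and a label. The derived classes `RealSet` and `StringSet` are hypothetical stand-ins, and `std::unique_ptr<GooseSet>` replaces the raw `GooseSet*` so that ownership is explicit; that substitution is a choice of this sketch, not part of the proposal.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Minimal polymorphic base: just enough to store different sets together.
class GooseSet {
public:
    virtual ~GooseSet() = default;
    virtual std::size_t size() const = 0;
    const std::string& label() const { return label_; }
    void set_label(const std::string& l) { label_ = l; }
private:
    std::string label_;
};

// Hypothetical concrete containers; real ones would offer much more.
class RealSet : public GooseSet {
public:
    void add(double x) { data_.push_back(x); }
    std::size_t size() const override { return data_.size(); }
private:
    std::vector<double> data_;
};

class StringSet : public GooseSet {
public:
    void add(const std::string& s) { data_.push_back(s); }
    std::size_t size() const override { return data_.size(); }
private:
    std::vector<std::string> data_;
};

// A heterogeneous "container of containers".
using SetList = std::vector<std::unique_ptr<GooseSet>>;

// Sums the sizes of all sets, with one virtual call per set.
std::size_t total_size(const SetList& sets) {
    std::size_t n = 0;
    for (const auto& s : sets) n += s->size();
    return n;
}
```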

This implies a runtime penalty of one extra indirection per call.
That cost is constant, so it only matters for methods that are
themselves constant time, and among those, only for the ones that are
called very often.

So the potentially relevant methods are the ordinary array operations,
look-up in particular.
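The following sketch (all names hypothetical) shows where the extra indirection lands: look-up through the base class makes a virtual call for every element, whereas code that knows the concrete type can use a non-virtual accessor that the compiler is free to inline. Offering both accessors is one possible mitigation, assumed here for illustration only.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical base class exposing look-up through a virtual call.
class GooseSet {
public:
    virtual ~GooseSet() = default;
    virtual std::size_t size() const = 0;
    virtual double value_at(std::size_t i) const = 0;  // one indirection per call
};

// Hypothetical concrete set that also offers a non-virtual accessor.
class RealSet final : public GooseSet {
public:
    explicit RealSet(std::vector<double> v) : data_(std::move(v)) {}
    std::size_t size() const override { return data_.size(); }
    double value_at(std::size_t i) const override { return data_[i]; }
    double operator[](std::size_t i) const { return data_[i]; }  // direct, inlinable
private:
    std::vector<double> data_;
};

// Generic code: pays a virtual dispatch on every element.
double sum_via_base(const GooseSet& s) {
    double total = 0.0;
    for (std::size_t i = 0; i < s.size(); ++i) total += s.value_at(i);
    return total;
}

// Type-aware code: no virtual dispatch for the element accesses.
double sum_direct(const RealSet& s) {
    double total = 0.0;
    for (std::size_t i = 0; i < s.size(); ++i) total += s[i];
    return total;
}
```

Both functions compute the same result; only the cost per look-up differs.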

> But the inheritance should be trivial, and the GooseSet should
> only offer a strictly minimalistic set of features.  Maybe even
> something as simple as:
>
> class GooseSet {
> public:
>   virtual ~GooseSet();
>   size_t size() const;
>   const string& label() const;
>   void set_label(const string& l);
> };

I think we need the set of ordinary array operations as well: inserts,
deletes, retrieval, etc.  Otherwise, things like the ASCII import will
have a hard time filling such a structure.  Without inspection methods,
an interactive container editor will also have a hard time presenting
the contents.  The editor needs add/remove methods too, so all in all,
we need a full set of array operations.

Thus, I propose that we adopt an iterator interface and add it to
the GooseSet base class.  This will solve the problem for ASCII import and
editing, and also make these containers compatible with the STL algorithms.
Besides being an excellent technical and conceptual solution to the task
of representing one-dimensional data, this concept is also familiar to
most C++ programmers, which makes the interface easier to understand and use.

Iterators carry a run-time overhead, but since the base class already
imposes an overhead on these access methods, we can shift that overhead
from the container classes to the iterator classes.
In other words, the total cost stays the same.
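A sketch of the iterator idea, assuming the base class can hand out each element as a string: a read-only iterator stores a `GooseSet` pointer and an index, and each dereference makes one virtual call. The overhead therefore lives in the iterator, while a concrete class only implements `size()` and `get_string()`. Because the iterator yields values rather than references, it is declared as an input iterator, which is enough for algorithms such as `std::find` and `std::count`. All names here (`get_string`, `StringSet`, `count_value`) are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

class GooseSet {
public:
    virtual ~GooseSet() = default;
    virtual std::size_t size() const = 0;
    virtual std::string get_string(std::size_t i) const = 0;

    // Read-only iterator: one virtual call per dereference.
    class const_iterator {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = std::string;
        using difference_type = std::ptrdiff_t;
        using pointer = void;           // no operator-> is provided
        using reference = std::string;  // elements are produced by value

        const_iterator(const GooseSet* set, std::size_t pos) : set_(set), pos_(pos) {}
        std::string operator*() const { return set_->get_string(pos_); }
        const_iterator& operator++() { ++pos_; return *this; }
        const_iterator operator++(int) { const_iterator tmp = *this; ++pos_; return tmp; }
        bool operator==(const const_iterator& o) const { return set_ == o.set_ && pos_ == o.pos_; }
        bool operator!=(const const_iterator& o) const { return !(*this == o); }
    private:
        const GooseSet* set_;
        std::size_t pos_;
    };

    const_iterator begin() const { return const_iterator(this, 0); }
    const_iterator end() const { return const_iterator(this, size()); }
};

// Hypothetical concrete set: implements only the two virtual accessors.
class StringSet : public GooseSet {
public:
    explicit StringSet(std::vector<std::string> v) : data_(std::move(v)) {}
    std::size_t size() const override { return data_.size(); }
    std::string get_string(std::size_t i) const override { return data_[i]; }
private:
    std::vector<std::string> data_;
};

// Any STL algorithm that accepts input iterators works on every GooseSet.
std::size_t count_value(const GooseSet& s, const std::string& value) {
    return static_cast<std::size_t>(std::count(s.begin(), s.end(), value));
}
```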

> (BTW, DateSet is just too typo-inducingly close to DataSet.  What else
> could we call it?)

Maybe we should rename the DataSet to RealSet?
That would also allow us to call the GooseSet a DataSet instead, which
seems more natural to me.
Furthermore, it would make the RealValue and RealType names in the
Type/Value system seem more natural.
(If you don't like the Real- prefix, we can use Double-, but I thought
that DoubleType was a bit ambiguous.)

> So between this and the Type/Value system, we'd have both
> container-level and element-level polymorphism... which should be
> enough for almost any purposes.

Yes.  I think this design would work.

> But all kidding aside, I really think that the "distributed" approach
> is the way to go.  Let's hammer out some code and see how it looks
> to us then...

Ok.  I'll continue refining the Value and DataType classes, and try to
prepare for a StringSet instance.
Also, we need to flesh out the GooseSet/DataSet abstract class some more
if we choose to add the iterator interface to it.  (Basically a task of
copy/paste from an STL header file.)

Greets,

Asger


