goose datatype system



I'm confused about what direction the Value/DataType stuff is going
in... probably because I have a murky idea of how it should be
done, and I'm no doubt not in sync with your much clearer idea.

Here is a sketch of my murky idea, and maybe you (Asger) can tell me
how it relates to your work.

--------------------------------------------------

My thinking is that the DataSet should be the "Universal Container"
for statistical data.  Different types of data include:

(1) Numerical data
(2) Time/Date data
(3) Categorical data (strings)
(4) Strange numerical data (e.g. circular data)

Now clearly case (1) is covered by just the DataSet.  The question is,
how do we handle (2) and (3)?

My proposed solution is that we always convert everything to a double
and store it in the DataSet.  The DataType class would be the vehicle
for converting elements from human readable format to doubles and back
again.
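A minimal sketch of the "everything is a double" idea, written in
Python only for brevity.  The names `DataType`, `to_double`,
`to_text`, `NumericType`, and the `DataSet` methods `add` and `label`
are hypothetical, made up for illustration; they are not the existing
Goose classes.

```python
from abc import ABC, abstractmethod


class DataType(ABC):
    """Hypothetical interface: converts between human-readable text and a double."""

    @abstractmethod
    def to_double(self, text: str) -> float:
        """Parse a human-readable value into the double stored in a DataSet."""

    @abstractmethod
    def to_text(self, value: float) -> str:
        """Render a stored double back into human-readable form."""


class NumericType(DataType):
    """Case (1): plain numbers need no real conversion."""

    def to_double(self, text: str) -> float:
        return float(text)

    def to_text(self, value: float) -> str:
        return repr(value)


class DataSet:
    """Hypothetical sketch: every element is stored as a Python float (a C double)."""

    def __init__(self, data_type: DataType):
        self.data_type = data_type
        self.values = []  # the stored doubles; no strings or dates kept here

    def add(self, text: str) -> None:
        # Conversion happens once, on the way in, via the DataType.
        self.values.append(self.data_type.to_double(text))

    def label(self, i: int) -> str:
        # Conversion back to text happens only when a human needs to see it.
        return self.data_type.to_text(self.values[i])
```

In this sketch all arithmetic works on `values` directly, and only code
that displays data (axis labels, point identification) goes through the
DataType.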

For example, (2): Dates and times are often stored internally as the
number of days since a fixed date, the number of seconds since a fixed
time, etc.  We would just store these integer quantities in doubles
instead of integers (which is maybe a bit wasteful, but otherwise
fine, since a double represents every integer up to 2^53 exactly).
The DataType class would handle all of the conversions from, for
example, MM/DD/YYYY format to a number, and back again.  Thus a
plotting program that was plotting X vs. Y, where X and Y are two
DataSets, would pass the data through some DataType functions before
using it to label axes, identify the coordinates of specific points,
etc.
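A sketch of a date DataType along these lines, assuming (for
illustration only) that the stored number is days since 1970-01-01 and
that the human-readable form is MM/DD/YYYY.  `DateType`, `to_double`
and `to_text` are hypothetical names; the conversions use Python's
standard `datetime` module.

```python
from datetime import date, datetime


class DateType:
    """Hypothetical date DataType: MM/DD/YYYY <-> days since 1970-01-01 (as a float)."""

    # Proleptic Gregorian ordinal of the assumed epoch.
    EPOCH = date(1970, 1, 1).toordinal()

    def to_double(self, text: str) -> float:
        d = datetime.strptime(text, "%m/%d/%Y").date()
        # Whole days fit exactly in a double, so nothing is lost here.
        return float(d.toordinal() - self.EPOCH)

    def to_text(self, value: float) -> str:
        # Round in case the value came out of arithmetic (e.g. an axis tick).
        d = date.fromordinal(int(round(value)) + self.EPOCH)
        return d.strftime("%m/%d/%Y")
```

A plotting program would keep the tick positions as plain doubles and
call `to_text` only to label them: a tick at 10957.0 would be labeled
"01/01/2000".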

This would also work for (3): For statistical purposes, we don't need
a container containing an arbitrary list of strings.  Instead, we have
N categories, identified by strings, that we want to use to label
other data points.  So our categorical DataType would maintain a
mapping of category names to numbers, so that it could do the
conversions (e.g. "New York" => 1, "Paris" => 2, "Moscow" => 3, ...).
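A sketch of the categorical mapping, assuming (as one possible choice)
that categories are numbered 1, 2, 3, ... in order of first
appearance.  `CategoricalType` and its methods are hypothetical names.

```python
class CategoricalType:
    """Hypothetical categorical DataType: category name <-> number (1, 2, 3, ...)."""

    def __init__(self):
        self.codes = {}  # name -> number, for text -> double
        self.names = []  # names[k - 1] is category k, for double -> text

    def to_double(self, text: str) -> float:
        # The first time a category is seen, give it the next number.
        if text not in self.codes:
            self.names.append(text)
            self.codes[text] = len(self.names)
        return float(self.codes[text])

    def to_text(self, value: float) -> str:
        k = int(value)
        # Reject non-integers and numbers that name no category.
        if k != value or k < 1 or k > len(self.names):
            raise ValueError(f"{value!r} is not a valid category code")
        return self.names[k - 1]
```

Keeping both a dict and a list makes the lookup cheap in both
directions; the DataSet itself only ever sees 1.0, 2.0, 3.0, ...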

For case (4), the DataType would also handle the appropriate
transformations; for circular data, for instance, we would probably
want to store numerical values mod 2Pi.
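A sketch of a circular DataType that stores angles in radians reduced
mod 2Pi.  Reading and printing angles in degrees is an assumption made
only for illustration, and `CircularType`, `wrap`, `to_double` and
`to_text` are hypothetical names.

```python
import math

TWO_PI = 2.0 * math.pi


class CircularType:
    """Hypothetical circular DataType: degrees (text) <-> radians in [0, 2*pi)."""

    @staticmethod
    def wrap(x: float) -> float:
        # Python's float % takes the sign of the divisor, so r >= 0 here.
        r = x % TWO_PI
        # Rounding can give r == TWO_PI for tiny negative x; fold that to 0.
        return 0.0 if r >= TWO_PI else r

    def to_double(self, text: str) -> float:
        # Assumed input: an angle in degrees, e.g. "450" or "-90".
        return self.wrap(math.radians(float(text)))

    def to_text(self, value: float) -> str:
        return f"{math.degrees(self.wrap(value)):g}"
```

With this sketch, "90", "450" and "-270" all end up stored as the same
value, pi/2.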

--------------------------------------------------

Well, those are my random musings for now.  How does that fit in with
your (Asger's) ideas?  Or, for that matter, does anyone else on the
list have comments?

-Jon