Re: [guppi-list] Scientific Application, data format

From: Jon Trowbridge <trow emccta com>
To: guppi-list gnome org
Subject: Re: [guppi-list] Scientific Application, data format
Date: Mon, 21 Sep 1998 10:53:14 -0500

On Mon, Sep 21, 1998 at 02:32:02PM +0200, Dirk Luetjens wrote:
> The problem is, that different libraries, different approaches
> all use different structs to store their data in. But the most common
> is to use a pointer to a piece of memory and a size element. So let's
> build a data type around this BasicData element.
> 
> > typedef struct _BasicData BasicData;
> > struct _BasicData {
> > 	void	*data;
> > 	size_t	size;
> > 	Types	type;
> > };

This isn't quite enough, I think.  It is handy to assume that your
data isn't stored contiguously.  So you'd want to have BasicData be
something like:

struct _BasicData {
  void *data;
  size_t size;
  size_t stride;
  Types  type;    // I'll ignore this field from now on.
};

This lets you get one field of data out of a struct, as in:

struct pair {
  double x, y;
};

struct pair *pair_array = my_function();

BasicData b;
b.data = &(pair_array[0].x);
b.size = length_of_pair_array;
b.stride = sizeof(struct pair);

Now if you advance your indexing pointer by stride bytes on each step
of your iteration, you can take a "slice" of your struct.
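
As a rough sketch of that iteration (the typedef restates the struct
above without the type field; basic_data_get and basic_data_sum are
names made up for this example, not part of any existing library):

```c
#include <assert.h>
#include <stddef.h>

/* Strided view over doubles; restates the struct above, type field omitted. */
typedef struct {
    void  *data;    /* address of the first element */
    size_t size;    /* number of elements in the view */
    size_t stride;  /* distance in bytes between consecutive elements */
} BasicData;

struct pair {
    double x, y;
};

/* Hypothetical helper: read element i by stepping through the buffer in bytes. */
double basic_data_get(const BasicData *b, size_t i)
{
    assert(i < b->size);
    const unsigned char *p = (const unsigned char *)b->data + i * b->stride;
    return *(const double *)p;
}

/* Hypothetical helper: sum every element the view refers to. */
double basic_data_sum(const BasicData *b)
{
    double total = 0.0;
    for (size_t i = 0; i < b->size; ++i)
        total += basic_data_get(b, i);
    return total;
}
```

Pointing data at &(pair_array[0].y) instead of &(pair_array[0].x), with
the same size and stride, would give you the y values through the same
two functions.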

This is (essentially) how I handle data for my Gtk-- plotting widgets,
creating a struct that just holds a double*, a size, and a stride.

> Now let me compute e.g. a histogram over this data and store the
> result again in such a BasicData struct. The problem now is that I
> lose all information about the min, max value in the data set, the
> number and width of the bins, about attributes (like
> APPEND_OUT_OF_BOUND_BINS, EQUALIZE_TO_ONE, ...). What to do with this
> extra data?
> 
> The first solution would be a subclass of the BasicData type with the
> appropriate fields. But this turns out to be not very helpful, since
> you can't do multiple subclassing in C, and how would you say that you 
> would like to build a histogram over an n-dim data structure, where
> the n-dim data structure adds new extra data to the BasicData
> struct.

I would consider this to be a good argument for using C++.  If
subclassing is the "right" solution to the problem, we should use
subclassing and a language that supports subclassing.  But I like C++
quite a bit, and I know that lots of other people don't.  So YMMV.

> With this you can always pass Data structs around. If the called
> function needs more information about the Data it can look up this
> information in the list and store a pointer on the side. The function
> can also generate new extra_data. Extra Data can also be another
> BasicData structure, an image mask for example.
> 
> Missing: 
> - How to load and to store this data in known data formats?
> - How to handle attributes? (Perhaps use a get_arg, set_arg subsystem)
> - With this you pass pure data structures around. But, what if I want
>   to pass a mathematical function around? For this the Data interface
>   must be totally abstracted, like data_get_value_at(),
>   data_get_range(), ...

This is a big one.  You might need to handle functions with various
arguments, structures, etc.  If you are using this approach and C, get
ready to write lots of code full of ugly pointer casts.
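
To make that concrete, here is one way such an abstracted interface
might look in C.  Only the name data_get_value_at comes from the
proposal; the Data layout, the callback signature and both backends
are assumptions made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical abstract data source: a callback plus an opaque context. */
typedef struct {
    double (*get_value_at)(const void *ctx, size_t i);
    size_t      size;
    const void *ctx;
} Data;

double data_get_value_at(const Data *d, size_t i)
{
    assert(i < d->size);
    return d->get_value_at(d->ctx, i);
}

/* Backend 1: values stored in a plain array of doubles. */
double array_value_at(const void *ctx, size_t i)
{
    const double *values = (const double *)ctx;  /* unchecked cast */
    return values[i];
}

/* Backend 2: values computed on demand from f(i) = a*i + b. */
struct line {
    double a, b;
};

double line_value_at(const void *ctx, size_t i)
{
    const struct line *f = (const struct line *)ctx;  /* unchecked cast */
    return f->a * (double)i + f->b;
}
```

Nothing stops you from pairing array_value_at with a struct line
context; the compiler accepts it and the cast quietly reads garbage.
Every new kind of function or argument list needs another callback and
another cast like these.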

Also, what if the data isn't there?  With this method, you can't do
"lazy calculation" of only values that you need.  So I predict a lot
of code that looks like this:

      foo = load_foo_from_data_file("foo.data");
      struct eigen* e = (struct eigen*)data_entry_get(foo, "eigensystem");
      if (e == NULL) {
        e = calculate_foo_eigensystem(foo);
        entry = data_entry_new(EIGEN_ID, "eigensystem", TYPE_FOO, foo);
        data_add(foo, entry);
      }

This puts a *huge* burden on the application developer, both to get
all of the casts right, and then to check every return value and
correctly calculate & cache the value/struct/whatever if it isn't
there already.

But you have to do this, since:
  * In C, you can't really get this stuff calculated on-the-fly with
    the scheme you describe.
  * Having every data element pre-calculate everything probably would
    be very inefficient.
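
For a sense of the bookkeeping involved, here is a self-contained
sketch of the check, compute and cache steps folded into one helper.
Everything in it (Extras, extras_get, extras_add,
extras_get_or_compute, compute_mean) is hypothetical and only stands
in for the extra-data list in the proposal; note that the cast on the
result is still the caller's job.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8

/* Hypothetical named-attribute store attached to a data set. */
typedef struct {
    const char *names[MAX_ENTRIES];
    void       *values[MAX_ENTRIES];
    size_t      count;
} Extras;

/* Return the pointer stored under name, or NULL if there is none. */
void *extras_get(const Extras *x, const char *name)
{
    for (size_t i = 0; i < x->count; ++i)
        if (strcmp(x->names[i], name) == 0)
            return x->values[i];
    return NULL;
}

/* Store value under name; returns 0 on success, -1 if the store is full. */
int extras_add(Extras *x, const char *name, void *value)
{
    assert(name != NULL);
    if (x->count == MAX_ENTRIES)
        return -1;
    x->names[x->count] = name;
    x->values[x->count] = value;
    x->count++;
    return 0;
}

/* Look up name; on a miss, call compute(source) and cache what it returns. */
void *extras_get_or_compute(Extras *x, const char *name,
                            void *(*compute)(void *source), void *source)
{
    void *v = extras_get(x, name);
    if (v == NULL) {
        v = compute(source);
        if (v != NULL)
            (void)extras_add(x, name, v);  /* if full, v is simply not cached */
    }
    return v;
}

/* Example computation: mean of a sample, counting how often it runs. */
struct sample {
    const double *v;
    size_t        n;
    double        mean;
    int           computed;
};

void *compute_mean(void *source)
{
    struct sample *s = (struct sample *)source;
    double total = 0.0;
    for (size_t i = 0; i < s->n; ++i)
        total += s->v[i];
    s->mean = s->n ? total / (double)s->n : 0.0;
    s->computed++;
    return &s->mean;
}
```

Asking for "mean" twice through extras_get_or_compute runs
compute_mean only once, but each call site still has to cast the
void* it gets back to the right type.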

-JT