Scientific Application, data format



Hi,

I haven't stopped thinking about this project :-), even if it has been
quite quiet from my side over the last week. I thought about the data
format to use.

The problem is that different libraries and different approaches
all use different structs to store their data. But the most common
approach is a pointer to a piece of memory plus a size element. So let's
build a data type around this BasicData element.

> typedef struct _BasicData BasicData;
> struct _BasicData {
>	void	*data;
>	size_t	size;
>	Types	type;
> };

To not lose the type information I put a type element in there. This
is definitely the simplest encapsulation of data. You can do
whatever you want: you can cast this to the appropriate type and do
fast data lookups.
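
A minimal sketch of such a cast-based lookup. The members of Types
(TYPE_INT, TYPE_DOUBLE), the helper name basic_data_get() and the
reading of size as an element count are assumptions made for this
example only:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags; the members of Types are not given in the post. */
typedef enum { TYPE_INT, TYPE_DOUBLE } Types;

typedef struct _BasicData BasicData;
struct _BasicData {
    void   *data;   /* raw memory block */
    size_t  size;   /* assumed here: number of elements, not bytes */
    Types   type;   /* tag that says how to cast data */
};

/* Read element i as a double, casting according to the type tag. */
double basic_data_get(const BasicData *bd, size_t i)
{
    assert(bd != NULL && i < bd->size);
    switch (bd->type) {
    case TYPE_INT:
        return (double) ((const int *) bd->data)[i];
    case TYPE_DOUBLE:
        return ((const double *) bd->data)[i];
    }
    return 0.0;     /* not reached for the tags defined above */
}
```

The lookup costs one switch plus one indexed read, which is why this
plain encapsulation stays fast.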

Now let me compute e.g. a histogram over this data and store the
result again in such a BasicData struct. The problem now is that I
lose all information about the min and max values in the data set, the
number and width of the bins, and about attributes (like
APPEND_OUT_OF_BOUND_BINS, EQUALIZE_TO_ONE, ...). What to do with this
extra data?
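
To make the problem concrete, here is a rough sketch of the histogram
computation itself. The function name histogram_fill() and the choice
to drop out-of-bound values are assumptions of this example. Note that
the resulting counts[] array alone says nothing about min or bin_width;
those live only in the arguments:

```c
#include <assert.h>
#include <stddef.h>

/* Count values into num_bins equal-width bins starting at min.
 * Values outside [min, min + num_bins * bin_width) are dropped,
 * which is an assumption of this sketch. */
void histogram_fill(const double *values, size_t n,
                    double min, double bin_width, int num_bins,
                    unsigned *counts)
{
    size_t i;
    int b;

    assert(values != NULL && counts != NULL);
    assert(bin_width > 0.0 && num_bins > 0);

    for (b = 0; b < num_bins; b++)
        counts[b] = 0;

    for (i = 0; i < n; i++) {
        double offset = (values[i] - min) / bin_width;

        /* Written this way so that NaN is rejected as well. */
        if (!(offset >= 0.0 && offset < (double) num_bins))
            continue;
        counts[(int) offset]++;
    }
}
```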

The first solution would be a subclass of the BasicData type with the
appropriate fields. But this turns out not to be very helpful, since
you can't do multiple subclassing in C. How would you express that you
would like to build a histogram over an n-dim data structure, when
the n-dim data structure itself adds new extra data to the BasicData
struct?

So what if we store the extra data in a list associated with the
BasicData and generate this list on the fly at runtime? This adds
the overhead of looking up the data in this extra list, but with a
good underlying system this overhead can be reduced to nearly
zero. This could look like:

> typedef struct _Data Data;
> struct _Data {
>	BasicData	*data;
>	List		*extra_data;	/* list of DataEntry */
> };
>
> typedef struct _DataEntry DataEntry;
> struct _DataEntry {
>	int	id;
>	char	*name;
>	Type	type;
>	void	*data;
> };

and for the histogram:

> typedef struct _Histogram Histogram;
> struct _Histogram {
>	double	min;
>	int	num_bins;
>	double	bin_width;
> };

and you would use it like this:

> Data* histogram (Data* data)
> {
>	DataEntry *entry;
>
>	... compute histogram over data in h[] ...
>	entry = data_entry_new (HISTOGRAM_ID, "histogram", TYPE_HISTOGRAM, h);
>	data_add (data, entry);
>	return data;
> }

With this you can always pass Data structs around. If a called
function needs more information about the Data, it can look up this
information in the list and keep a pointer to it on the side. The
function can also generate new extra_data. Extra data can also be
another BasicData structure, an image mask for example.
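
A compilable sketch of the extra-data list and its lookup. It is only
one possible reading of the idea: the List is replaced by a plain
singly linked list through a next field, data_lookup() is a
hypothetical helper not named above, the Data struct is simplified to
hold the memory block directly, and the values of HISTOGRAM_ID and
TYPE_HISTOGRAM are made up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { TYPE_DOUBLE, TYPE_HISTOGRAM } Type;  /* hypothetical tags */
enum { HISTOGRAM_ID = 1 };                          /* hypothetical id */

typedef struct _DataEntry DataEntry;
struct _DataEntry {
    int         id;
    const char *name;
    Type        type;
    void       *data;   /* payload, not owned by the entry */
    DataEntry  *next;   /* assumption: list is singly linked */
};

typedef struct _Data Data;
struct _Data {
    void      *data;        /* simplified: memory block held directly */
    size_t     size;
    DataEntry *extra_data;  /* head of the extra-data list */
};

typedef struct _Histogram Histogram;
struct _Histogram {
    double min;
    int    num_bins;
    double bin_width;
};

/* Allocate an entry; returns NULL if malloc fails. */
DataEntry *data_entry_new(int id, const char *name, Type type, void *payload)
{
    DataEntry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->id = id;
    e->name = name;
    e->type = type;
    e->data = payload;
    e->next = NULL;
    return e;
}

/* Prepend an entry to the extra-data list. */
void data_add(Data *d, DataEntry *e)
{
    assert(d != NULL && e != NULL);
    e->next = d->extra_data;
    d->extra_data = e;
}

/* Linear search by id; returns NULL if no entry matches. */
DataEntry *data_lookup(const Data *d, int id)
{
    DataEntry *e;
    assert(d != NULL);
    for (e = d->extra_data; e != NULL; e = e->next)
        if (e->id == id)
            return e;
    return NULL;
}
```

A called function would do data_lookup (data, HISTOGRAM_ID) once, keep
the returned pointer, and cast its data field to Histogram*. The
linear search is the lookup overhead mentioned above; a hash table
keyed by id would be one way to shrink it.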

Missing:
- How to load and store this data in known data formats?
- How to handle attributes? (Perhaps use a get_arg, set_arg subsystem)
- With this you pass pure data structures around. But what if I want
  to pass a mathematical function around? For this the Data interface
  must be totally abstracted, like data_get_value_at(),
  data_get_range(), ...
- Is this IDL aware? I definitely want to use CORBA as a building
  block. But I haven't worked with it yet.
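
For the third point, one way the abstracted interface might look in C
is a struct of function pointers, so that a sampled array and a
mathematical function answer the same two calls. Only the names
data_get_value_at() and data_get_range() come from the list above;
DataSource, ArrayImpl, the signatures and both backends are
hypothetical:

```c
#include <assert.h>
#include <stddef.h>

typedef struct _DataSource DataSource;   /* hypothetical name */
struct _DataSource {
    double (*value_at)(const DataSource *self, double x);
    void   (*range)(const DataSource *self, double *lo, double *hi);
    void   *impl;                        /* backend-specific state */
};

/* The abstract interface: callers never see the backend. */
double data_get_value_at(const DataSource *s, double x)
{
    return s->value_at(s, x);
}

void data_get_range(const DataSource *s, double *lo, double *hi)
{
    s->range(s, lo, hi);
}

/* Backend 1: a sampled array, x is truncated to an index. */
typedef struct { const double *v; size_t n; } ArrayImpl;

double array_value_at(const DataSource *self, double x)
{
    const ArrayImpl *a = self->impl;
    assert(x >= 0.0 && (size_t) x < a->n);
    return a->v[(size_t) x];
}

void array_range(const DataSource *self, double *lo, double *hi)
{
    const ArrayImpl *a = self->impl;
    assert(a->n > 0);
    *lo = 0.0;
    *hi = (double) (a->n - 1);
}

/* Backend 2: the function f(x) = x * x, declared on [-1, 1]. */
double square_value_at(const DataSource *self, double x)
{
    (void) self;    /* no state needed */
    return x * x;
}

void square_range(const DataSource *self, double *lo, double *hi)
{
    (void) self;
    *lo = -1.0;
    *hi = 1.0;
}
```

The price is an indirect call per value, which gives up the fast direct
cast of the plain BasicData approach.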

What do you experts think?

Dirk


