notes on the recent Goose changes



I've checked in all of the code that switches from size_t-indexing to
int-indexing of DataSet objects, and things are approaching a fairly
debugged state.  Assuming no big problems emerge, I'll make a release
in a few days.  (BTW, these changes will hopefully fix the build
problems that people on some 64-bit platforms have been having.)

Here is how int-indexing works, with the focus on the RealSet object.
(The generic parts of this also apply to arbitrary DataSet
derivatives).

RealSet objects are basically just fancy arrays.  They have a size(),
and an accessor function called data.  In the past, a RealSet of size
N has been indexed:
  data(0), data(1), ... , data(N-1).

Now, RealSets can have arbitrary integer indices.  In addition to a
size() (which is, and has always been, some sort of unsigned type),
there are now min_index() and max_index() functions that return
integers.  It is always true that max_index() - min_index() + 1 ==
size().

The correct way to iterate over a RealSet's data elements is now:

  for(int i=our_realset.min_index(); i<=our_realset.max_index(); ++i)  // <=, not < !!!
    foo(our_realset.data(i));
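
For anyone who wants to see the index arithmetic spelled out, here is a
rough sketch using a toy stand-in class.  ToyRealSet, its vector-plus-offset
storage and sum_all() are hypothetical and are not the actual Goose code;
only the names size(), min_index(), max_index() and data(i) come from the
description above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for RealSet: a vector plus an integer offset.
class ToyRealSet {
public:
  explicit ToyRealSet(std::vector<double> values, int offset = 0)
    : values_(std::move(values)), offset_(offset) {}

  std::size_t size() const { return values_.size(); }
  int min_index() const { return offset_; }
  int max_index() const {
    return offset_ + static_cast<int>(values_.size()) - 1;
  }

  // Translate the caller's index into a 0-based position in storage.
  double data(int i) const {
    return values_[static_cast<std::size_t>(i - offset_)];
  }

private:
  std::vector<double> values_;
  int offset_;
};

// Sum every element over the inclusive range [min_index(), max_index()].
double sum_all(const ToyRealSet& s) {
  assert(s.max_index() - s.min_index() + 1 == static_cast<int>(s.size()));
  double total = 0.0;
  for (int i = s.min_index(); i <= s.max_index(); ++i)
    total += s.data(i);
  return total;
}
```

Writing i < max_index() here would silently skip the last element, which
is why the loop bound is inclusive.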

If you are interested in maximum performance, you can still access
the RealSet contents as a block of memory:

  const double* d = our_realset.data();
  for(size_t i=0; i<our_realset.size(); ++i)
    foo(d[i]);

The RealSet also allows access to a sorted version of the contents via
the sorted_data accessor.
NOTE: Unlike data(), sorted_data() is still indexed 0..N-1.
sorted_data(0) is the smallest element, sorted_data(1) is the next
smallest, etc. 
(Note: IMO this seems like the "right" thing to do, but it is also
potentially very confusing.  Maybe sorted_data() and data() should
share index bounds... what do you all think?)
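
To illustrate the difference between the two accessors, here is a small
sketch with a toy class.  ToyRealSet is hypothetical, and both the eagerly
sorted copy and the std::size_t argument of sorted_data() are my
assumptions; the real RealSet may do this differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in: data() honors the offset, sorted_data() does not.
class ToyRealSet {
public:
  ToyRealSet(std::vector<double> values, int offset)
    : values_(std::move(values)), sorted_(values_), offset_(offset) {
    std::sort(sorted_.begin(), sorted_.end());
  }

  // Indexed min_index()..max_index(), i.e. shifted by the offset.
  double data(int i) const {
    return values_[static_cast<std::size_t>(i - offset_)];
  }

  // Always indexed 0..N-1: element 0 is the smallest value.
  double sorted_data(std::size_t i) const { return sorted_[i]; }

private:
  std::vector<double> values_;  // must be declared before sorted_
  std::vector<double> sorted_;
  int offset_;
};

// With an offset of 10, data() starts at 10 but sorted_data() starts at 0.
void demo_sorted_indexing() {
  ToyRealSet s({3.0, 1.0, 2.0}, 10);
  assert(s.data(10) == 3.0);
  assert(s.sorted_data(0) == 1.0);
}
```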



The indexing of the RealSet can be changed as follows:

shift_offset(k) increases the min_index() and max_index() by k.
set_offset(k) sets the min_index() equal to k.

The *_offset() functions only change how the data is indexed --- not
the data itself.
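
As a sketch of what these two calls do to the index bounds, here is a toy
class again.  ToyRealSet and its internals are hypothetical; only the names
shift_offset() and set_offset() and their described effects come from the
text above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in: the offset is just an int stored next to the data.
class ToyRealSet {
public:
  explicit ToyRealSet(std::vector<double> values)
    : values_(std::move(values)), offset_(0) {}

  int min_index() const { return offset_; }
  int max_index() const {
    return offset_ + static_cast<int>(values_.size()) - 1;
  }
  double data(int i) const {
    return values_[static_cast<std::size_t>(i - offset_)];
  }

  // Move both bounds by k; the stored values are untouched.
  void shift_offset(int k) { offset_ += k; }
  // Make min_index() equal to k; the stored values are untouched.
  void set_offset(int k) { offset_ = k; }

private:
  std::vector<double> values_;
  int offset_;
};

// The first element stays the first element under any offset.
void demo_offsets() {
  ToyRealSet s({7.0, 8.0, 9.0});
  s.shift_offset(5);   // indices are now 5..7
  assert(s.data(s.min_index()) == 7.0);
  s.set_offset(-1);    // indices are now -1..1
  assert(s.data(s.min_index()) == 7.0);
}
```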

There is also a static DataSet::set_default_offset(int) function that
will globally change the default offset of all newly-created DataSet
derivatives.  (It doesn't impact existing objects.)  So all of you
frustrated Fortran programmers can index all of your RealSets from
1..N by default if you want to. :-)
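
Here is a sketch of how a global default like this can behave.
ToyDataSet is a hypothetical stand-in, and keeping the default in a static
member that is read once in the constructor is my assumption about one way
to get the "new objects only" behavior, not a description of the Goose
internals.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a DataSet derivative of a fixed size.
class ToyDataSet {
public:
  // The default offset is copied at construction time, so later changes
  // to the default do not affect objects that already exist.
  explicit ToyDataSet(std::size_t n) : size_(n), offset_(default_offset_) {}

  static void set_default_offset(int k) { default_offset_ = k; }

  int min_index() const { return offset_; }
  int max_index() const { return offset_ + static_cast<int>(size_) - 1; }

private:
  static int default_offset_;
  std::size_t size_;
  int offset_;
};

int ToyDataSet::default_offset_ = 0;

// Fortran-style 1..N indexing for everything created after the call.
void demo_default_offset() {
  ToyDataSet::set_default_offset(0);  // start from the 0-based default
  ToyDataSet before(4);
  ToyDataSet::set_default_offset(1);
  ToyDataSet after(4);
  assert(before.min_index() == 0 && before.max_index() == 3);
  assert(after.min_index() == 1 && after.max_index() == 4);
}
```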


To help find bugs related to the switch-over, I've added a lot of
bounds-checking (wrapped in #ifdef/#endif) to the RealSet class:
pretty much any out-of-bounds access should cause an exception to be
thrown.  While the RealSet itself is pretty well debugged, other Goose
code (in particular, statistical code) might still be hard-wired to
iterate across data elements 0..N-1.  Hopefully the added bounds
checking will cause these errors to emerge quickly.  When debugging
code, try putting

  DataSet::set_default_offset(-12345);

up at the top of main().  If that doesn't cause any out-of-bounds
faults, your code is probably in good shape.
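
To show why an odd offset flushes out hard-wired 0..N-1 loops, here is a
sketch with a toy class.  ToyRealSet, the TOY_BOUNDS_CHECK macro name and
the choice of std::out_of_range are all hypothetical; the real macro name
and exception type in Goose may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical macro name; the real checks are wrapped in some #ifdef.
#define TOY_BOUNDS_CHECK

class ToyRealSet {
public:
  ToyRealSet(std::vector<double> values, int offset)
    : values_(std::move(values)), offset_(offset) {}

  std::size_t size() const { return values_.size(); }
  int min_index() const { return offset_; }
  int max_index() const {
    return offset_ + static_cast<int>(values_.size()) - 1;
  }

  double data(int i) const {
#ifdef TOY_BOUNDS_CHECK
    if (i < min_index() || i > max_index())
      throw std::out_of_range("ToyRealSet::data: index out of bounds");
#endif
    return values_[static_cast<std::size_t>(i - offset_)];
  }

private:
  std::vector<double> values_;
  int offset_;
};

// Old-style loop hard-wired to 0..N-1; returns true if the check fires.
bool legacy_loop_faults(const ToyRealSet& s) {
  try {
    double total = 0.0;
    for (std::size_t i = 0; i < s.size(); ++i)
      total += s.data(static_cast<int>(i));
    (void)total;
    return false;
  } catch (const std::out_of_range&) {
    return true;
  }
}

// The same loop passes with offset 0 and faults with offset -12345.
void demo_bounds_check() {
  assert(!legacy_loop_faults(ToyRealSet({1.0, 2.0}, 0)));
  assert(legacy_loop_faults(ToyRealSet({1.0, 2.0}, -12345)));
}
```

With an offset of 0 the old loop happens to work, which is exactly why
such bugs stay hidden until the offset is something unusual.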

-JT


