Re: [guppi-list] Re: Goose Design Problem & a Proposed Solution (long)



> The main problem that I'm having right now is that the current scheme
> doesn't generalize in a clean way to handle categorical data.

I don't understand this.  I guess I have to see the code to 
understand the problem.

> Yes, but we still need run-time checks when you use the Value
> framework.  To pick an example, the RealSet will (more or less) have
> two ways to insert values: insert(double) and insert(string).
> The first gives a compile-time check.  For the second, you might get
> a string containing garbage, and then you throw an exception --- so
> this is a run-time check.
>  
> But RealSet already has two ways to insert: insert(double) and
> insert(Value).  If you pass a String Value into insert, or some other
> invalid type, you throw an exception.  This is also a run-time check.

The (big) problem is not that we have a check at run-time.  The big 
problem is that we do not have a check at compile time.  We lose type
checking.
If I pass a C++ string to the RealSet as it is now, I'll get a
compile error.  If I pass a char * I get a type error as well.
If we change the interface to a string, we lose this type checking.
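
To make the difference concrete, here is a minimal sketch.  It is not
the actual Goose code: TypedRealSet, StringRealSet and
demo_insert_checks are made-up names, and the string parsing via
std::stod is just an assumption for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a RealSet with a typed interface.
class TypedRealSet {
public:
    void insert(double x) { data_.push_back(x); }
    std::size_t size() const { return data_.size(); }
private:
    std::vector<double> data_;
};

// Hypothetical stand-in for a RealSet with a string interface.
class StringRealSet {
public:
    void insert(const std::string& s) {
        std::size_t pos = 0;
        double x = std::stod(s, &pos);  // throws std::invalid_argument on garbage
        if (pos != s.size())
            throw std::invalid_argument("trailing characters in: " + s);
        data_.push_back(x);
    }
    std::size_t size() const { return data_.size(); }
private:
    std::vector<double> data_;
};

inline void demo_insert_checks() {
    TypedRealSet typed;
    typed.insert(3.5);
    // typed.insert(std::string("3.5"));  // rejected by the compiler
    // typed.insert("3.5");               // rejected by the compiler

    StringRealSet untyped;
    untyped.insert("3.5");                // compiles, and parses
    bool threw = false;
    try {
        untyped.insert("garbage");        // compiles, fails only at run time
    } catch (const std::invalid_argument&) {
        threw = true;
    }
    assert(threw);
}
```

With the typed interface the two commented-out calls never reach the
user; with the string interface the same mistake is only found when
the program runs.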

This is what I'm talking about when I say that we should at least
do what we can.

> The reason types were ever introduced originally in Goose was to
> provide a convenient mechanism for people writing data browsers, GUIs,
> etc. so that they could move string reps of data in and out of a
> DataSet without having to worry too much about what the strings
> contained.  By going to strings, we go back to solving this problem
> directly.  With Values, I feel like we are adding complexity to solve
> some other, more general, problem... a problem that I'm not sure we
> really need to solve.

If we move strings into the DataSets, we give them the responsibility
of parsing the strings.  One goal of the Value/DataType system is to
get as much out of the DataSets as possible.  All they have to do now
is to use the Accessor to get the value out as they want it.

I still do not understand what complexity you are trying to get rid of.
Yes, we have an extra layer, but at some point, we have to convert
the strings to values, and back.  The question is where to put this
layer.  We can't get rid of the facilities present in the DataType 
class hierarchy.

As I understand your proposal, you want to get rid of the Values
in the DataSet interface, and replace them with strings.  This
implies that the DataSets dispatch the task of converting
the strings to the DataType hierarchy, which is thus similar
to the use of the Accessor to get the value out. But I
think this would be a mistake because of the loss in type 
checking.

Anyway, you might have some good reasons that appear when you design
the categorical data.  I haven't done that, so I don't know what they
are.

So instead of arguing back and forth on shaky ground, I suggest that 
you just go ahead and do things as you like them.  But please, don't 
remove the Value/DataType hierarchy just yet.

Later, I'll have a look at it, and if I feel that I need some more
compile time type checking, I can add a bridging class that provides
this.

> > I'm not sure you solve the problem of switch-statements:  You
> > still need to handle the case where you pass a string that
> > doesn't parse as a real to a RealSet, and this handling is dependent
> > on the DataSet you are dealing with.
>  
> The DataSet can throw an "invalid string format" exception.  With
> values, you'd have to throw an "invalid value" exception.

I don't understand what this difference implies.

> Let me say a few more words on why Values bug me.  In categorical
> data, we need to map strings to code numbers and back.  So a
> "categorical value" is basically an integer, which is associated to a
> string.  But for every different category, we need some sort of
> "categorical value factory", and we end up with the Value and the
> ValueFactory coupled in some way, because instances of the Value class
> need to be able to do the int <-> string conversions.  And this
> coupling is particularly vexing because I don't feel like we are
> gaining a lot in exchange for this complexity...

If you have a closer look at the Value/DataType system, you'll see
that yes, the Values can convert strings to Values, but this is done
by dispatching the job to the corresponding DataType instance.
If you don't like this, we can get rid of the string constructor in
the Value class. (I considered doing this myself.)

Returning to your task, the DataType instance serves as your ValueFactory
as I understand it.  I.e. the complexity is not increased compared
to Strings, Reals and other types (remember that a Real is not unique
in an internationalized setting - it needs a pointer to a RealType
in order to be able to convert itself to a string.)
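
As a rough illustration of that last point (a sketch only: the
members shown here are my assumptions, and the real classes do more),
a Real that carries a pointer to its RealType can render itself
differently per locale:

```cpp
#include <cstdio>
#include <string>

// Hypothetical RealType that only knows which decimal separator to use.
class RealType {
public:
    explicit RealType(char decimal_point) : decimal_point_(decimal_point) {}

    // Formats with two decimals, then swaps in this type's separator.
    // Assumes the process runs in the default "C" locale, where
    // snprintf writes '.' as the decimal point.
    std::string format(double x) const {
        char buf[64];
        std::snprintf(buf, sizeof buf, "%.2f", x);
        std::string s(buf);
        std::string::size_type dot = s.find('.');
        if (dot != std::string::npos) s[dot] = decimal_point_;
        return s;
    }
private:
    char decimal_point_;
};

// Hypothetical Real: the number plus a (non-owning) pointer to its type.
class Real {
public:
    Real(const RealType* type, double x) : type_(type), x_(x) {}
    std::string as_string() const { return type_->format(x_); }
private:
    const RealType* type_;
    double x_;
};
```

The same double gives "1.50" through one RealType and "1,50" through
another, which is why the value cannot convert itself without its type.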

So the CategoricalDataType contains a map<string, int>, and the
Value has a pointer to a CategoricalDataType, just like a Real
has a pointer to a RealType.

As I see it, we need to hold this map somewhere, and to me it's logical
to put it in the DataType, because then a DataType really defines the
type of a Categorical.
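
Something along these lines is what I have in mind.  This is a sketch
under my own assumptions, not code from the tree: code_for, label_for
and the Categorical class are names invented here, and I added a
vector for the reverse (int to string) lookup.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: the DataType owns the string <-> code mapping.
class CategoricalDataType {
public:
    // Returns the code for a label, registering the label if it is new.
    int code_for(const std::string& label) {
        std::map<std::string, int>::const_iterator it = codes_.find(label);
        if (it != codes_.end()) return it->second;
        int code = static_cast<int>(labels_.size());
        codes_[label] = code;
        labels_.push_back(label);
        return code;
    }

    // Returns the label for a code, or throws if the code is unknown.
    const std::string& label_for(int code) const {
        if (code < 0 || code >= static_cast<int>(labels_.size()))
            throw std::out_of_range("unknown category code");
        return labels_[code];
    }
private:
    std::map<std::string, int> codes_;
    std::vector<std::string> labels_;
};

// Hypothetical value: just an int plus a (non-owning) pointer to its type.
class Categorical {
public:
    Categorical(CategoricalDataType* type, const std::string& label)
        : type_(type), code_(type->code_for(label)) {}
    int code() const { return code_; }
    const std::string& as_string() const { return type_->label_for(code_); }
private:
    CategoricalDataType* type_;
    int code_;
};
```

The value itself stays as small as a Real; all of the int <-> string
knowledge lives in the one DataType instance that the values share.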

> Good.  I'm hoping that the new scheme could (at least potentially)
> make import-type code simpler.  We'll see...

The import code IS relatively simple IMO.  However, note that
the code in CVS is not in sync with my source.  Last year, I worked 
a bit more on the design of the Value/DataType system, but since then, 
it has rusted and does not apply cleanly.

Yes, the Value/DataType design might seem complex at first.  
However, I don't think it is that bad.  I explained all of it to my
co-worker in half an hour, and he seemed to like it.
If you have the book "Design Patterns", you'll recognize some
of the constructs.  Having that book will help understand the
reasoning behind the design, and the seeming complexity. Most
of the complexity is design recommendations from that book,
and the goal is to have as secure a system as possible (from the
viewpoint of the programmer.)

And once more, the code is still evolving.  I didn't want to spend
too much time to document it until the design had settled a bit.

-

Don't get me wrong:  I am perfectly willing to scrap the
system as it is, but I'd like to understand why first.  If I
don't understand why, I won't be able to avoid the problem
the next time ;-)

> > However, in our project,
> > we have bumped into a serious problem:  Making a DLL out of Goose
> > in Windows is difficult, because the Visual C++ compiler is not
> > able to handle the static data that the STL needs...
> > <description of stupid broken compiler behavior deleted>
>  
> Ugh.  Well, we'll figure this all out one way or another...
>
> I thought I heard something about RMS working on a new version of the
> (L)GPL, which addresses issues like templates.  I could be
> misremembering wildly, though...

Ok, that might be the solution if that arrives before we ship our product.
But I hope you understand that we need to have some kind of guarantee
that we will not burn ourselves by betting on this horse.  It's simply
too risky to continue with Goose as the foundation and then find out that
we can't use it after all.

If you and Havoc can send me an e-mail where you allow us to link
Goose statically in Visual C++ applications until it is possible to do
it without tagging the source of Goose, I think that would be all
we need.

Greets,

Asger


