Re: [guppi-list] Re: Patches for Goose



I mistakenly didn't send this to the whole list...



On Wed, Mar 31, 1999 at 04:02:27PM -0500, Bradford Hovinen wrote:
> On Wed, 31 Mar 1999, Jon Trowbridge wrote:
> > Yeah, I still haven't decided how to best represent categorical data.
> > What is in there now is a (fairly broken) early rough draft.  The
> > question of how to do this right needs to be addressed eventually.
> 
> Based on what I've studied in the realm of chi square tests, the test
> itself should have support for general categories, N-way tables, and
> discrete data (or continuous data divided into intervals) that can be
> modelled with some kind of function, either continuous or discrete. Other
> uses of categorical data are mostly descriptive, dealing with segmented
> bar graphs and so forth, so that's pretty easy to model with a simple
> associative structure like the C++ map object.

Not really.  There is a whole universe of methods for categorical
data, although this is probably not widely known, since most first
courses in statistics discuss it only tangentially.  The chi-square
test is of course an important categorical method, but it is only one
of many things that we will ultimately need to support, so coupling
CategorySet to chi-square testing in any way would be a Bad Idea(tm).
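
To make the separation concrete, here is a rough sketch of the kind
of decoupling I mean.  Everything in it is made up for illustration
(CategoryCounts and chi_square_statistic are hypothetical names, not
what is in Goose now): the container only stores counts, and the
Pearson chi-square goodness-of-fit statistic is a free function that
happens to consume it.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical container: stores per-category counts and nothing else.
// It has no knowledge of any particular statistical test.
class CategoryCounts {
public:
  void add(const std::string& label, std::size_t n = 1) { counts_[label] += n; }

  // Count for one category; 0 if the category was never seen.
  std::size_t count(const std::string& label) const {
    std::map<std::string, std::size_t>::const_iterator it = counts_.find(label);
    return it == counts_.end() ? 0 : it->second;
  }

  // Total number of observations over all categories.
  std::size_t total() const {
    std::size_t t = 0;
    for (const auto& kv : counts_) t += kv.second;
    return t;
  }

private:
  std::map<std::string, std::size_t> counts_;
};

// Free function (hypothetical): Pearson goodness-of-fit statistic,
// sum over categories of (observed - expected)^2 / expected, where
// expected = total * hypothesized probability.  Categories are taken
// from expected_prob; converting the statistic to a p-value is left out.
double chi_square_statistic(const CategoryCounts& observed,
                            const std::map<std::string, double>& expected_prob)
{
  const double n = static_cast<double>(observed.total());
  double stat = 0.0;
  for (const auto& kv : expected_prob) {
    const double expected = n * kv.second;
    if (expected <= 0.0)
      throw std::invalid_argument("expected count must be positive");
    const double diff = static_cast<double>(observed.count(kv.first)) - expected;
    stat += diff * diff / expected;
  }
  return stat;
}
```

Any other categorical method could then be written against the same
container without touching the chi-square code.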



> > Good.  I've been focusing on confidence interval methods so, to avoid
> > code duplication, we might want to define the hypothesis tests in
> > terms of the (more general) confidence intervals.
> 
> Ok, provided that the confidence level is properly adjusted for 1-tailed
> tests.

We'll just have to make sure to put in one-sided confidence intervals.
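
As a sketch of how a one-tailed test could be defined in terms of a
one-sided interval, take the simplest case: a mean with known sigma.
The function names are hypothetical, and I am assuming the caller
supplies the one-sided critical value (about 1.645 for a 95% bound)
rather than computing a normal quantile here.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical: one-sided lower confidence bound for a mean when the
// population standard deviation sigma is known.  z_crit is the
// one-sided critical value, supplied by the caller.
double lower_confidence_bound(double sample_mean, double sigma,
                              std::size_t n, double z_crit)
{
  return sample_mean - z_crit * sigma / std::sqrt(static_cast<double>(n));
}

// One-tailed test of H0: mu <= mu0 against H1: mu > mu0, expressed
// through the interval: reject H0 exactly when mu0 falls below the
// lower bound.  No separate test statistic is computed.
bool reject_mean_at_most(double mu0, double sample_mean, double sigma,
                         std::size_t n, double z_crit)
{
  return mu0 < lower_confidence_bound(sample_mean, sigma, n, z_crit);
}
```

The point is that the test reuses the interval code with the same
critical value, so the level is right for the one-tailed case by
construction.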



> > Now this raises an interesting philosophical issue: what do you do if
> > someone runs a test on data that doesn't match the underlying
> > assumptions of the test?  <snip>
> 
> I can understand your logic here. Perhaps a higher-level entity could
> produce a warning when that occurs, but still allow the test to
> proceed.

Since this is a library, I don't want warnings spewing to stderr right
and left.  A better approach might be a separate function that checks
whether the data appear to satisfy the test's assumptions.

So you could have:

double perform_test_foo(...); // returns p-value

bool check_assumptions_for_test_foo(...); // true if data looks O.K.

(minus the heinous function names, of course).  Then everyone has the
option of being careful if they want to.
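
For a concrete (and entirely hypothetical) instance of that pairing,
here is a sketch using a two-sided z-test with known sigma.  The
names, the n >= 30 rule of thumb, and the choice of test are all my
assumptions for the sake of the example, not a proposal for the
actual interface.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical: two-sided z-test of H0: mean == mu0 with known sigma.
// Returns the p-value, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
// It never warns or refuses; with empty data the result is NaN.
double z_test_p_value(const std::vector<double>& data, double mu0, double sigma)
{
  double sum = 0.0;
  for (double x : data) sum += x;
  const double n = static_cast<double>(data.size());
  const double z = (sum / n - mu0) / (sigma / std::sqrt(n));
  return std::erfc(std::fabs(z) / std::sqrt(2.0));
}

// Hypothetical companion check: true if the data looks O.K. for the
// test above.  The criteria (positive sigma, at least min_n points,
// all values finite) are illustrative rules of thumb only.
bool z_test_assumptions_ok(const std::vector<double>& data, double sigma,
                           std::size_t min_n = 30)
{
  if (!(sigma > 0.0) || data.size() < min_n) return false;
  for (double x : data)
    if (!std::isfinite(x)) return false;
  return true;
}
```

A careful caller checks first and decides what to do with a false
result; a careless one just calls the test, and the library stays
quiet either way.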


-JT



