Re: data analysis bug
- From: "Andreas J. Guelzow" <aguelzow math concordia ab ca>
- To: gnumeric-list <gnumeric-list gnome org>
- Subject: Re: data analysis bug
- Date: Thu, 12 Jul 2001 23:39:15 -0600
Andreas J. Guelzow wrote:
I guess I should really file an official bug report but here is the
short form:
The pooled variance calculated for the "T-test: Two Sample Assuming
Equal Variances" is not always correct:
sample 1: 1,2,3
sample 2: 1, 2.2, 3.4
The individual variances are 1 and 1.44. The pooled variance is a weighted
average of those two, i.e. it must lie between 1 and 1.44 (in fact it
is 1.22), but gnumeric calculates 0.988.
This will normally also affect the t statistic etc. I didn't check
whether those values are incorrect, but since they should use the pooled
variance...
This is today's CVS version of gnumeric. (But I observed the same
problem in an earlier version while using gnumeric in an introductory
statistics class this May.)
I didn't think I could find this in the code that fast (but the code is
very nicely written). In analysis-tools.c, lines 1436ff
read:
var = (set_one.sqrsum + set_two.sqrsum - (set_one.sum + set_two.sum) *
(set_one.sum + set_two.sum)/ (set_one.n + set_two.n)) /
(set_one.n + set_two.n - 1); /* TODO: Correct??? */
This calculation is incorrect; it should be:
var = ((set_one.sqrsum - set_one.sum2 / set_one.n)+(set_two.sqrsum -
set_two.sum2 / set_two.n))/
(set_one.n + set_two.n - 2);
Interestingly enough, the following calculation of the t value doesn't
even use this pooled variance; instead it computes the t value under the
assumption of unequal variances:
t = fabs (mean1 - mean2 - mean_diff) /
sqrt (var1 / set_one.n + var2 / set_two.n);
It should really be:
t = fabs (mean1 - mean2 - mean_diff) /
sqrt (var / set_one.n + var / set_two.n);
Andreas
--
Prof. Dr. Andreas J. Guelzow
Assoc. Prof of Mathematics
Concordia University College of Alberta
http://www.math.concordia.ab.ca/aguelzow