On Sat, 2011-04-09 at 11:34 +1000, Steven D'Aprano wrote:

I'm trying to find out information about the SSMEDIAN statistical 
function in Gnumeric, and how it differs from MEDIAN.

I have read the description of the function:

"median for grouped data as commonly determined in the social sciences"

and spent much time googling for more information, but for something 
described as "commonly", I haven't been able to find any information 
about it. Neither Wikipedia nor Mathworld has any reference that I can 
find to the social sciences using a different algorithm for finding the 
median of grouped data.

The example given in Gnumeric's function wizard shows ssmedian(7, 8, 8) 
with the default interval of 1 as returning 7.75. How should I interpret 
this? This is my guess:

If the interval is 1, then the data point 7 actually means some value 
between 6.5 and 7.5;
Likewise the data point 8 means 7.5 through 8.5.

So we have grouped data:

Group        Frequency
6.5 - 7.5    1
7.5 - 8.5    2

If I apply the formula for grouped median found here:

median = L + i*(n/2 - CF)/f


L = the lower limit of the class containing the median
i = the width of the class containing the median
n = the total number of frequencies
CF = the cumulative number of frequencies in the classes preceding the 
class containing the median
f = the frequency of the median class

I get:

median = 7.5 + 1*(3/2 - 1)/2 = 7.75

which matches. But of course, this could just be a coincidence. Can 
somebody please:

* confirm that the formula above is that used by SSMEDIAN?

* if not, what does SSMEDIAN actually do?

* point me at a more authoritative source for the formula given?

Since Gnumeric is open source, the easiest way of checking this is to
check the source. 

Looking at the source, an obvious typo is visible, but it only affects
the efficiency of the calculation, and only for an even number of
observations whose two central values are not equal. It does not
affect the result.

SSMEDIAN is supposed to calculate the median as described in "Using
Basic Statistics in the Social Sciences" by Annabel Ness Evans.

This is in fact the formula you gave above:
 median = L + i*(n/2 - CF)/f
with the meaning of the variables as described above.
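
For anyone who wants to check the arithmetic by hand, here is a minimal
Python sketch of that formula. It is not Gnumeric's actual code: the
function name grouped_median is made up for illustration, and it assumes
each distinct value v stands for the class from v - i/2 to v + i/2, as
in your guess above, and that the median class is the first one whose
cumulative frequency reaches n/2.

```python
from collections import Counter


def grouped_median(values, interval=1.0):
    """Grouped-data median: L + i*(n/2 - CF)/f (illustrative sketch only)."""
    if not values:
        raise ValueError("grouped_median requires at least one value")
    counts = Counter(values)   # frequency of each distinct value (class midpoint)
    n = len(values)            # total number of observations
    cf = 0                     # cumulative frequency of the preceding classes
    for v in sorted(counts):
        f = counts[v]          # frequency of the current class
        if cf + f >= n / 2:    # assumed rule: first class that reaches n/2
            lower = v - interval / 2   # L, the lower limit of the median class
            return lower + interval * (n / 2 - cf) / f
        cf += f


print(grouped_median([7, 8, 8]))   # 7.75, matching the function wizard example
```

With the data 7, 8, 8 and interval 1 this gives L = 7.5, CF = 1 and
f = 2, so the result is 7.5 + 1*(1.5 - 1)/2 = 7.75.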

