Re: Median: Oasis and Fast Sorting Algorithm
- From: Leonard Mada <discoleo gmx net>
- To: gnumeric-list gnome org
- Subject: Re: Median: Oasis and Fast Sorting Algorithm
- Date: Sat, 10 Feb 2007 19:35:03 +0200
Let's assume the following space A={1,2,3} [I do not know if it is called
space in English, probably NOT.]
Let's take the array 1,1,2,3,3,3
So the median is (2+3)/2 = 2.5? BUT the space A does NOT have the value
2.5. It is either 2 or 3. In this case, it is more appropriate to talk
about 2 medians, 2 and 3! And this list DOES NOT have the middle value
of 2.5 because this value DOES NOT exist.
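A small illustration of the point about the array 1,1,2,3,3,3, using Python's standard statistics module. This is only a sketch of the distinction, not what OpenFormula or any spreadsheet prescribes: median() averages the two middle values, while median_low() and median_high() return values that actually occur in the data.

```python
from statistics import median, median_low, median_high

data = [1, 1, 2, 3, 3, 3]

# Conventional median: mean of the two middle values.
# The result need not be a member of the data.
conventional = median(data)

# Lower and upper medians: always actual members of the data.
lower = median_low(data)
upper = median_high(data)

print(conventional, lower, upper)  # 2.5 2 3
```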
Weighted Medians: actually, most real-world applications of the median,
e.g. in signal processing, use the weighted median. (Just google for
"weighted median".) So again, just because Excel took the easy route, I do
NOT believe that a standard should follow the same way.
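For readers unfamiliar with the term, here is a minimal sketch of one common definition of the weighted median: the smallest value at which the cumulative weight reaches half of the total weight. The function name weighted_median and the choice of the lower variant are assumptions made for illustration; the message above does not say which variant is meant, and this sketch sorts the pairs purely for simplicity.

```python
def weighted_median(values, weights):
    """Lower weighted median: the smallest value whose cumulative
    weight reaches half of the total weight (hypothetical helper)."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equally long")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    running = 0
    # Walk through the values in increasing order, accumulating weight.
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if 2 * running >= total:
            return value


# The array 1,1,2,3,3,3 expressed as values with multiplicities as weights.
print(weighted_median([1, 2, 3], [2, 1, 3]))  # 2
```

With the weights 2, 1, 3 the cumulative weight reaches exactly half of the total at the value 2, so this variant returns the lower of the two candidates 2 and 3.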
Sincerely Yours,
Leonard Mada
Andreas J. Guelzow wrote:
On Sat, 2007-10-02 at 02:41 +0200, Leonard Mada wrote:
John Machin wrote:
...
So who cares? The median value is 1. Is your alternative going to
return some value other than 1 ????
Please define mathematically the middle value! It is NOT trivial as my
definitions showed. Anything else would be ambiguous. This should be a
standard, so make a better definition.
Contrary to your claims, there is nothing ambiguous. Any non-decreasing
list of the same values has the same middle value(s).
Well, I could have used a much shorter definition: the median is the
value that halves the list so that there are two sets of equal size with
numbers in the first set being higher than the median and numbers in the
second set being lower. As noted, this definition avoids the sorting,
too. (One could extend this definition to cover both even and odd numbers
of elements. Or use an even shorter definition: the 50th percentile. BUT
all these definitions are ambiguous, see later.)
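The ambiguity can be made concrete with a short check. The sketch below assumes the usual textbook criterion (at least half of the values are <= m and at least half are >= m); that criterion and the helper name is_median are assumptions for illustration, not the OASIS wording. Note that the check only counts elements and never sorts.

```python
def is_median(m, data):
    """True if at least half of the values are <= m and at least
    half are >= m (hypothetical helper, counting only, no sorting)."""
    n = len(data)
    at_most = sum(1 for x in data if x <= m)
    at_least = sum(1 for x in data if x >= m)
    return 2 * at_most >= n and 2 * at_least >= n


data = [1, 1, 2, 3, 3, 3]
for candidate in (1, 2, 2.5, 3):
    print(candidate, is_median(candidate, data))
```

For this list the candidates 2, 2.5 and 3 all pass, while 1 does not, so the criterion alone does not single out one value when the number of elements is even.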
The one thing in the OASIS definition that I do NOT agree with at all is
that it includes the word "sorting". Sorting is definitely NOT
necessary to calculate the median. You can take any array, even one that
is NOT sorted, and determine the median without first sorting it. This
is much too often stated wrongly in so many textbooks, BUT sorting is
really not necessary.
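One well-known way to find a median without sorting the whole array is selection by partitioning (quickselect). The sketch below is one possible illustration, not the method the poster necessarily had in mind; the names select and median_without_sorting are hypothetical, and for an even number of elements it returns the mean of the two middle values.

```python
import random


def select(items, k):
    """Return the k-th smallest element (0-based) by repeated
    partitioning around a random pivot; the input is never sorted."""
    items = list(items)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while True:
        pivot = random.choice(items)
        lower = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        higher = [x for x in items if x > pivot]
        if k < len(lower):
            items = lower                      # answer is left of the pivot
        elif k < len(lower) + len(equal):
            return pivot                       # answer is the pivot itself
        else:
            k -= len(lower) + len(equal)       # answer is right of the pivot
            items = higher


def median_without_sorting(items):
    """Median via selection; mean of the two middle values if n is even."""
    items = list(items)
    n = len(items)
    if n == 0:
        raise ValueError("median of empty data")
    upper = select(items, n // 2)
    if n % 2 == 1:
        return upper
    lower = select(items, n // 2 - 1)
    return (lower + upper) / 2


print(median_without_sorting([3, 1, 3, 2, 1, 3]))  # 2.5
```

Each pass discards the part of the array that cannot contain the wanted element, so the expected work is linear in the number of elements rather than the n log n of a full sort.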
The OpenFormula standard does not prescribe any method used to find the
value. It only prescribes what the value is.
So, this is NOT a prerequisite that should enter a standard definition.
May I even point out that, for an even number of elements, one may
define/have an upper median and a lower median. Alternatively, in
serious mathematical uses, the median is usually calculated using a
weighted approach. Therefore, the median of 1,2,2,3,4,5 is NOT (2+3)/2 =
2.5, BUT rather (2+2+3)/3 = 2.33. So, it does make sense to have a very
strong and unambiguous definition in a standard.
The *weighted median* may be introduced later into the standard and then
the ambiguity would be complete.
MEDIAN is not intended to implement a weighted median. None of the
current spreadsheet implementations uses that name for a weighted
median.
Gnumeric, for example, also provides a function for a weighted median,
namely SSMEDIAN. That function may at some time also be introduced in
the Standard, but would in no way make the other definitions ambiguous.
Andreas