# Re: Median: Oasis and Fast Sorting Algorithm

• From: "Uri David Akavia" <uridavid akavia gmail com>
• To: "Leonard Mada" <discoleo gmx net>
• Cc: dev sc openoffice org, gnumeric-list gnome org
• Subject: Re: Median: Oasis and Fast Sorting Algorithm
• Date: Sat, 10 Feb 2007 11:48:41 +0200

```Admittedly, my complexity course is behind me, but there seem to be a
few things wrong with your calculations of the efficiency. I'll try to
point out what I didn't like. If I'm wrong, please correct me.

On 2/10/07, Leonard Mada <discoleo gmx net> wrote:

```
```I have imagined a different algorithm, too, although I never tested it.
I will briefly describe the case with 2*n elements (2n+1 should be similar):
```
```First, if you're using 2*n elements, shouldn't your results be nlog(n)
Second, if you're going to use O notation, O already indicates the
worst scenario, and basically O(n/2 * log(n) -1 + log[(n/2 -1)!]) can
be reduced to O(nlog(n)), which is the same O as ordinary sort. It
might have better Omega or Theta, which I think is what you meant.
Third, you seem to have forgotten some actions when calculating - I
realize that you're allowed to ignore some operations, but I'm not
sure about these (again, please correct me if I'm wrong) - there are
two comparisons for every item in the second half of the array,
therefore you need to add 2*n actions.
In step 7, you mention that adding an additional value should take
O(log(n)). I believe that figuring out where to place this value would
take O(log(n)). If in addition you need to actually place this value
in the array, you need to shift n/2 items in the worst case scenario.
This means that you have a decreasing series, going from n/2 to 1 -
the sum of this series is (n/2 + 1)*n/4 or O(n^2).
According to my calculations, this means that figuring the median of
2*n items would take n*log(n) + log(n!) + (n/2 + 1)*n/4, or O(n^2).
This would seem to be slower than your calculation, and has a larger O
notation then ordinary sort. In fact, it would seem to be effectively
slower then 2n*log(2n).

As I said, I might be mistaken in my calculations, since it has been a
while since I actually learned how to do them. If I am, please correct
me.

However, O notation is not a practical representation, since it
assumes that all actions are equal. I'm not familiar enough with what
actions are costlier in C, and you might have to take into account
Gnumeric's internal representation. As one coding book mentioned
"Pillage, then burn" - first measure actions and figure out what to
optimize, then do it. You'd have to benchmark the current C code, and
your proposed C code to figure out the difference.

So that's one point.

While I'm not sure about revising the algorithm, you might be right
about the definition - define it as "MEDIAN will return the value that
WOULD be in the middle IF sorted" and then you won't force anyone to
sort. I still think most people will, since median discovery by
sorting is fast enough for most practical purposes, and it is much
simpler (hence, but free) then more complicated code.

Yours,

Uri David Akavia

```