Re: Accuracy of Statistical Functions



Hi Morten,

indeed, calculating the sum is not an easy task on some data sets. :-)

Morten Welinder wrote:
Sorting works great if and if only all the numbers have the same sign.
If not, I do not currently know what is best.  In fact, all the simple
algorithms we have thought up have been immediately met with a counter example.
Notice, though, that in your example, the best order is the more or less
*reverse* sorted, i.e., start with the absolute biggest. Then the largest numbers cancel out nicely.

Well, the sorting algorithm works great for many situations (however as I pointed in the OOo thread, you must sort the absolute values, i.e. |Xi| to work for mixed data sets, too).

This method is more accurate in *most instances* than the usual addition, though, as you pointed out, in gnumeric my particular Test Case was more accurate using a different method. That was indeed a particular case (and by the way, it challenged OOo, and for that application, sorting was the best) and any algorithm implemented must work in *the general case*, not just for a partucular case.

Because more robust methods tend to be slow, would it be feasible to have a complementary set of functions, e.g.:
- SUM() and SUMA(), a very accurate version of SUM(), and the like?
- in general such algorithms may be more suitable/ useful for other (statistical) functions (not just the sum)

IF that is the case, I could imagine some more accurate (but painstakingly slow) algorithms, like:
- sort order ascending |Xi|, i = 1..N
- drop values of 0 => |Xi|, i = j..N, where j >= 1, depending on how many 0's were present
- calculate round(log10(|Xi|)): values will be between -k..+m
- iterate through -k..+m
  -- Yk = add first all Xi with the most significant digit in position (-k)
  -- IF (|Yk| < |first (-(k-1)) Number| ) {
-- truncate the (-(k-1)) digit from Xi with (-(k-1)) most significant digit [though I do NOT know what the accuracy penalty would be for such an operation] [however, adding directly Yk to (-(k-1)) numbers may loose one digit from Yk] -- Yk-1 = add Yk + SUM of all less significant (k) decimal digits of the (-(k-1)) numbers
  -- Yk-1 += all (k-1) decimal digits
  -- }
-- else { Yk-1 = Yk + SUM(-(k-1) Numbers ) } // COULD SORT THESE NUMBERS AGAIN,
      where Yk is added to the numbers list
  -- repeat

Of course, there are some bottleneck cases even for this algorithm, BUT in general it will work better.



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]