# Re: Accuracy of Statistical Functions

• From: Leonard Mada <discoleo gmx net>
• To: gnumeric-list gnome org
• Subject: Re: Accuracy of Statistical Functions
• Date: Sat, 11 Nov 2006 12:08:38 +0200

```Hi Morten,

indeed, calculating the sum is not an easy task on some data sets. :-)

Morten Welinder wrote:
```
```Sorting works great if and if only all the numbers have the same sign.
If not, I do not currently know what is best.  In fact, all the simple
```
algorithms we have thought up have been immediately met with a counter example.
```Notice, though, that in your example, the best order is the more or less
```
*reverse* sorted, i.e., start with the absolute biggest. Then the largest numbers cancel out nicely.
```
```
Well, the sorting algorithm works great for many situations (however as I pointed in the OOo thread, you must sort the absolute values, i.e. |Xi| to work for mixed data sets, too).
```
```
This method is more accurate in *most instances* than the usual addition, though, as you pointed out, in gnumeric my particular Test Case was more accurate using a different method. That was indeed a particular case (and by the way, it challenged OOo, and for that application, sorting was the best) and any algorithm implemented must work in *the general case*, not just for a partucular case.
```
```
Because more robust methods tend to be slow, would it be feasible to have a complementary set of functions, e.g.:
```- SUM() and SUMA(), a very accurate version of SUM(), and the like?
```
- in general such algorithms may be more suitable/ useful for other (statistical) functions (not just the sum)
```
```
IF that is the case, I could imagine some more accurate (but painstakingly slow) algorithms, like:
```- sort order ascending |Xi|, i = 1..N
```
- drop values of 0 => |Xi|, i = j..N, where j >= 1, depending on how many 0's were present
```- calculate round(log10(|Xi|)): values will be between -k..+m
- iterate through -k..+m
-- Yk = add first all Xi with the most significant digit in position (-k)
-- IF (|Yk| < |first (-(k-1)) Number| ) {
```
-- truncate the (-(k-1)) digit from Xi with (-(k-1)) most significant digit [though I do NOT know what the accuracy penalty would be for such an operation] [however, adding directly Yk to (-(k-1)) numbers may loose one digit from Yk] -- Yk-1 = add Yk + SUM of all less significant (k) decimal digits of the (-(k-1)) numbers
```  -- Yk-1 += all (k-1) decimal digits
-- }
```
-- else { Yk-1 = Yk + SUM(-(k-1) Numbers ) } // COULD SORT THESE NUMBERS AGAIN,
```      where Yk is added to the numbers list
-- repeat

```
Of course, there are some bottleneck cases even for this algorithm, BUT in general it will work better.
```
```

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]