# Re: Accuracy of Statistical Functions

*From*: Leonard Mada <discoleo gmx net>
*To*: gnumeric-list gnome org
*Subject*: Re: Accuracy of Statistical Functions
*Date*: Sat, 11 Nov 2006 12:08:38 +0200

Hi Morten,
indeed, calculating the sum is not an easy task on some data sets. :-)
Morten Welinder wrote:

Sorting works great if and if only all the numbers have the same sign.
If not, I do not currently know what is best. In fact, all the simple

`algorithms we have thought up have been immediately met with a counter
``example.
`Notice, though, that in your example, the best order is the more or less

`*reverse* sorted, i.e., start with the absolute biggest. Then the
``largest
``numbers cancel out nicely.
`

`Well, the sorting algorithm works great for many situations (however as
``I pointed in the OOo thread, you must sort the absolute values, i.e.
``|Xi| to work for mixed data sets, too).
`

`This method is more accurate in *most instances* than the usual
``addition, though, as you pointed out, in gnumeric my particular Test
``Case was more accurate using a different method. That was indeed a
``particular case (and by the way, it challenged OOo, and for that
``application, sorting was the best) and any algorithm implemented must
``work in *the general case*, not just for a partucular case.
`

`Because more robust methods tend to be slow, would it be feasible to
``have a complementary set of functions, e.g.:
`- SUM() and SUMA(), a very accurate version of SUM(), and the like?

`- in general such algorithms may be more suitable/ useful for other
``(statistical) functions (not just the sum)
`

`IF that is the case, I could imagine some more accurate (but
``painstakingly slow) algorithms, like:
`- sort order ascending |Xi|, i = 1..N

`- drop values of 0 => |Xi|, i = j..N, where j >= 1, depending on how
``many 0's were present
`- calculate round(log10(|Xi|)): values will be between -k..+m
- iterate through -k..+m
-- Yk = add first all Xi with the most significant digit in position (-k)
-- IF (|Yk| < |first (-(k-1)) Number| ) {

` -- truncate the (-(k-1)) digit from Xi with (-(k-1)) most significant
``digit
`` [though I do NOT know what the accuracy penalty would be for such
``an operation]
`` [however, adding directly Yk to (-(k-1)) numbers may loose one
``digit from Yk]
`` -- Yk-1 = add Yk + SUM of all less significant (k) decimal digits of
``the (-(k-1)) numbers
` -- Yk-1 += all (k-1) decimal digits
-- }

` -- else { Yk-1 = Yk + SUM(-(k-1) Numbers ) } // COULD SORT THESE
``NUMBERS AGAIN,
` where Yk is added to the numbers list
-- repeat

`Of course, there are some bottleneck cases even for this algorithm, BUT
``in general it will work better.
`

[

Date Prev][

Date Next] [

Thread Prev][

Thread Next]
[

Thread Index]
[

Date Index]
[

Author Index]