Re: RFC -- improved Fourier transform

From: John Denker <jsd av8n com>
To: Gnumeric Spreadsheet List <gnumeric-list gnome org>
Subject: Re: RFC -- improved Fourier transform
Date: Tue, 02 Apr 2013 14:03:20 -0700

On 04/02/2013 01:16 AM, jcupitt gmail com wrote:

I've been using fftw in my code with success:

http://www.fftw.org/

It's a GPL fft library. It's very quick, very accurate and (relevant
here) supports arbitrary-sized transforms.


Agreed.  It would be hard to do better than fftw.

It is easily available on all platforms I can think of at the moment.
In most cases it's as easy as "apt-get install fftw-dev".

Question:  Are there any supported platforms for which depending on
fftw would be a problem for gnumeric?

This might be an easy way
to remove the power-of-two issue. It also supports real->complex and
complex->real half transforms, which might be interesting. You'd need
to add extra options for that, of course.


Hmmmmm.  Can't all those things already be done using the imfourier(...) 
routine?
 --Mixing real and complex on the input is easy;  Just now I clarified 
  the FUNC_HELP message so that it reads as follows:
      In the one-column case, real and complex numbers may be used in 
      any combination.  Complex numbers, if any, are represented by 
      strings of the form ="1+2i". Real numbers may be represented 
      by plain old numbers, or by strings that look like numbers.

 -- Throwing away the imaginary part of the output is also easy.

 -- Keeping only the magnitude of the output is also easy.

 -- Keeping only the low-frequency half of the output is also easy.

If there is something else that would be "interesting", please clarify.

================================

On 04/01/2013 03:28 PM, Andreas Guelzow wrote in part:

An obvious use case for skipping invalid data is to allow whole columns
to be specified even if the first cell in each column contains a label
and the cells at the end of the column may be empty.


OK, thanks, that's a nice informative answer to my question.  Headers are 
a good thing, and should be encouraged.

I'm not sure what is the Right Thing(tm) to do here.  Let me share some 
partially-baked thoughts:

The fine art of header-skipping requires some skill.  The following 
questions are answerable, but it I reckon different people might 
answer them differently:
 -- What if there are numbers in the header (as there often are)?
 -- What if there are 3 blank lines between the header and the body of data?
 -- What if there are blank lines /after/ the body of data?
 -- What if there is a blank line in the middle of the data?
 -- What if there is a blank cell in an otherwise non-blank row of data?

Let's keep in mind the context:  This is part of the "time series analysis"
plugin.  The semantics of time-series data is rather different from data
that is in principle unordered, such as the argument to sum(...) or even
average(...).  In particular, the FFT is sensitive to the number of rows,
even more sensitive than average(...) is.

Also note that XL already exhibits a variety of different behaviors:
  average(1*C14:C17)           [1]
is different from
  average(C14:C17)             [2]
is different from
  linest(C14:C17)              [3]

AFAICT:
 [1] (the "arithmetic" rule) says anything that can be coerced will be coerced
   into a number, while anything that can't be coerced throws an error;
 [2] says anything that is not strictly a type=1 number gets skipped; and
 [3] says anything that is not strictly a type=1 number throws an error.

Since XL does not seem to have a fourier(...) routine, it is hard to argue
that we should do the "same" thing as XL.  It is also hard to argue that we
should follow the "pattern", since XL does not itself follow any pattern
that I can make out.

As a specific example:  Even though headers have obvious advantages, somebody 
could argue by analogy to linest(...) that headers should not be allowed.
Also, somebody could argue that b5:b9999 is "almost" as user-friendly as
b:b, and provides a reasonable way to skip four rows of headers.

More importantly, if there are 459 lines of real data, followed by endlessly
many empty lines, one could argue in the context of a Fourier transform that 
b1:b600 is significantly different from b1:b1024, and b:b is not a reasonable 
substitute for either.

Again:  Worrying about headers unleashes a big can of worms.  One could 
imagine writing a  decapitate(...)  function that chops off headers -- but 
I'm not sure that's the Right Thing, either.

As it stands now, the imfourier(...) code plays by rule [1], the "arithmetic"
rule.  I'm not saying this is optimal, but it's not completely crazy, either.

I've been doing a lot of useful stuff with imfourier(...) over the last few 
days.  Good fun.

Follow-Ups:
- Re: RFC -- improved Fourier transform
  - From: jcupitt

References:
- RFC -- improved Fourier transform
  - From: John Denker
- Re: RFC -- improved Fourier transform
  - From: jcupitt

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]