Re: Gsl FFT <-> FFTW compatibility



On Tue, Jul 26, 2011 at 03:31:04AM +0200, Tim Janik wrote:
> On Mon, 25 Jul 2011, Stefan Westerfeld wrote:
> 
> >On Mon, May 16, 2011 at 11:35:32PM +0200, timj gnu org wrote:
> 
> >However, there are the real variants, which used to work like that
> >
> >fftar: real data -> fftac -> post processing -> complex output
> >fftsr: complex data -> pre processing -> fftsc -> real output
> >
> >I changed these to FFTW behaviour by adjusting the (pre/post) processing steps,
> >so we have
> >
> >fftar: real data -> old fftac -> post processing* -> complex output
> >fftsr: complex data -> pre processing* -> old fftsc -> real output
> >
> >which is
> >
> >fftar: real data -> fftsc -> post processing* -> complex output
> >fftsr: complex data -> pre processing* -> fftac -> real output
> >
> >It would be possible to rederive the post/pre processing from numerical
> >recipies to do the right thing without still using the old style functions.
> >However that would be quite a bit of work, so I preferred to do it my way
> >which relies on the "old fftac" step and the "old fftsc" step.
> 
> I see, thanks for the explanation. It seems that you already adapted the
> post/pre processing, and given that the real valued fft routines don't
> match the complex valued one, I'd guess that you added additional ops in
> one or both of the post/pre processing routines which likely make things
> slower, right?
>

Well, so then  lets look at the changes in pre-/postprocessing in the inner
loop:

****** Analysis:

diff --git a/bse/gsl-fftconf.sh b/bse/gsl-fftconf.sh
...
@@ -309,12 +349,12 @@ gsl_power2_fftar (const unsigned int n_values,
       H1im = F2im + F1re;
       H1re = F1im - F2re;
       H2re = F2re - F1im;
-      H2im = H1im - FEim;
+      H2im = FEim - H1im;
       H1re += FEre;
       H2re += FEre;
       H1im += FEim;
       rivalues_out[i] = H1re;
-      rivalues_out[i + 1] = H1im;
+      rivalues_out[i + 1] = -H1im;
       rivalues_out[r] = H2re;
       rivalues_out[r + 1] = H2im;
       WMULTIPLY (Wre, Wim, Dre, Dim);
...

The first change should not affect performance, the second change is one
additional negation per complex output.

****** Synthesis:

@@ -360,9 +403,8 @@ gsl_power2_fftsr (const unsigned int n_values,
       ri |= j;
 
       FOre = -FOre;
-      FOim = -FOim;
       FEre *= 0.5;
-      FEim *= 0.5;
+      FEim *= -0.5;
       F2re = FOre * Wim;
       F2im = FOim * Wim;
       F1re = FOre * Wre;

We get rid of one unary negation, the second change should not affect the
performance.

So to sum it up, the performance should be almost identical with the original
code, where analysis could be a tiny bit slower, and synthesis could be a tiny
bit faster.

   Cu... Stefan
-- 
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]