Making code auto vectorizable



   Hi!

I've built a new beast tree, which thanks to Tim's work now supports
building SSE optimized versions of plugins. I used the compiler flag
-ftree-vectorizer-verbose=5 to see what actually gets vectorized. The
result is somewhat disappointing: not a single plugin benefits from the
auto vectorizer. For bseadder.c, the tree vectorizer messages are:

bseadder.c:209: note: not vectorized: number of iterations cannot be computed.
bseadder.c:216: note: not vectorized: number of iterations cannot be computed.
bseadder.c:229: note: not vectorized: number of iterations cannot be computed.
bseadder.c:238: note: not vectorized: number of iterations cannot be computed.
bseadder.c:238: note: vectorized 0 loops in function.

The problem is that the standard handling of loop boundaries (and
accessing audio data buffers) that many beast plugins use is this:

  gfloat *bound = output + n_values;
  while (output < bound)
    *output++ = *input++;

which is not (currently) recognized by the tree vectorizer. Rewriting
the loop without this construct, like this:

  int i;
  for (i = 0; i < n_values; i++)
    output[i] = input[i];

leads to a vectorizable loop. Note that this only works if i is signed,
so using a guint for iterating does not enable vectorization (it took me
quite some trial and error to figure that out).
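
To try this outside of beast, here is a minimal sketch with both loop
forms in one file. The function names are made up for illustration and
plain float stands in for gfloat. Compiling it with something like
"gcc -O2 -ftree-vectorize -msse -ftree-vectorizer-verbose=5 -c" should
print the vectorizer notes for each loop; the exact messages will
depend on the gcc version.

```c
/* Pointer-bound form, as used by many plugins. This is the form that
 * gave "number of iterations cannot be computed" in my build.
 */
void
copy_pointer_bound (float *output, const float *input, int n_values)
{
  float *bound = output + n_values;
  while (output < bound)
    *output++ = *input++;
}

/* Indexed form with a signed counter, the variant that was
 * vectorized in my build.
 */
void
copy_indexed (float *output, const float *input, int n_values)
{
  int i;
  for (i = 0; i < n_values; i++)
    output[i] = input[i];
}
```

Both functions produce the same result, so the vectorizer output is the
only thing that should differ between them.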

To get the most out of the auto vectorizer, I suggest rewriting
vectorizable inner loops in the way I indicated, as I think it is
(generally) not slower for the non-SIMD case. In fact, which is faster
(incrementing all pointers or using one index variable) will probably
depend on quite a few factors, like the processor type, pipelining,
register allocation, number of channels, the algorithm within the loop
and so on.
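
As a rough sketch of what the rewrite looks like for an accumulating
inner loop (this is not the beast code: sum_inputs and its parameters
are hypothetical, and plain float stands in for gfloat), the outer loop
over the inputs can keep its unsigned counter, while the inner loop
over the samples uses a signed index:

```c
/* Sum n_inputs buffers of n_values samples each into output.
 * Hypothetical helper, for illustration only.
 */
void
sum_inputs (float *output, const float *const *inputs,
            unsigned int n_inputs, int n_values)
{
  unsigned int i;
  int n;        /* signed index for the inner loops */

  if (n_inputs == 0)
    {
      for (n = 0; n < n_values; n++)
        output[n] = 0.0f;
      return;
    }

  /* the first input initializes the output buffer */
  for (n = 0; n < n_values; n++)
    output[n] = inputs[0][n];

  /* the remaining inputs are added on top */
  for (i = 1; i < n_inputs; i++)
    {
      const float *in = inputs[i];
      for (n = 0; n < n_values; n++)
        output[n] += in[n];
    }
}
```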

Below is such a patch for BseAdder.

   Cu... Stefan

cvs server: Diffing .
Index: ChangeLog
===================================================================
RCS file: /cvs/gnome/beast/plugins/ChangeLog,v
retrieving revision 1.163
diff -u -p -r1.163 ChangeLog
--- ChangeLog	12 Apr 2006 01:05:32 -0000	1.163
+++ ChangeLog	12 Apr 2006 13:40:08 -0000
@@ -1,3 +1,8 @@
+Wed Apr 12 15:38:20 2006  Stefan Westerfeld  <stefan space twc de>
+
+	* bseadder.c: Rewrote inner loops in a way that can be auto vectorized
+	by the gcc-4.1 auto vectorizer.
+
 Wed Apr 12 02:35:47 2006  Tim Janik  <timj gtk org>
 
 	* Makefile.am: added a rule "refresh-Makefile.plugins:" to rebuild the
Index: bseadder.c
===================================================================
RCS file: /cvs/gnome/beast/plugins/bseadder.c,v
retrieving revision 1.30
diff -u -p -r1.30 bseadder.c
--- bseadder.c	23 Jul 2004 18:12:41 -0000	1.30
+++ bseadder.c	12 Apr 2006 13:40:08 -0000
@@ -190,10 +190,10 @@ adder_process (BseModule *module,
   Adder *adder = module->user_data;
   guint n_au1 = BSE_MODULE_JSTREAM (module, BSE_ADDER_JCHANNEL_AUDIO1).n_connections;
   guint n_au2 = BSE_MODULE_JSTREAM (module, BSE_ADDER_JCHANNEL_AUDIO2).n_connections;
-  gfloat *out, *audio_out = BSE_MODULE_OBUFFER (module, BSE_ADDER_OCHANNEL_AUDIO_OUT);
-  gfloat *bound = audio_out + n_values;
+  gfloat *audio_out = BSE_MODULE_OBUFFER (module, BSE_ADDER_OCHANNEL_AUDIO_OUT);
   const gfloat *auin;
   guint i;
+  int n;
 
   if (!n_au1 && !n_au2)
     {
@@ -203,17 +203,13 @@ adder_process (BseModule *module,
   if (n_au1)	/* sum up audio1 inputs */
     {
       auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO1, 0);
-      out = audio_out;
-      do
-	*out++ = *auin++;
-      while (out < bound);
+      for (n = 0; n < n_values; n++)
+	audio_out[n] = auin[n];
       for (i = 1; i < n_au1; i++)
 	{
 	  auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO1, i);
-	  out = audio_out;
-	  do
-	    *out++ += *auin++;
-	  while (out < bound);
+	  for (n = 0; n < n_values; n++)
+	    audio_out[n] += auin[n];
 	}
     }
   else
@@ -223,19 +219,15 @@ adder_process (BseModule *module,
     for (i = 0; i < n_au2; i++)
       {
 	auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO2, i);
-	out = audio_out;
-	do
-	  *out++ += *auin++;
-	while (out < bound);
+	for (n = 0; n < n_values; n++)
+	  audio_out[n] += auin[n];
       }
   else if (n_au2)		/*  subtract audio2 inputs */
     for (i = 0; i < n_au2; i++)
       {
 	auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO2, i);
-	out = audio_out;
-	do
-	  *out++ -= *auin++;
-	while (out < bound);
+	for (n = 0; n < n_values; n++)
+	  audio_out[n] -= auin[n];
       }
 }
 
cvs server: Diffing evaluator
cvs server: Diffing freeverb
cvs server: Diffing icons



-- 
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan


