Re: Making code auto vectorizable



   Hi!

On Tue, Apr 18, 2006 at 05:51:52PM +0200, Tim Janik wrote:
> On Wed, 12 Apr 2006, Stefan Westerfeld wrote:
> >I've built a new beast tree, which thanks to Tims work now supports
> >building SSE optimized versions of plugins. I used the compiler flag
> >-ftree-vectorizer-verbose=5 to see what actually gets vectorized. The
> >result is somewhat disappointing: not a single plugin benefits from the
> >auto vectorizer. The tree vectorizer messages are:
> >
> >bseadder.c:209: note: not vectorized: number of iterations cannot be 
> >computed.
> >bseadder.c:216: note: not vectorized: number of iterations cannot be 
> >computed.
> >bseadder.c:229: note: not vectorized: number of iterations cannot be 
> >computed.
> >bseadder.c:238: note: not vectorized: number of iterations cannot be 
> >computed.
> >bseadder.c:238: note: vectorized 0 loops in function.
> >
> >The problem is that the standard handling of loop boundaries (and
> >accessing audio data buffers) that many beast plugins use is this:
> >
> > gfloat *bound = output + n_values;
> > while (output < bound)
> >   *output++ = *input++;
> >
> >which is not (currently) recognized by the tree vectorizer. Rewriting
> >the loop without this construct, like this:
> >
> > int i;
> > for (i = 0; i < n_values; i++)
> >   output[i] = input[i];
> >
> >leads to a vectorizable loop. Note that this only works if i is signed,
> >so using a guint for iterating does not enable vectorization (it took me
> >quite some trial and error to figure that out).
> 
> what compiler version is this?
> does the guint/gint problem persist in gcc-4.2snapshot?

Yes, it does. And there is another change in gcc-snapshot: it doesn't
vectorize the loop any more, unless __restrict__ is used to declare that
the input and output buffer don't have a data dependency. I've updated
my patch accordingly.

> >BSE_ADDER_OCHANNEL_AUDIO_OUT);
> >  const gfloat *auin;
> >  guint i;
> >+  int n;
> 
> for pure iteration i,j,k,u,v,x,y,z are more often used as iteration
> variables than those often used to denote certain sizes, lengths or
> dimensions like l,m,n,s.
> i.e. please use 'j' instead of 'n' here.

Ok.

> >-      do
> >-	*out++ = *auin++;
> >-      while (out < bound);
> >+      for (n = 0; n < n_values; n++)
> >+	audio_out[n] = auin[n];
> 
> and while you're at it, please declare const gfloat *auin=... in the 
> innermost
> scope possible.

Ok.

> the rest looks good. provided it has been properly tested,
> this can go into CVS. do we have a feature test for BseAdder
> already?

We do have a feature test and it still passes with the vectorized loop.
Should I commit the updated patch with the __restrict__ keyword added?
It might be necessary to look whether the compiler has support for it.

The new patch looks like this:

Index: ChangeLog
===================================================================
RCS file: /cvs/gnome/beast/plugins/ChangeLog,v
retrieving revision 1.168
diff -u -p -u -r1.168 ChangeLog
--- ChangeLog	20 Apr 2006 18:09:27 -0000	1.168
+++ ChangeLog	2 May 2006 20:05:43 -0000
@@ -1,3 +1,8 @@
+Tue May  2 22:04:52 2006  Stefan Westerfeld  <stefan space twc de>
+
+	* bseadder.c: Rewrote inner loops in a way that can be auto vectorized
+	by the gcc-4.1 and gcc-snapshot auto vectorizer.
+
 Thu Apr 20 20:08:52 2006  Tim Janik  <timj gtk org>
 
 	* bseblockutils.cc: fixed inner variable declarations which erroneously
Index: bseadder.c
===================================================================
RCS file: /cvs/gnome/beast/plugins/bseadder.c,v
retrieving revision 1.30
diff -u -p -u -r1.30 bseadder.c
--- bseadder.c	23 Jul 2004 18:12:41 -0000	1.30
+++ bseadder.c	2 May 2006 20:05:43 -0000
@@ -190,10 +190,9 @@ adder_process (BseModule *module,
   Adder *adder = module->user_data;
   guint n_au1 = BSE_MODULE_JSTREAM (module, BSE_ADDER_JCHANNEL_AUDIO1).n_connections;
   guint n_au2 = BSE_MODULE_JSTREAM (module, BSE_ADDER_JCHANNEL_AUDIO2).n_connections;
-  gfloat *out, *audio_out = BSE_MODULE_OBUFFER (module, BSE_ADDER_OCHANNEL_AUDIO_OUT);
-  gfloat *bound = audio_out + n_values;
-  const gfloat *auin;
+  gfloat *__restrict__ audio_out = BSE_MODULE_OBUFFER (module, BSE_ADDER_OCHANNEL_AUDIO_OUT);
   guint i;
+  gint j;
 
   if (!n_au1 && !n_au2)
     {
@@ -202,18 +201,14 @@ adder_process (BseModule *module,
     }
   if (n_au1)	/* sum up audio1 inputs */
     {
-      auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO1, 0);
-      out = audio_out;
-      do
-	*out++ = *auin++;
-      while (out < bound);
+      const gfloat *__restrict__ auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO1, 0);
+      for (j = 0; j < n_values; j++)
+	audio_out[j] = auin[j];
       for (i = 1; i < n_au1; i++)
 	{
 	  auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO1, i);
-	  out = audio_out;
-	  do
-	    *out++ += *auin++;
-	  while (out < bound);
+	  for (j = 0; j < n_values; j++)
+	    audio_out[j] += auin[j];
 	}
     }
   else
@@ -222,20 +217,16 @@ adder_process (BseModule *module,
   if (n_au2 && !adder->subtract)	/* sum up audio2 inputs */
     for (i = 0; i < n_au2; i++)
       {
-	auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO2, i);
-	out = audio_out;
-	do
-	  *out++ += *auin++;
-	while (out < bound);
+	const gfloat *__restrict__ auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO2, i);
+	for (j = 0; j < n_values; j++)
+	  audio_out[j] += auin[j];
       }
   else if (n_au2)		/*  subtract audio2 inputs */
     for (i = 0; i < n_au2; i++)
       {
-	auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO2, i);
-	out = audio_out;
-	do
-	  *out++ -= *auin++;
-	while (out < bound);
+	const gfloat *__restrict__ auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO2, i);
+	for (j = 0; j < n_values; j++)
+	  audio_out[j] -= auin[j];
       }
 }
 

   Cu... Stefan
-- 
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]