Re: Making code auto vectorizable



On Wed, 12 Apr 2006, Stefan Westerfeld wrote:

  Hi!

I've built a new beast tree, which thanks to Tim's work now supports
building SSE optimized versions of plugins. I used the compiler flag
-ftree-vectorizer-verbose=5 to see what actually gets vectorized. The
result is somewhat disappointing: not a single plugin benefits from the
auto vectorizer. The tree vectorizer messages are:

bseadder.c:209: note: not vectorized: number of iterations cannot be computed.
bseadder.c:216: note: not vectorized: number of iterations cannot be computed.
bseadder.c:229: note: not vectorized: number of iterations cannot be computed.
bseadder.c:238: note: not vectorized: number of iterations cannot be computed.
bseadder.c:238: note: vectorized 0 loops in function.

The problem is that the standard handling of loop boundaries (and
accessing audio data buffers) that many beast plugins use is this:

 gfloat *bound = output + n_values;
 while (output < bound)
   *output++ = *input++;

which is not (currently) recognized by the tree vectorizer. Rewriting
the loop without this construct, like this:

 int i;
 for (i = 0; i < n_values; i++)
   output[i] = input[i];

leads to a vectorizable loop. Note that this only works if i is signed,
so using a guint for iterating does not enable vectorization (it took me
quite some trial and error to figure that out).
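
To reproduce the comparison outside of beast, here is a minimal sketch of
the two loop forms as standalone functions. The function names are
hypothetical, and plain float/int are used in place of gfloat/gint to keep
the file free of GLib headers. Compiling it with something like
"gcc -O2 -msse -ftree-vectorize -ftree-vectorizer-verbose=5 -c" should show
one vectorizer note per loop; the exact flags the beast build uses are not
shown in this mail, and newer gcc releases report through -fopt-info-vec
instead of -ftree-vectorizer-verbose.

```c
/* Hypothetical helper: pointer/bound form, as used by many plugins.
 * This is the form reported above as "number of iterations cannot be
 * computed". */
void
copy_pointer_bound (float *output, const float *input, int n_values)
{
  float *bound = output + n_values;
  while (output < bound)
    *output++ = *input++;
}

/* Hypothetical helper: indexed form with a signed counter.
 * This is the form reported above as vectorizable. */
void
copy_indexed (float *output, const float *input, int n_values)
{
  int i;
  for (i = 0; i < n_values; i++)
    output[i] = input[i];
}
```

Both functions produce the same output buffer, so switching a plugin from
one form to the other should not change its results, only what the
vectorizer is able to do with the loop.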

what compiler version is this?
does the guint/gint problem persist in the gcc-4.2 snapshot?

To get the most out of the auto vectorizer, I suggest rewriting vectorizable
inner loops in the way I indicated, as I think it is (generally) not
slower for the non-SIMD case. In fact, which is faster (incrementing all
pointers or using one index variable) will probably depend on quite a
few factors, like the processor type, pipelining, register allocation,
number of channels, the algorithm within the loop and so on.

Below is such a patch for BseAdder.

ok thanks. please feel free to cook up more patches ;)

  Cu... Stefan

cvs server: Diffing .
Index: ChangeLog
===================================================================
RCS file: /cvs/gnome/beast/plugins/ChangeLog,v
retrieving revision 1.163
diff -u -p -r1.163 ChangeLog
--- ChangeLog	12 Apr 2006 01:05:32 -0000	1.163
+++ ChangeLog	12 Apr 2006 13:40:08 -0000
@@ -1,3 +1,8 @@
+Wed Apr 12 15:38:20 2006  Stefan Westerfeld  <stefan space twc de>
+
+	* bseadder.c: Rewrote inner loops in a way that can be auto vectorized
+	by the gcc-4.1 auto vectorizer.
+
Wed Apr 12 02:35:47 2006  Tim Janik  <timj gtk org>

	* Makefile.am: added a rule "refresh-Makefile.plugins:" to rebuild the
Index: bseadder.c
===================================================================
RCS file: /cvs/gnome/beast/plugins/bseadder.c,v
retrieving revision 1.30
diff -u -p -r1.30 bseadder.c
--- bseadder.c	23 Jul 2004 18:12:41 -0000	1.30
+++ bseadder.c	12 Apr 2006 13:40:08 -0000
@@ -190,10 +190,10 @@ adder_process (BseModule *module,
  Adder *adder = module->user_data;
  guint n_au1 = BSE_MODULE_JSTREAM (module, BSE_ADDER_JCHANNEL_AUDIO1).n_connections;
  guint n_au2 = BSE_MODULE_JSTREAM (module, BSE_ADDER_JCHANNEL_AUDIO2).n_connections;
-  gfloat *out, *audio_out = BSE_MODULE_OBUFFER (module, BSE_ADDER_OCHANNEL_AUDIO_OUT);
-  gfloat *bound = audio_out + n_values;
+  gfloat *audio_out = BSE_MODULE_OBUFFER (module, BSE_ADDER_OCHANNEL_AUDIO_OUT);
  const gfloat *auin;
  guint i;
+  int n;

for pure iteration, i,j,k,u,v,x,y,z are more commonly used as loop
variables than letters like l,m,n,s, which usually denote sizes, lengths
or dimensions.
i.e. please use 'j' instead of 'n' here.



  if (!n_au1 && !n_au2)
    {
@@ -203,17 +203,13 @@ adder_process (BseModule *module,
  if (n_au1)	/* sum up audio1 inputs */
    {
      auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO1, 0);
-      out = audio_out;
-      do
-	*out++ = *auin++;
-      while (out < bound);
+      for (n = 0; n < n_values; n++)
+	audio_out[n] = auin[n];

and while you're at it, please declare const gfloat *auin=... in the innermost
scope possible.


      for (i = 1; i < n_au1; i++)
	{
	  auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO1, i);
-	  out = audio_out;
-	  do
-	    *out++ += *auin++;
-	  while (out < bound);
+	  for (n = 0; n < n_values; n++)
+	    audio_out[n] += auin[n];
	}
    }
  else
@@ -223,19 +219,15 @@ adder_process (BseModule *module,
    for (i = 0; i < n_au2; i++)
      {
	auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO2, i);
-	out = audio_out;
-	do
-	  *out++ += *auin++;
-	while (out < bound);
+	for (n = 0; n < n_values; n++)
+	  audio_out[n] += auin[n];
      }
  else if (n_au2)		/*  subtract audio2 inputs */
    for (i = 0; i < n_au2; i++)
      {
	auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO2, i);
-	out = audio_out;
-	do
-	  *out++ -= *auin++;
-	while (out < bound);
+	for (n = 0; n < n_values; n++)
+	  audio_out[n] -= auin[n];
      }
}


the rest looks good. provided it has been properly tested,
this can go into CVS. do we have a feature test for BseAdder
already?
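
As a rough illustration of the two review requests above (a signed counter
named 'j', and auin declared in the innermost scope), here is a standalone
sketch of the audio1 summing branch. sum_inputs() and its parameters are
hypothetical stand-ins; it does not use BseModule or the BSE_MODULE_*
macros, so it only shows the loop shape, not the real plugin code.

```c
/* Hypothetical stand-in for the audio1 branch of adder_process():
 * copy the first input into the output, then add the remaining ones. */
void
sum_inputs (float *audio_out, const float **inputs,
            unsigned int n_inputs, int n_values)
{
  unsigned int i;
  int j;        /* signed loop counter, named 'j' as requested */

  if (n_inputs == 0)
    return;

  {
    const float *auin = inputs[0];      /* innermost scope */
    for (j = 0; j < n_values; j++)
      audio_out[j] = auin[j];
  }
  for (i = 1; i < n_inputs; i++)
    {
      const float *auin = inputs[i];    /* innermost scope */
      for (j = 0; j < n_values; j++)
        audio_out[j] += auin[j];
    }
}
```

A simple check along these lines would feed two known buffers into the
function and compare the output against the element-wise sum.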

---
ciaoTJ


