Re: Making code auto vectorizable
- From: Stefan Westerfeld <stefan space twc de>
- To: Tim Janik <timj gtk org>
- Cc: beast gnome org
- Subject: Re: Making code auto vectorizable
- Date: Tue, 2 May 2006 22:43:05 +0200
Hi!
On Tue, Apr 18, 2006 at 05:51:52PM +0200, Tim Janik wrote:
> On Wed, 12 Apr 2006, Stefan Westerfeld wrote:
> >I've built a new beast tree, which thanks to Tims work now supports
> >building SSE optimized versions of plugins. I used the compiler flag
> >-ftree-vectorizer-verbose=5 to see what actually gets vectorized. The
> >result is somewhat disappointing: not a single plugin benefits from the
> >auto vectorizer. The tree vectorizer messages are:
> >
> >bseadder.c:209: note: not vectorized: number of iterations cannot be
> >computed.
> >bseadder.c:216: note: not vectorized: number of iterations cannot be
> >computed.
> >bseadder.c:229: note: not vectorized: number of iterations cannot be
> >computed.
> >bseadder.c:238: note: not vectorized: number of iterations cannot be
> >computed.
> >bseadder.c:238: note: vectorized 0 loops in function.
> >
> >The problem is that the standard handling of loop boundaries (and
> >accessing audio data buffers) that many beast plugins use is this:
> >
> > gfloat *bound = output + n_values;
> > while (output < bound)
> > *output++ = *input++;
> >
> >which is not (currently) recognized by the tree vectorizer. Rewriting
> >the loop without this construct, like this:
> >
> > int i;
> > for (i = 0; i < n_values; i++)
> > output[i] = input[i];
> >
> >leads to a vectorizable loop. Note that this only works if i is signed,
> >so using a guint for iterating does not enable vectorization (it took me
> >quite some trial and error to figure that out).
>
> what compiler version is this?
> does the guint/gint problem persist in gcc-4.2snapshot?
Yes, it does. And there is another change in gcc-snapshot: it doesn't
vectorize the loop any more, unless __restrict__ is used to declare that
the input and output buffer don't have a data dependency. I've updated
my patch accordingly.
> >BSE_ADDER_OCHANNEL_AUDIO_OUT);
> > const gfloat *auin;
> > guint i;
> >+ int n;
>
> for pure iteration i,j,k,u,v,x,y,z are more often used as iteration
> variables than those often used to denote certain sizes, lengths or
> dimensions like l,m,n,s.
> i.e. please use 'j' instead of 'n' here.
Ok.
> >- do
> >- *out++ = *auin++;
> >- while (out < bound);
> >+ for (n = 0; n < n_values; n++)
> >+ audio_out[n] = auin[n];
>
> and while you're at it, please declare const gfloat *auin=... in the
> innermost
> scope possible.
Ok.
> the rest looks good. provided it has been properly tested,
> this can go into CVS. do we have a feature test for BseAdder
> already?
We do have a feature test and it still passes with the vectorized loop.
Should I commit the updated patch with the __restrict__ keyword added?
It might be necessary to look whether the compiler has support for it.
The new patch looks like this:
Index: ChangeLog
===================================================================
RCS file: /cvs/gnome/beast/plugins/ChangeLog,v
retrieving revision 1.168
diff -u -p -u -r1.168 ChangeLog
--- ChangeLog 20 Apr 2006 18:09:27 -0000 1.168
+++ ChangeLog 2 May 2006 20:05:43 -0000
@@ -1,3 +1,8 @@
+Tue May 2 22:04:52 2006 Stefan Westerfeld <stefan space twc de>
+
+ * bseadder.c: Rewrote inner loops in a way that can be auto vectorized
+ by the gcc-4.1 and gcc-snapshot auto vectorizer.
+
Thu Apr 20 20:08:52 2006 Tim Janik <timj gtk org>
* bseblockutils.cc: fixed inner variable declarations which erroneously
Index: bseadder.c
===================================================================
RCS file: /cvs/gnome/beast/plugins/bseadder.c,v
retrieving revision 1.30
diff -u -p -u -r1.30 bseadder.c
--- bseadder.c 23 Jul 2004 18:12:41 -0000 1.30
+++ bseadder.c 2 May 2006 20:05:43 -0000
@@ -190,10 +190,9 @@ adder_process (BseModule *module,
Adder *adder = module->user_data;
guint n_au1 = BSE_MODULE_JSTREAM (module, BSE_ADDER_JCHANNEL_AUDIO1).n_connections;
guint n_au2 = BSE_MODULE_JSTREAM (module, BSE_ADDER_JCHANNEL_AUDIO2).n_connections;
- gfloat *out, *audio_out = BSE_MODULE_OBUFFER (module, BSE_ADDER_OCHANNEL_AUDIO_OUT);
- gfloat *bound = audio_out + n_values;
- const gfloat *auin;
+ gfloat *__restrict__ audio_out = BSE_MODULE_OBUFFER (module, BSE_ADDER_OCHANNEL_AUDIO_OUT);
guint i;
+ gint j;
if (!n_au1 && !n_au2)
{
@@ -202,18 +201,14 @@ adder_process (BseModule *module,
}
if (n_au1) /* sum up audio1 inputs */
{
- auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO1, 0);
- out = audio_out;
- do
- *out++ = *auin++;
- while (out < bound);
+ const gfloat *__restrict__ auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO1, 0);
+ for (j = 0; j < n_values; j++)
+ audio_out[j] = auin[j];
for (i = 1; i < n_au1; i++)
{
auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO1, i);
- out = audio_out;
- do
- *out++ += *auin++;
- while (out < bound);
+ for (j = 0; j < n_values; j++)
+ audio_out[j] += auin[j];
}
}
else
@@ -222,20 +217,16 @@ adder_process (BseModule *module,
if (n_au2 && !adder->subtract) /* sum up audio2 inputs */
for (i = 0; i < n_au2; i++)
{
- auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO2, i);
- out = audio_out;
- do
- *out++ += *auin++;
- while (out < bound);
+ const gfloat *__restrict__ auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO2, i);
+ for (j = 0; j < n_values; j++)
+ audio_out[j] += auin[j];
}
else if (n_au2) /* subtract audio2 inputs */
for (i = 0; i < n_au2; i++)
{
- auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO2, i);
- out = audio_out;
- do
- *out++ -= *auin++;
- while (out < bound);
+ const gfloat *__restrict__ auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO2, i);
+ for (j = 0; j < n_values; j++)
+ audio_out[j] -= auin[j];
}
}
Cu... Stefan
--
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]