Making code auto vectorizable
- From: Stefan Westerfeld <stefan space twc de>
- To: beast gnome org
- Subject: Making code auto vectorizable
- Date: Wed, 12 Apr 2006 17:25:33 +0200
Hi!
I've built a new beast tree, which thanks to Tim's work now supports
building SSE optimized versions of plugins. I used the compiler flag
-ftree-vectorizer-verbose=5 to see what actually gets vectorized. The
result is somewhat disappointing: not a single plugin benefits from the
auto vectorizer. The tree vectorizer messages are:
bseadder.c:209: note: not vectorized: number of iterations cannot be computed.
bseadder.c:216: note: not vectorized: number of iterations cannot be computed.
bseadder.c:229: note: not vectorized: number of iterations cannot be computed.
bseadder.c:238: note: not vectorized: number of iterations cannot be computed.
bseadder.c:238: note: vectorized 0 loops in function.
The problem is that the standard handling of loop boundaries (and
accessing audio data buffers) that many beast plugins use is this:
  gfloat *bound = output + n_values;
  while (output < bound)
    *output++ = *input++;
which is not (currently) recognized by the tree vectorizer. Rewriting
the loop without this construct, like this:
  int i;
  for (i = 0; i < n_values; i++)
    output[i] = input[i];
leads to a vectorizable loop. Note that this only works if i is signed,
so using a guint for iterating does not enable vectorization (it took me
quite some trial and error to figure that out).
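
For anyone who wants to reproduce this outside the beast tree, here is a
minimal sketch of the two loop forms as standalone functions. The function
names (copy_pointer_bound, copy_indexed) are made up for illustration and
are not part of beast, and plain float stands in for gfloat so the file
needs no glib headers. Compiling it with the vectorizer enabled and
-ftree-vectorizer-verbose=5 should show which loops get vectorized; newer
gcc releases report this through -fopt-info-vec instead, and may well
handle both forms, so the result depends on the compiler version.

```c
#include <assert.h>

/* Pointer-bound form, as used by many plugins: the loop ends when the
 * output pointer reaches a precomputed end pointer. This is the form
 * gcc-4.1 reported as "number of iterations cannot be computed". */
void copy_pointer_bound(float *output, const float *input, int n_values)
{
    float *bound = output + n_values;
    while (output < bound)
        *output++ = *input++;
}

/* Indexed form with a signed counter: the trip count is visible to the
 * compiler as n_values, which is what the gcc-4.1 vectorizer needed. */
void copy_indexed(float *output, const float *input, int n_values)
{
    int i; /* signed on purpose; an unsigned counter was not vectorized */

    assert(n_values >= 0);
    for (i = 0; i < n_values; i++)
        output[i] = input[i];
}
```

Both functions copy the same n_values samples, so switching a plugin from
one form to the other should not change its output.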
To get the most out of the auto vectorizer, I suggest rewriting
vectorizable inner loops in the way I indicated, as I think it is
(generally) not slower for the non-SIMD case. In fact, which variant is
faster (incrementing all pointers or using one index variable) will
probably depend on quite a few factors, like the processor type,
pipelining, register allocation, number of channels, the algorithm
within the loop and so on.
Below is such a patch for BseAdder.
Cu... Stefan
cvs server: Diffing .
Index: ChangeLog
===================================================================
RCS file: /cvs/gnome/beast/plugins/ChangeLog,v
retrieving revision 1.163
diff -u -p -r1.163 ChangeLog
--- ChangeLog 12 Apr 2006 01:05:32 -0000 1.163
+++ ChangeLog 12 Apr 2006 13:40:08 -0000
@@ -1,3 +1,8 @@
+Wed Apr 12 15:38:20 2006 Stefan Westerfeld <stefan space twc de>
+
+ * bseadder.c: Rewrote inner loops in a way that can be auto vectorized
+ by the gcc-4.1 auto vectorizer.
+
Wed Apr 12 02:35:47 2006 Tim Janik <timj gtk org>
* Makefile.am: added a rule "refresh-Makefile.plugins:" to rebuild the
Index: bseadder.c
===================================================================
RCS file: /cvs/gnome/beast/plugins/bseadder.c,v
retrieving revision 1.30
diff -u -p -r1.30 bseadder.c
--- bseadder.c 23 Jul 2004 18:12:41 -0000 1.30
+++ bseadder.c 12 Apr 2006 13:40:08 -0000
@@ -190,10 +190,10 @@ adder_process (BseModule *module,
Adder *adder = module->user_data;
guint n_au1 = BSE_MODULE_JSTREAM (module, BSE_ADDER_JCHANNEL_AUDIO1).n_connections;
guint n_au2 = BSE_MODULE_JSTREAM (module, BSE_ADDER_JCHANNEL_AUDIO2).n_connections;
- gfloat *out, *audio_out = BSE_MODULE_OBUFFER (module, BSE_ADDER_OCHANNEL_AUDIO_OUT);
- gfloat *bound = audio_out + n_values;
+ gfloat *audio_out = BSE_MODULE_OBUFFER (module, BSE_ADDER_OCHANNEL_AUDIO_OUT);
const gfloat *auin;
guint i;
+ int n;
if (!n_au1 && !n_au2)
{
@@ -203,17 +203,13 @@ adder_process (BseModule *module,
if (n_au1) /* sum up audio1 inputs */
{
auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO1, 0);
- out = audio_out;
- do
- *out++ = *auin++;
- while (out < bound);
+ for (n = 0; n < n_values; n++)
+ audio_out[n] = auin[n];
for (i = 1; i < n_au1; i++)
{
auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO1, i);
- out = audio_out;
- do
- *out++ += *auin++;
- while (out < bound);
+ for (n = 0; n < n_values; n++)
+ audio_out[n] += auin[n];
}
}
else
@@ -223,19 +219,15 @@ adder_process (BseModule *module,
for (i = 0; i < n_au2; i++)
{
auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO2, i);
- out = audio_out;
- do
- *out++ += *auin++;
- while (out < bound);
+ for (n = 0; n < n_values; n++)
+ audio_out[n] += auin[n];
}
else if (n_au2) /* subtract audio2 inputs */
for (i = 0; i < n_au2; i++)
{
auin = BSE_MODULE_JBUFFER (module, BSE_ADDER_JCHANNEL_AUDIO2, i);
- out = audio_out;
- do
- *out++ -= *auin++;
- while (out < bound);
+ for (n = 0; n < n_values; n++)
+ audio_out[n] -= auin[n];
}
}
cvs server: Diffing evaluator
cvs server: Diffing freeverb
cvs server: Diffing icons
--
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan