Re: Making code auto vectorizable
- From: Tim Janik <timj gtk org>
- To: Stefan Westerfeld <stefan space twc de>
- Cc: beast gnome org
- Subject: Re: Making code auto vectorizable
- Date: Wed, 3 May 2006 12:22:53 +0200 (CEST)
On Wed, 3 May 2006, Stefan Westerfeld wrote:
Hi!
On Tue, May 02, 2006 at 11:04:47PM +0200, Tim Janik wrote:
On Tue, 2 May 2006, Stefan Westerfeld wrote:
On Tue, Apr 18, 2006 at 05:51:52PM +0200, Tim Janik wrote:
the rest looks good. provided it has been properly tested,
this can go into CVS. do we have a feature test for BseAdder
already?
We do have a feature test and it still passes with the vectorized loop.
Should I commit the updated patch with the __restrict__ keyword added?
It might be necessary to check whether the compiler supports it.
no, first, we should define "restrict" to __restrict__ if it is supported
and to nothing otherwise. and second, the bseadder code could be rewritten
in terms of bse_block_copy_float() and bse_block_add_floats(), right?
then, we should use that instead.
Since BseAdder supports subtracting, we would need to extend the Block
API. I can provide a new patch which does this, and reimplements the
adder on top of it.
sure, that'd be great.
However, the point of having an auto-vectorizer in the first place is
that you don't have to rewrite all your code; you simply use a compiler
option and everything else happens automatically. We kind of give this
up if we keep extending the block API for every problem we run into
and eliminate more and more of the modules' inner loops, instead of
letting the auto-vectorizer do the work.
Of course, it's a question of where to draw the line. Subtracting blocks
could be argued to be reasonably common, so it's not too bad to have
a generic version available.
exactly, you're right that we'll have to draw an arbitrary line somewhere.
not relying on the auto-vectorizer but using hand-crafted vectorized
functions does have certain advantages, though:
- the optimization is less compiler (version) dependent;
- the code is possibly faster, because the programmer can adapt loops and
the associated data structures for vectorized operations, which is more than
the compiler can do;
- in some cases, hand-crafted optimizations may be possible that are
not available to the auto-vectorizer, such as small asm loops or
other pointer/block-address poking that relies on detailed system knowledge.
as you discovered in your auto-vectorization tests, changes to the existing
code are required anyway, so i suggest we do the following:
- factor out simple inner loops with high optimization potential, such as
the adder subtract loop, when this is simple enough to do;
- add "restrict" qualifiers and use "int" loop variables in other cases
where this helps the auto-vectorizer;
- factor out any block operation that can be optimized but resides in the
BSE core. that's because the core can't be compiled with SSE or similar
optimizations the way the plugins can.
Cu... Stefan
---
ciaoTJ