Re: Making code auto vectorizable



On Wed, 3 May 2006, Stefan Westerfeld wrote:

  Hi!

On Tue, May 02, 2006 at 11:04:47PM +0200, Tim Janik wrote:
On Tue, 2 May 2006, Stefan Westerfeld wrote:
On Tue, Apr 18, 2006 at 05:51:52PM +0200, Tim Janik wrote:

the rest looks good. provided it has been properly tested,
this can go into CVS. do we have a feature test for BseAdder
already?

We do have a feature test and it still passes with the vectorized loop.
Should I commit the updated patch with the __restrict__ keyword added?
It might be necessary to check whether the compiler supports it.

no, first, we should define "restrict" to __restrict__ if it is supported
and to nothing otherwise. and second, the bseadder code could be rewritten
in terms of bse_block_copy_float() and bse_block_add_floats(), right?
then, we should use that instead.

Since BseAdder supports subtracting, we would need to extend the Block
API. I can provide a new patch which does this, and reimplements the
adder on top of it.

sure, that'd be great.
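
A minimal sketch of how an adder could sit on top of block helpers once a subtract operation exists. All names and signatures here (example_block_copy_float, example_block_add_floats, example_block_sub_floats, example_adder_process) are hypothetical stand-ins for illustration; they are not the actual BSE Block API, whose argument order and types may differ.

```c
#include <stddef.h>

/* Hypothetical stand-ins for block helpers; names and signatures are
 * assumptions for illustration, not the actual BSE Block API. */
void
example_block_copy_float (size_t n, float *dest, const float *src)
{
  size_t i;
  for (i = 0; i < n; i++)
    dest[i] = src[i];
}

void
example_block_add_floats (size_t n, float *dest, const float *src)
{
  size_t i;
  for (i = 0; i < n; i++)
    dest[i] += src[i];
}

/* The extension under discussion: subtract one block from another. */
void
example_block_sub_floats (size_t n, float *dest, const float *src)
{
  size_t i;
  for (i = 0; i < n; i++)
    dest[i] -= src[i];
}

/* Adder built only from the helpers above:
 * out = a + b, or out = a - b when subtract is non-zero. */
void
example_adder_process (size_t n, float *out, const float *a,
                       const float *b, int subtract)
{
  example_block_copy_float (n, out, a);
  if (subtract)
    example_block_sub_floats (n, out, b);
  else
    example_block_add_floats (n, out, b);
}
```

With this split, the module itself no longer contains an inner loop, so any optimization of the helpers benefits every caller.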

However, the point of having an auto vectorizer in the first place is
that you don't have to rewrite all your code; you simply use a compiler
option and everything else happens automatically. We kind of give this
up if we keep extending the block API for every problem that we
get, and eliminate more and more inner loops from modules, instead of
letting the auto vectorizer do the work.

Of course, it's a question of where to draw the line. Subtracting blocks
could be argued to be reasonably common, so that it's not too bad to have
a generic version available.

exactly, you're right that we'll have to draw an arbitrary line somewhere.

not relying on the auto-vectorizer but using hand crafted vectorized
functions does have certain advantages though:
- the optimization is less compiler (version) dependent;
- the code is possibly faster, because the programmer can adapt loops and
  associated data structures for vectorized operations, that's more than
  the compiler can do;
- in some cases, hand crafted optimizations may be doable that are
  not available to the auto-vectorizer, such as using small asm-loops or
  other pointer/block-address pokage that relies on intrinsic system knowledge.
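
To illustrate what a hand crafted vectorized function might look like, here is a sketch of a block subtract written with SSE intrinsics. The function name is hypothetical, and the choice of unaligned loads and stores is an assumption made to keep the sketch safe for arbitrary pointers; a tuned version would likely exploit known buffer alignment instead.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_sub_ps, _mm_storeu_ps */
#endif

/* Hypothetical helper, not an actual BSE function: dest[i] -= src[i]. */
void
example_block_sub_floats_sse (size_t n, float *dest, const float *src)
{
  size_t i = 0;
#if defined(__SSE__)
  /* Process four floats per iteration; unaligned loads/stores are used
   * so the sketch makes no assumption about buffer alignment. */
  for (; i + 4 <= n; i += 4)
    {
      __m128 d = _mm_loadu_ps (dest + i);
      __m128 s = _mm_loadu_ps (src + i);
      _mm_storeu_ps (dest + i, _mm_sub_ps (d, s));
    }
#endif
  /* Scalar tail, or the whole block when SSE is not available. */
  for (; i < n; i++)
    dest[i] -= src[i];
}
```

The vector width and the tail handling are fixed by the programmer here, so the result does not depend on what a particular compiler version decides to vectorize.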

as you discovered in your auto-vectorization tests, changes to the existing
code are required anyway, so i suggest we do the following:
- factor out simple inner loops with high optimization potential, such as
  the adder subtract loop, when this is simple enough to do;
- add "restrict" and "int" loop variables in other cases where this helps
  the auto-vectorizer;
- factor out any block operation that can be optimized but resides in the
  BSE core. that's because the core can't be compiled with SSE or similar
  optimizations like the plugins can.
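
The "restrict" definition and the "int" loop variable suggestion could look roughly like the following sketch. The macro name EXAMPLE_RESTRICT and the function example_mix_scaled are hypothetical, and the __GNUC__ check is only a stand-in for a proper configure-time feature test. Whether a given compiler version actually vectorizes the loop still has to be verified.

```c
/* EXAMPLE_RESTRICT is a hypothetical macro name. It expands to
 * __restrict__ where that is assumed to be supported (detected here
 * via __GNUC__ as a stand-in for a real feature test) and to nothing
 * otherwise. */
#if defined(__GNUC__)
#define EXAMPLE_RESTRICT __restrict__
#else
#define EXAMPLE_RESTRICT
#endif

/* Hypothetical inner loop: out[i] += in[i] * gain.
 * The restrict qualifiers promise the compiler that out and in do not
 * overlap; the plain int counter keeps the loop bounds simple. */
void
example_mix_scaled (int n_values,
                    float *EXAMPLE_RESTRICT out,
                    const float *EXAMPLE_RESTRICT in,
                    float gain)
{
  int i;
  for (i = 0; i < n_values; i++)
    out[i] += in[i] * gain;
}
```

Callers must make sure the two buffers really do not overlap, since the qualifier is a promise to the compiler and is not checked.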

  Cu... Stefan

---
ciaoTJ


