Re: Making code auto vectorizable


On Tue, May 02, 2006 at 11:04:47PM +0200, Tim Janik wrote:
> On Tue, 2 May 2006, Stefan Westerfeld wrote:
> >On Tue, Apr 18, 2006 at 05:51:52PM +0200, Tim Janik wrote:
> >>>which is not (currently) recognized by the tree vectorizer. Rewriting
> >>>the loop without this construct, like this:
> >>>
> >>>int i;
> >>>for (i = 0; i < n_values; i++)
> >>>  output[i] = input[i];
> >>>
> >>>leads to a vectorizable loop. Note that this only works if i is signed,
> >>>so using a guint for iterating does not enable vectorization (it took me
> >>>quite some trial and error to figure that out).
> >>
> >>what compiler version is this?
> >>does the guint/gint problem persist in gcc-4.2snapshot?
> >
> >Yes, it does. And there is another change in gcc-snapshot: it doesn't
> >vectorize the loop any more, unless __restrict__ is used to declare that
> >the input and output buffer don't have a data dependency. I've updated
> >my patch accordingly.
> >>the rest looks good. provided it has been properly tested,
> >>this can go into CVS. do we have a feature test for BseAdder
> >>already?
> >
> >We do have a feature test and it still passes with the vectorized loop.
> >Should I commit the updated patch with the __restrict__ keyword added?
> >It might be necessary to look whether the compiler has support for it.
> no, first, we should define "restrict" to __restrict__ if it is supported
> and to nothing otherwise. and second, the bseadder code could be rewritten
> in terms of bse_block_copy_float() and bse_block_add_floats(), right?
> then, we should use that instead.

Since BseAdder supports subtracting, we would need to extend the Block
API. I can provide a new patch which does this, and reimplements the
adder on top of it.
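
To make this concrete, here is a rough sketch of what such an extension
might look like. It is only an assumption about the shape of the code:
the names BLOCK_RESTRICT and block_sub_floats() are made up for
illustration and are not existing Block API, and the macro follows the
"define restrict to __restrict__ if supported, to nothing otherwise"
idea from above.

```c
/* Hypothetical macro (not existing API): expands to __restrict__ where
 * the compiler is known to support it, and to nothing otherwise.
 */
#if defined (__GNUC__)
#define BLOCK_RESTRICT __restrict__
#else
#define BLOCK_RESTRICT
#endif

/* Hypothetical helper (not existing API): subtracts ivalues from ovalues
 * in place. The buffers are declared non-overlapping, and the loop
 * counter is a signed int, since an unsigned counter kept the loop from
 * being vectorized in my tests.
 */
static void
block_sub_floats (int                         n_values,
                  float       *BLOCK_RESTRICT ovalues,
                  const float *BLOCK_RESTRICT ivalues)
{
  int i;
  for (i = 0; i < n_values; i++)
    ovalues[i] -= ivalues[i];
}
```

The adder would then call something like block_sub_floats (n_values,
output, input) for channels that are to be subtracted.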

However, the point of having an auto vectorizer in the first place is
that you don't have to rewrite all your code; you simply use a compiler
option and everything else happens automatically. We kind of give this
up if we keep extending the block API for every problem that comes up,
and eliminate more and more inner loops from modules, instead of
letting the auto vectorizer do the work.
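
For comparison, this is roughly what leaving the inner loop in the module
would look like. It is an illustrative sketch only, not the actual
BseAdder code; the function name and parameters are invented. The loops
are kept in the shape discussed above (signed counter, plain array
indexing, __restrict__ on the buffers), so that building with something
like gcc -O2 -ftree-vectorize has a chance of vectorizing them.

```c
/* Illustrative only, not the real BseAdder code: combine two input
 * blocks into an output block, either adding or subtracting.
 * __restrict__ is the GCC spelling; it declares that the three buffers
 * do not overlap.
 */
static void
adder_process_sketch (int                       n_values,
                      float       *__restrict__ output,
                      const float *__restrict__ input1,
                      const float *__restrict__ input2,
                      int                       subtract)
{
  int i; /* signed on purpose, see the guint/gint discussion */

  if (subtract)
    {
      for (i = 0; i < n_values; i++)
        output[i] = input1[i] - input2[i];
    }
  else
    {
      for (i = 0; i < n_values; i++)
        output[i] = input1[i] + input2[i];
    }
}
```

The branch on subtract is taken once outside the loops, so each loop body
stays a single straight-line statement.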

Of course, it's a question of where to draw the line. Subtracting blocks
could be argued to be reasonably common, so it's not too bad to have
a generic version available.

   Cu... Stefan
Stefan Westerfeld, Hamburg/Germany,
