We are doing some fuzz testing using C program generators,
and one question that came up when generating a program with
both floating-point arithmetic and loop pragmas was:
Is the loop vectorizer really allowed to vectorize a loop when
it can't prove that it is safe to reorder FP math, even if
there is a loop pragma that hints at a preferred width?
The documentation at
http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations
says: "Loop hints can be specified before any loop and
will be ignored if the optimization is not safe to apply."
But consider this example (see also https://godbolt.org/z/fzRHsp):
//------------------------------------------------------------------
//
// clang -O3 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize
#include <stdio.h>
#include <stdint.h>
double v_1 = -902.30847021;
double v_2 = -902.30847021;
int main()
{
  #pragma clang loop vectorize_width(2) unroll(disable)
  for (int i = 0; i < 16; ++i) {
    v_1 = v_1 * 430.33975544;
  }

  #pragma clang loop unroll(disable)
  for (int i = 0; i < 16; ++i) {
    v_2 = v_2 * 430.33975544;
  }
  printf("v_1: %f\n", v_1);
  printf("v_2: %f\n", v_2);
}
//
//------------------------------------------------------------------
we get these remarks:
<source>:11:3: remark: the cost-model indicates that interleaving is not beneficial [-Rpass-analysis=loop-vectorize]
<source>:11:3: remark: vectorized loop (vectorization width: 2, interleaved count: 1) [-Rpass=loop-vectorize]
<source>:17:15: remark: loop not vectorized: cannot prove it is safe to reorder floating-point operations; allow reordering by specifying '#pragma clang loop vectorize(enable)'
and the result:
v_1: -1248356232174473978185211891975727638059679744.000000
v_2: -1248356232174473819728886863447052450971779072.000000
So the second loop isn't vectorized because the FP math cannot be safely reordered.
But the first loop is vectorized, even though the optimization isn't safe to apply.
This is also reflected in the different results we get for v_1 and v_2.
Is this correct behavior? Should the pragma result in vectorization here?
Note that we get vectorization even with "vectorize_width(3)". So despite
the fact that LV ignores the bad vectorization factor, it considers vectorization
to be "forced".
(I also wonder whether "forced" is bad terminology here, if the pragma is supposed to be considered a hint.)
Regards,
Björn Pettersson
This is a good question. The statement above was written with memory
dependence checks in mind. In this case, the lack of safety comes from
the floating-point reassociation. Part of the problem here is the
translation of the behavior of the compiler to the language in the
documentation. When we say that the pragma "will be ignored", we don't
literally mean that the compiler necessarily ignores it *statically*; we
mean that the effect of the vectorization might be ignored *dynamically*
in cases where vectorization might be unsafe. We do this, as you likely
know, by multiversioning the loop, and using a memory-dependence check
to select, during program execution, which to run.
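The multiversioning described above can be pictured with a hand-written C sketch. This is only an illustration of the idea, not what LLVM actually emits: the helper no_overlap() and the function scale_versioned() are hypothetical names, and the "vector" path is emulated with two scalar loads followed by two stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical runtime check: returns 1 when the n-element ranges
 * starting at a and b do not overlap in memory. */
static int no_overlap(const double *a, const double *b, size_t n) {
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(n * sizeof(double));
    return pa + bytes <= pb || pb + bytes <= pa;
}

/* Computes dst[i] = src[i] * k with two versions of the loop. */
void scale_versioned(double *dst, const double *src, size_t n, double k) {
    assert(dst != NULL && src != NULL);
    size_t i = 0;
    if (no_overlap(dst, src, n)) {
        /* "Vector" version: load two elements, then store two elements.
         * This is only equivalent to the scalar loop without overlap. */
        for (; i + 2 <= n; i += 2) {
            double x0 = src[i];
            double x1 = src[i + 1];
            dst[i] = x0 * k;
            dst[i + 1] = x1 * k;
        }
    }
    /* Scalar version: runs the whole loop when the check fails,
     * otherwise only the remainder iterations. */
    for (; i < n; ++i) {
        dst[i] = src[i] * k;
    }
}
```

The point of the sketch is that the decision is made while the program runs, so the result always matches the scalar loop. No comparable cheap check exists for floating-point reassociation.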
Regarding the effect of reassociation, I don't know of any efficient way
that we might check ahead of time whether the reassociation would
produce a different runtime result from the scalar loop. We're relying
on the user's directive to tell the compiler that the reassociation is
safe. An alternative design would require in the pragma some explicit
acknowledgement of the reduction (as happens, at least in the
specification, for OpenMP SIMD). We would want a different notation from
the existing vectorize(assume_safety) used to disable the dependence
checks. I'm highly sympathetic to your use case, in part because I do
the same thing, and in part because I also work on autotuning systems
that need the same property. However, in this case, our systems need to
keep track of the presence of reductions. I think it's reasonable to say
that the pragma is working as designed and we should update the
documentation. If there's consensus here to require some kind of
reduction acknowledgement, I'm fine with that too (although we need to
realize that's going to cause significant regressions for existing users).
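For comparison, here is a minimal sketch of what an explicit reduction acknowledgement looks like with OpenMP SIMD. The function name product_omp_simd is made up for illustration. The pragma only takes effect when the compiler is invoked with OpenMP SIMD support (for example -fopenmp-simd); without it the pragma is ignored and the loop stays a plain scalar loop.

```c
#include <assert.h>

/* Multiplies v by c, n times. The reduction(*:acc) clause is the
 * user's explicit statement that acc is a product reduction, so the
 * implementation may combine partial products in a different order. */
double product_omp_simd(double v, double c, int n) {
    assert(n >= 0);
    double acc = v;
    #pragma omp simd reduction(*:acc)
    for (int i = 0; i < n; ++i) {
        acc *= c;
    }
    return acc;
}
```

With this notation a tool that must preserve scalar FP results can see from the source that a reduction has been acknowledged, which is the property discussed above.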
-Hal
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
I think that it's important that we're precise here in our discussion.
Adding the pragma does not enable -ffast-math for the loop. Instead, it
permits only very specific reassociations (specifically, only those
needed to allow the vectorized calculation of the reduction result).
Nothing else is changed. Moreover, supporting this precise allowance is
an important use case. We need to keep it somehow.
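To make the permitted reassociation concrete, here is a small C sketch that emulates a width-2 product reduction by hand. The lane layout (one lane starting from the initial value, the other from 1.0, combined at the end) is an assumption about the general shape of such a reduction, not a dump of LLVM's generated code, and the function names are hypothetical.

```c
#include <assert.h>

/* Scalar order: (((v * c) * c) * c) ..., one rounding per step. */
double product_scalar(double v, double c, int n) {
    for (int i = 0; i < n; ++i) {
        v = v * c;
    }
    return v;
}

/* Emulated width-2 order: two independent partial products that are
 * multiplied together after the loop. Mathematically identical to the
 * scalar order, but the intermediate roundings differ. */
double product_two_lanes(double v, double c, int n) {
    assert(n % 2 == 0);  /* sketch handles even trip counts only */
    double lane0 = v;    /* carries the initial value */
    double lane1 = 1.0;  /* identity element for multiplication */
    for (int i = 0; i < n; i += 2) {
        lane0 = lane0 * c;
        lane1 = lane1 * c;
    }
    return lane0 * lane1;
}
```

Calling both functions with -902.30847021, 430.33975544 and 16 mirrors the two loops in the original example. The two results agree to within rounding error but need not be bit-identical, and no other fast-math style transformation is involved.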
>
> I did not explicitly mention -O0 in my earlier examples, but doesn't
> it feel weird that when compiling a program with vectorization hints,
> with -fno-fast-math, I might get different results when executing the
> program depending on whether I used -O0 or -O3 when compiling?
> That is actually what our test framework was doing (comparing results
> when using "-O0 -fno-fast-math" and "-O3 -fno-fast-math"), and it
> ended up with failures due to loop pragmas being present in the code.
As I said before, I definitely understand your point of view. If we were
designing the pragma today, I would support your position. I'm just not
sure it's worth changing now. We should just document that the pragma is
a hint *except* that it has this particular semantic effect. Ugly, but
matches our long-standing practice.
That all having been said, we have been working to design a better set
of pragmas to control loop transformations (see
https://reviews.llvm.org/D69088 and the associated talks / RFC). In this
context, it might make more sense to address this concern. Michael Kruse
has been doing the work on this. Michael, what do you think about this
in the context of pragma loop transform?
>
>
> I also noticed that there are some TTI-hooks that seem to be a bit
> related to this. But since both LoopVectorizeHints::allowReordering()
> and LoopVectorizeHints::isPotentiallyUnsafe() are overruled by the
> FK_Enabled hint, it doesn't matter what the TTI hooks are saying.
Right, because in this context, the pragma is not just a hint. It has
semantic properties.
-Hal