On 4 April 2014 18:20, Mircea Namolaru <
mircea....@gmail.com> wrote:
> I think that the problem is already in the tree SSA code:
>
> For CLAST generated by CLooG:
> for
> (scat_1=0;scat_1<=-4294967296*floord(maxLen_5-minLen_4,4294967296)+maxLen_5-minLen_4;scat_1++)
> {
> (scat_1);
> }
>
> In tree SSA the computation of the bounds of the loop is done on long signed
> integers and looks as"
> for
> (scat_1=0;scat_1<=4294967296*floord(maxLen_5-minLen_4,-4294967296)+maxLen_5-minLen_4;scat_1++)
> {
> (scat_1);
> }
> The expression
> -4294967296*floord(maxLen_5-minLen_4,4294967296)
> is translated to
> 4294967296*floord(maxLen_5-minLen_4,-4294967296)
> In the second case the result of the floord is -1 instead of 0 and the loop is not entered..
Nice find, I thought /[fl] worked in a different way.
On 5 April 2014 00:06, Tobias Grosser <
tob...@grosser.es> wrote:
> Great finding! Any idea why this is the case?
I looked more into the problem and found something very curious during
clast-to-gimple conversion in Graphite. The explanation turns out to
be kind of lenghthy, please bear with me.
Consider the stacktrace taken during clast-to-gimple conversion:
...
#4 0x0000000000e41b6f in make_node_stat (code=FLOOR_DIV_EXPR) at
../../gcc/tree.c:847
#5 0x0000000000e4e1c0 in build2_stat (code=FLOOR_DIV_EXPR,
tt=0x7ffff6ebce70, arg0=0x7ffff6eca168,
arg1=0x7ffff6ec98a0) at ../../gcc/tree.c:4162
#6 0x000000000087d1ec in build2_stat_loc (loc=0, code=FLOOR_DIV_EXPR,
type=0x7ffff6ebce70, arg0=0x7ffff6eca168,
arg1=0x7ffff6ec98a0) at ../../gcc/tree.h:3476
#7 0x00000000008e3df9 in fold_build2_stat_loc (loc=0,
code=FLOOR_DIV_EXPR, type=0x7ffff6ebce70,
op0=0x7ffff6eca168, op1=0x7ffff6ec98a0) at ../../gcc/fold-const.c:15196
#8 0x0000000000882c56 in fold_negate_expr (loc=0, t=0x7ffff6eca190)
at ../../gcc/fold-const.c:704
#9 0x0000000000883143 in negate_expr (t=0x7ffff6eca190) at
../../gcc/fold-const.c:790
#10 0x00000000008bcacf in fold_binary_loc (loc=0, code=MULT_EXPR,
type=0x7ffff6ebce70, op0=0x7ffff6eca190,
op1=0x7ffff6ec98a0) at ../../gcc/fold-const.c:11116
#11 0x00000000008e3dd2 in fold_build2_stat_loc (loc=0, code=MULT_EXPR,
type=0x7ffff6ebce70, op0=0x7ffff6eca190,
op1=0x7ffff6ec98a0) at ../../gcc/fold-const.c:15194
#12 0x00000000008b3ca5 in fold_binary_loc (loc=0, code=MULT_EXPR,
type=0x7ffff6ebce70, op0=0x7ffff6ec98a0,
op1=0x7ffff6eca190) at ../../gcc/fold-const.c:10181
#13 0x00000000008e3dd2 in fold_build2_stat_loc (loc=0, code=MULT_EXPR,
type=0x7ffff6ebce70, op0=0x7ffff6ec98a0,
op1=0x7ffff6eca190) at ../../gcc/fold-const.c:15194
#14 0x00000000012955c3 in clast_to_gcc_expression
(type=0x7ffff6ebce70, e=0x1f71870, ip=0x7fffffffd890)
at ../../gcc/graphite-clast-to-gimple.c:458
#15 0x0000000001295166 in clast_to_gcc_expression_red
(type=0x7ffff6ebce70, op=PLUS_EXPR, r=0x1f74ef0,
ip=0x7fffffffd890) at ../../gcc/graphite-clast-to-gimple.c:395
#16 0x000000000129564a in clast_to_gcc_expression
(type=0x7ffff6ebce70, e=0x1f74ef0, ip=0x7fffffffd890)
at ../../gcc/graphite-clast-to-gimple.c:477
#17 0x0000000001295166 in clast_to_gcc_expression_red
(type=0x7ffff6ebce70, op=MIN_EXPR, r=0x1f752d0,
ip=0x7fffffffd890) at ../../gcc/graphite-clast-to-gimple.c:395
#18 0x0000000001295668 in clast_to_gcc_expression
(type=0x7ffff6ebce70, e=0x1f752d0, ip=0x7fffffffd890)
at ../../gcc/graphite-clast-to-gimple.c:480
#19 0x0000000001296c01 in graphite_create_new_loop_guard
(entry_edge=0x7ffff6e57d90, stmt=0x1f94a00,
type=0x7fffffffd768, lb=0x7fffffffd770, ub=0x7fffffffd778,
ip=0x7fffffffd890)
at ../../gcc/graphite-clast-to-gimple.c:1140
#20 0x0000000001296ef5 in translate_clast_for
(context_loop=0x7ffff6ec1000, stmt=0x1f94a00, next_e=0x7ffff6e57d90,
bb_pbb_mapping=..., level=0, ip=0x7fffffffd890) at
../../gcc/graphite-clast-to-gimple.c:1220
#21 0x00000000012971c0 in translate_clast
(context_loop=0x7ffff6ec1000, stmt=0x1f94a00, next_e=0x7ffff6e57d90,
bb_pbb_mapping=..., level=0, ip=0x7fffffffd890) at
../../gcc/graphite-clast-to-gimple.c:1307
#22 0x00000000012972b4 in translate_clast
(context_loop=0x7ffff6ec1000, stmt=0x1f7b180, next_e=0x7ffff6e57d90,
bb_pbb_mapping=..., level=0, ip=0x7fffffffd890) at
../../gcc/graphite-clast-to-gimple.c:1327
#23 0x0000000001297dee in gloog (scop=0x1f73cc0, bb_pbb_mapping=...)
at ../../gcc/graphite-clast-to-gimple.c:1705
#24 0x0000000001293dbb in graphite_transform_loops () at
../../gcc/graphite.c:309
#25 0x0000000001293e50 in graphite_transforms () at ../../gcc/graphite.c:345
...
Graphite is building a new loop guard. For this purpose it needs to
convert the clast expression with the Graphite-optimized code back
into Gimple (fr #18, clast_to_gcc_expression())
In frame #13 it is making an expression which is the product of
"((signed long) maxLen_5(D) - (signed long) minLen_3(D)) /[fl]
4294967296" and "-4294967296" using the constant sub-tree folding
facilities. The result should be "(((signed long) maxLen_5(D) -
(signed long) minLen_3(D)) /[fl] 4294967296) * -4294967296" which is
the correct expression that stand-alone cloog produced on our clast
data.
However, in frame #10 you can see that folding tries to optimize the
expression along the way by negating both of its parts (ok, no problem
yet):
/* Transform x * -C into -x * C if x is easily negatable. */
if (TREE_CODE (arg1) == INTEGER_CST
&& tree_int_cst_sgn (arg1) == -1
&& negate_expr_p (arg0)
&& (tem = negate_expr (arg1)) != arg1
&& !TREE_OVERFLOW (tem))
return fold_build2_loc (loc, MULT_EXPR, type,
fold_convert_loc (loc, type,
negate_expr (arg0)),
tem);
In frame 9 (negate_expr) it tries to find a simplified negation for
"((signed long) maxLen_5(D) - (signed long) minLen_3(D)) /[fl]
4294967296"
In frame 8 (fold_negate_expr) it finally picks the "appropriate"
negation. Which is (gasp)
"((signed long) maxLen_5(D) - (signed long) minLen_3(D)) /[fl] -4294967296".
This negation looks WRONG to me, since, as Mircea noted,
(1/[fl]4294967296)*-4294967296 = 0 BUT (1/[fl]-4294967296)*4294967296
= -1.
Looking at the code in fold_negate_expr() you can see that only
problems with overflow are taken into consideration for /[fl], and our
case where its denominator changes the sign is NOT checked:
case TRUNC_DIV_EXPR:
case ROUND_DIV_EXPR:
case FLOOR_DIV_EXPR:
case CEIL_DIV_EXPR:
case EXACT_DIV_EXPR:
/* In general we can't negate A / B, because if A is INT_MIN and
B is 1, we may turn this into INT_MIN / -1 which is undefined
and actually traps on some architectures. But if overflow is
undefined, we can negate, because - (INT_MIN / 1) is an
overflow. */
if (!INTEGRAL_TYPE_P (type) || TYPE_OVERFLOW_UNDEFINED (type))
{
const char * const warnmsg = G_("assuming signed overflow does not "
"occur when negating a division");
tem = TREE_OPERAND (t, 1);
if (negate_expr_p (tem))
{
if (INTEGRAL_TYPE_P (type)
&& (TREE_CODE (tem) != INTEGER_CST
|| integer_onep (tem)))
fold_overflow_warning (warnmsg, WARN_STRICT_OVERFLOW_MISC);
return fold_build2_loc (loc, TREE_CODE (t), type, // <==
yep, we execute this!
TREE_OPERAND (t, 0), negate_expr (tem));
}
The same logical fallacy can be seen in negate_expr_p() which checks
if an expression should be negated:
case TRUNC_DIV_EXPR:
case ROUND_DIV_EXPR:
case FLOOR_DIV_EXPR:
case CEIL_DIV_EXPR:
case EXACT_DIV_EXPR:
/* In general we can't negate A / B, because if A is INT_MIN and
B is 1, we may turn this into INT_MIN / -1 which is undefined
and actually traps on some architectures. But if overflow is
undefined, we can negate, because - (INT_MIN / 1) is an
overflow. */
if (INTEGRAL_TYPE_P (TREE_TYPE (t)))
{
if (!TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (t)))
break;
/* If overflow is undefined then we have to be careful because
we ask whether it's ok to associate the negate with the
division which is not ok for example for
-((a - b) / c) where (-(a - b)) / c may invoke undefined
overflow because of negating INT_MIN. So do not use
negate_expr_p here but open-code the two important cases. */
if (TREE_CODE (TREE_OPERAND (t, 0)) == NEGATE_EXPR
|| (TREE_CODE (TREE_OPERAND (t, 0)) == INTEGER_CST
&& may_negate_without_overflow_p (TREE_OPERAND (t, 0))))
return true;
}
I commented the line "case FLOOR_DIV_EXPR:" in negate_expr_p(), thus,
disabling incorrect negation optimization for /[fl]. And, rightfully
so, the test is now giving the correct result.
Bug in constant sub-tree folding?
Peace,
--
Vladimir