[llvm-dev] RFC: (Co-)Convergent functions and uniform function parameters

61 views
Skip to first unread message

Nicolai Hähnle via llvm-dev

unread,
Oct 24, 2016, 3:38:55 PM10/24/16
to LLVM Dev, Marek Olšák
Hi all,

Some brain-storming on an issue with SPMD/SIMT backend support where I
think some additional IR attributes would be useful. Sorry for the
somewhat long mail; the short version of my current thinking is that I
would like to have the following:

1) convergent: a call to a function with this attribute cannot be moved
to have additional control dependencies; i.e., moving it from A to B is
only possible if B dominates or post-dominates A.

2) co-convergent (divergent? for lack of a better name...): a call to a
function with this attribute cannot be moved to have _fewer_ control
dependencies; i.e., moving it from A to B is only possible if A
dominates or post-dominates B.

3) uniform (for function arguments): transformations are not allowed to
introduce additional non-uniformity in this argument.

I'd appreciate input on this proposal, e.g. if this can be solved in an
easier way or if there are obvious problems with this approach.

Thanks,
Nicolai

..........
In a nutshell, the problem that this intends to solve is that:

%v1 = texelFetch(%sampler, %coord0)
%v2 = texelFetch(%sampler, %coord1)
%v = select i1 %cond, vType %v1, %v2

is logically equivalent to and could benefit from being transformed to,

%coord = select i1 %cond, cType %coord0, %coord1
%v = texelFetch(%sampler, %coord)

but on the other hand

%v1 = texelFetch(%sampler0, %coord)
%v2 = texelFetch(%sampler1, %coord)
%v = select i1 %cond, vType %v1, %v2

_must_not_ be transformed to

%s = select i1 %cond, sType %sampler0, %sampler1
%v = texelFetch(%s, %coord)

because of uniformity restrictions on the first argument of texelFetch.

We currently have a shader that is mis-compiled in the wild because
something much like the latter transform is done by SimplifyCFG.[1]
There, the equivalent thing happens with phi nodes:

if:
%v.if = texelFetch(%sampler0, %coord)
br label %end

else:
%v.else = texelFetch(%sampler1, %coord)
br label %end

end:
%v = phi [ %v.if, %if ], [ %v.else, %else ]

becomes

...
end:
%s = phi [ %sampler0, %if ], [ %sampler1, %else ]
%v = texelFetch(%s, %coord)

I have collected some more examples at [2]

The distinctions between all three attribute types mentioned above makes
sense, because there are OpenGL shader intrinsics that naturally carry
almost all of the possible combinations of those attributes.

That said, a pass that is not SPMD/SIMT-aware must treat every function
call that has a uniform argument as if it were co-convergent, because
the input to the uniform argument could have non-uniformity whose
structure correlates with control dependencies; see [2] for an example.
Furthermore, I have not yet found an operation that needs the
co-convergent attribute without also having uniform arguments. So for
the purposes of LLVM, it may be sufficient to add the 'uniform'
attribute for function arguments.


[1] https://bugs.freedesktop.org/show_bug.cgi?id=97988
[2]
http://nhaehnle.blogspot.de/2016/10/compiling-shaders-dynamically-uniform.html
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Mehdi Amini via llvm-dev

unread,
Oct 24, 2016, 3:54:50 PM10/24/16
to Nicolai Hähnle, LLVM Dev, Marek Olšák

> On Oct 24, 2016, at 12:38 PM, Nicolai Hähnle via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> Hi all,
>
> Some brain-storming on an issue with SPMD/SIMT backend support where I think some additional IR attributes would be useful. Sorry for the somewhat long mail; the short version of my current thinking is that I would like to have the following:
>
> 1) convergent: a call to a function with this attribute cannot be moved to have additional control dependencies; i.e., moving it from A to B is only possible if B dominates or post-dominates A.
>
> 2) co-convergent (divergent? for lack of a better name...): a call to a function with this attribute cannot be moved to have _fewer_ control dependencies; i.e., moving it from A to B is only possible if A dominates or post-dominates B.
>
> 3) uniform (for function arguments): transformations are not allowed to introduce additional non-uniformity in this argument.

Can you describe it in terms that are non-SPMD/SIMT?
I.e. I’m not sure that “uniformity” refers to an existing LLVM IR concept.

Thanks,


Mehdi

Nicolai Hähnle via llvm-dev

unread,
Oct 24, 2016, 7:11:46 PM10/24/16
to Mehdi Amini, LLVM Dev, Marek Olšák
On 24.10.2016 21:54, Mehdi Amini wrote:
>> On Oct 24, 2016, at 12:38 PM, Nicolai Hähnle via llvm-dev <llvm...@lists.llvm.org> wrote:
>> Some brain-storming on an issue with SPMD/SIMT backend support where I think some additional IR attributes would be useful. Sorry for the somewhat long mail; the short version of my current thinking is that I would like to have the following:
>>
>> 1) convergent: a call to a function with this attribute cannot be moved to have additional control dependencies; i.e., moving it from A to B is only possible if B dominates or post-dominates A.
>>
>> 2) co-convergent (divergent? for lack of a better name...): a call to a function with this attribute cannot be moved to have _fewer_ control dependencies; i.e., moving it from A to B is only possible if A dominates or post-dominates B.
>>
>> 3) uniform (for function arguments): transformations are not allowed to introduce additional non-uniformity in this argument.
>
> Can you describe it in terms that are non-SPMD/SIMT?
> I.e. I’m not sure that “uniformity” refers to an existing LLVM IR concept.

Yeah, that's actually the key problem I've been struggling with.

The first example I sent shows the gist of it. It also shows that the
concept can't be expressed in terms of the CFG, which makes this tricky.

In a way it's the data-flow analog of the convergent attribute: the
argument cannot be changed to have additional dependencies in its
computation. That captures the basic intention, but I'm not completely
sure that it works.

Thanks,
Nicolai

Nicolai Hähnle via llvm-dev

unread,
Oct 24, 2016, 7:15:53 PM10/24/16
to Mehdi Amini, LLVM Dev, Marek Olšák

One big question is whether there are transformations in LLVM today that
replace a value %a with a value %b, where %b "has additional
dependencies in its computation". I can't think of anything obvious
where that would be the case, but I'm not sure.

Nicolai

Mehdi Amini via llvm-dev

unread,
Oct 24, 2016, 7:17:27 PM10/24/16
to Nicolai Hähnle, LLVM Dev, Marek Olšák

What do you mean by “additional dependencies in its computation" here?


Nicolai Hähnle via llvm-dev

unread,
Oct 25, 2016, 10:29:00 AM10/25/16
to Mehdi Amini, LLVM Dev, Marek Olšák

Minus phi-nodes, loads, and intrinsic calls, it could be: look at the
directed graph where %a -> %b if %a is an operand of the instruction
that defines %b. Take all nodes of that graph that do not have predecessors.

But I fear that this path leads to eternal fuzziness. Let me try a
completely different approach to define what we need by augmenting the
semantics of IR with "divergence tokens". In addition to its usual
value, every IR value carries a "divergence set" of divergence tokens.

The basic rule is: the divergence set of a value is (at least) the union
of the divergence sets of its operands.

Every function input carries a unique divergence token.

For phi-nodes, the divergence set also includes the branch conditions
that can affect which predecessor value is taken. (A simpler, more
conservative rule, would be to add a unique divergence token; this can
be per-basic block.)

Loads add a unique divergence token. (This is a very conservative rule
mostly required for per-thread-allocas. It could be relaxed quite a bit
by treating per-thread-memory like SSA-values augmented with divergence
sets.)

Atomic operations add a unique divergence token.

Additional target-specific intrinsics may also add a divergence token.

By transitivity, most function calls have to add a unique divergence token.

The first restriction on the transformation of a call to a function with
'uniform' function argument is then:

1. The corresponding argument of the call can be changed only if the
change preserves (or shrinks) the divergence set (on top of that, the
new value obviously has to be equal to the old one in single threaded
semantics).

With this definition,

def @fn(s0, s1, cond) {
v0 = texelFetch(s0, ...)
v1 = texelFetch(s1, ...)
v = select i1 cond, v0, v1
}

cannot be transformed to

def @fn(s0, s1, cond) {
s = select i1 cond, s0, s1
v = texelFetch(s, ...)
}

because s has a larger divergence set.

On the other hand, at least the InstCombine transforms should all be
safe because they locally transform a DAG of values in a way that can
only shrink the set of predecessors of the locally transformed region of
the DAG.

Transforms that modify the CFG could be problematic especially with the
simpler phi-node-rule, e.g. when a new basic block is created, you end
up introducing new divergence tokens for "downstream" values. That's
probably not a problem with the first definition for phi-nodes, though
I'm not sure.

.....
The above definition still allows the transformation of

masked &= 1
if (masked == 1) {
v1 = texelFetch(array[masked], ...)
} else {
v0 = texelFetch(array[masked], ...)
}
v = phi(v1, v0)

to

masked &= 1
v = texelFetch(array[masked], ...);

which is incorrect. So for a set of divergence tokens D, let's say A
D-dominates B if the following holds: there exists a way of fixing the
branching directions of conditional branches whose condition's
divergence set is disjoint from D, such that A dominates B in the
resulting CFG. Similarly for D-post-domination.

If D \subset D', then D'-domination implies D-domination.

We can then formulate the second condition as:

2a. A function call with a uniform argument with divergence set D can be
moved from location A to location B if A D-dominates or D-post-dominates B.

2b. Simpler but more conservatively, just treat the function call as
co-convergent (i.e., it can only be moved to positions that are
dominated or post-dominated by its current position in the usual sense.)

.....
The full definition of D-domination is probably too involved to be
turned into practical algorithms. But with this model of divergence
sets, it should be possible to think of DivergenceAnalysis as an
analysis that proves divergence sets of values to be empty (using
additional target specific information to be less conservative).
Combined with DivergenceAnalysis, (1)+(2a) still leave room for some
useful and practical optimizations of functions with uniform arguments
even across basic blocks.

In practical terms for a first version, the effect of the attribute
should just be to prevent the two transformations shown above.

Thanks,
Nicolai

Nicolai Hähnle via llvm-dev

unread,
Oct 26, 2016, 10:55:04 AM10/26/16
to Mehdi Amini, LLVM Dev, Marek Olšák

Turns out that this isn't actually sufficient. Consider this (admittedly
convoluted) example:

idx = input & 1
v0 = texelFetch(s[idx & 0], ...)
v1 = texelFetch(s[idx | 1], ...)
cond = trunc idx to i1
v = select i1 cond, v1, v0

transformed into

idx = in & 1
v = texelFetch(s[idx], ...)

has the same divergence set for the argument to texelFetch as defined in
the quote above, but the transformation is incorrect.

So it's pretty clear what we want: We want a way to flag certain
function arguments in a way that forbids transformations of the form
select(call ... , call ...) -> call (select ..., ...), ...

But it would be nice to have a clear definition of _why_ those
transformations must be forbidden. It's not clear how to do that without
pulling in a full model of SIMT-style parallel execution, and admittedly
I don't think we have a sane model for _that_ in the first place :-(

Something that at least partially addresses the SIMT-style semantics:
For every pair (initial state, function inputs) and every call site of
the relevant function, keep a log of function arguments with which the
function has been called.

The restriction on program transformations is roughly: two pairs A and B
of (initial state, function inputs) are compatible when run on the
original program if at each call site the log of A is a subsequence of
the log of B or vice versa. Then every pair A,B that is compatible when
run on the original program must also be compatible on the transformed
program.

This isn't a great definition either, but it gets closer to the actual
SIMT-semantics. Any additional ideas would be very much appreciated.

Thanks,
Nicolai

Justin Lebar via llvm-dev

unread,
Oct 31, 2016, 2:38:06 PM10/31/16
to Nicolai Hähnle, LLVM Dev, Marek Olšák
(I work on CUDA / PTX.)

For one thing I'm in favor of having fewer annotations rather than
more, so if we can do this in a reasonable way without introducing the
notion of co-convergent calls, I think that would be a win. The one
convergent annotation is difficult enough for the GPU folks to grok
and then keep in cache, and everyone who works on llvm has to pay the
cost of keeping their passes compliant with these annotations.

I will think on a way to formalize this "uniform" attribute -- I don't
have any good ideas at the moment. Are you going to be at the llvm
dev meeting this week? We might be able to make progress
brainstorming this in person.

-Justin

Nicolai Hähnle via llvm-dev

unread,
Oct 31, 2016, 4:49:50 PM10/31/16
to Justin Lebar, LLVM Dev, Marek Olšák
On 31.10.2016 19:37, Justin Lebar wrote:
> (I work on CUDA / PTX.)
>
> For one thing I'm in favor of having fewer annotations rather than
> more, so if we can do this in a reasonable way without introducing the
> notion of co-convergent calls, I think that would be a win. The one
> convergent annotation is difficult enough for the GPU folks to grok
> and then keep in cache, and everyone who works on llvm has to pay the
> cost of keeping their passes compliant with these annotations.

I know. I'd love to find a simpler way to express this constraint, so
any ideas are welcome :)


> I will think on a way to formalize this "uniform" attribute -- I don't
> have any good ideas at the moment. Are you going to be at the llvm
> dev meeting this week? We might be able to make progress
> brainstorming this in person.

Unfortunately I won't be there.

Nicolai

Nicolai Hähnle via llvm-dev

unread,
Nov 7, 2016, 5:30:28 AM11/7/16
to Justin Lebar, LLVM Dev, Marek Olšák
On 31.10.2016 19:37, Justin Lebar wrote:
> For one thing I'm in favor of having fewer annotations rather than
> more, so if we can do this in a reasonable way without introducing the
> notion of co-convergent calls, I think that would be a win. The one
> convergent annotation is difficult enough for the GPU folks to grok
> and then keep in cache, and everyone who works on llvm has to pay the
> cost of keeping their passes compliant with these annotations.

I now put an RFC implementation at https://reviews.llvm.org/D26348. The
commit message contains a brief recap of what this is for.

The LangRef language there has a proposal to help reduce the cognitive
burden: if you think of the convergent attribute for functions as saying
that "whether/when the function is called must be uniform across
threads", then the convergent attribute for function _arguments_ is
saying that "the actual argument that the function is called with must
be uniform across threads".

The consequences for how this restricts transforms are different, but
the mental model is the same and I think effectively correct for both
meanings. (As a minor bonus, the code delta is smaller if the same term
is used for both ;-)). Please correct me if I'm missing something!

The one thing that I'm not quite happy with is that there's no clean
formalization of what the restriction really is. There's an okay-ish
attempt in the LangRef diff of the patch, but to be honest I'd rather
just leave that out if nothing better can be found. The difficulty I'm
grappling with is that this involves both data flow and control flow,
and I don't know how to express the combination properly without
importing the whole semantics of a SIMT/SPMD-style execution model.

Reply all
Reply to author
Forward
0 new messages