[llvm-dev] RFC: Promoting experimental reduction intrinsics to first class intrinsics

Amara Emerson via llvm-dev

Apr 8, 2020, 1:00:04 AM
to LLVM Developers' List
Hi,

It’s been a few years now since I added some intrinsics for doing vector reductions. We’ve been using them exclusively on AArch64, and a while ago I saw some traffic on the list about other targets too. Sander did some work last year to refine the semantics after some discussion.

Are we at the point where we can drop the “experimental” from the name? IMO all targets should begin to transition to using these as the preferred representation for reductions. But for now, I’m only proposing the naming change.

Cheers,
Amara

Chris Lattner via llvm-dev

Apr 8, 2020, 12:24:32 PM
to Amara Emerson, LLVM Developers' List
Hi Amara,

Can you provide a pointer to the docs that describe these?

-Chris

Amara Emerson via llvm-dev

Apr 8, 2020, 12:47:54 PM
to Chris Lattner, LLVM Developers' List

Nikita Popov via llvm-dev

Apr 8, 2020, 12:55:24 PM
to Amara Emerson, LLVM Developers' List
On Wed, Apr 8, 2020 at 6:59 AM Amara Emerson via llvm-dev <llvm...@lists.llvm.org> wrote:
Hi,

It’s been a few years now since I added some intrinsics for doing vector reductions. We’ve been using them exclusively on AArch64, and a while ago I saw some traffic on the list about other targets too. Sander did some work last year to refine the semantics after some discussion.

Are we at the point where we can drop the “experimental” from the name? IMO all targets should begin to transition to using these as the preferred representation for reductions. But for now, I’m only proposing the naming change.

There are still a couple of open issues that I'm aware of:

1. fmin/fmax reductions without the nnan flag do not work: the IR expansion code assumes that these always use FMF. It's also under-documented what their exact behavior is, though I assume it should match llvm.minnum/llvm.maxnum semantics to be most useful. (A minimal IR example of the problematic case is below.)

2. SDAG legalization support for float softening is missing.

3. SDAG legalization for ordered reductions is missing.

I think point 1 is the only real blocker here; the rest can be avoided by force-expanding these cases in IR.
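
For reference, here is that minimal example of the problematic case; the exact experimental name/mangling may differ slightly between LLVM versions, so treat it as approximate:

declare float @llvm.experimental.vector.reduce.fmax.v4f32(<4 x float>)

define float @reduce_fmax(<4 x float> %v) {
  ; No nnan (or fast) flags on the call: this is the case whose NaN behavior
  ; is under-specified and which the current IR expansion mishandles.
  %r = call float @llvm.experimental.vector.reduce.fmax.v4f32(<4 x float> %v)
  ret float %r
}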

Regards,
Nikita

Amara Emerson via llvm-dev

Apr 8, 2020, 3:40:27 PM
to Nikita Popov, LLVM Developers' List

On Apr 8, 2020, at 9:54 AM, Nikita Popov <nikit...@gmail.com> wrote:

On Wed, Apr 8, 2020 at 6:59 AM Amara Emerson via llvm-dev <llvm...@lists.llvm.org> wrote:
Hi,

It’s been a few years now since I added some intrinsics for doing vector reductions. We’ve been using them exclusively on AArch64, and a while ago I saw some traffic on the list about other targets too. Sander did some work last year to refine the semantics after some discussion.

Are we at the point where we can drop the “experimental” from the name? IMO all targets should begin to transition to using these as the preferred representation for reductions. But for now, I’m only proposing the naming change.

There are still a couple of open issues that I'm aware of:

1. fmin/fmax reductions without the nnan flag do not work: the IR expansion code assumes that these always use FMF. It's also under-documented what their exact behavior is, though I assume it should match llvm.minnum/llvm.maxnum semantics to be most useful.
I think that without the nnan flag, the semantics should be equivalent to an ordered (i.e. serialized) reduction using llvm.minnum/llvm.maxnum. Any issues with this?
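
To make that concrete, here is a sketch of the serialized expansion I have in mind for a v4f32 fmax reduction without nnan (illustrative only; the scalar llvm.maxnum intrinsic carries the semantics):

declare float @llvm.maxnum.f32(float, float)

define float @ordered_fmax_reduce(<4 x float> %v) {
  ; Strictly serialized: combine the lanes left to right with llvm.maxnum.
  %e0 = extractelement <4 x float> %v, i32 0
  %e1 = extractelement <4 x float> %v, i32 1
  %e2 = extractelement <4 x float> %v, i32 2
  %e3 = extractelement <4 x float> %v, i32 3
  %m0 = call float @llvm.maxnum.f32(float %e0, float %e1)
  %m1 = call float @llvm.maxnum.f32(float %m0, float %e2)
  %m2 = call float @llvm.maxnum.f32(float %m1, float %e3)
  ret float %m2
}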

Philip Reames via llvm-dev

Apr 9, 2020, 1:04:58 PM
to Amara Emerson, LLVM Developers' List
My experience with them so far is that the code generation for these intrinsics is still missing a lot of cases. Some of them are X86-specific (the target I look at mostly), but many of them have generic forms.

As one recent example, consider https://bugs.llvm.org/show_bug.cgi?id=45378. (There's nothing special about this one other than that it was recent.)

I'm not necessarily arguing they can't be promoted from experimental, but it would be a much easier case if the codegen were routinely as good as or better than the scalar forms. Or, to say that a bit differently, if we could canonicalize to them in the IR without major regressions. Having two ways to represent something in the IR without any agreed-upon canonical form is always suboptimal.

Philip

Craig Topper via llvm-dev

Apr 9, 2020, 1:17:45 PM
to Philip Reames, LLVM Developers' List
That recent X86 bug isn't unique to the intrinsic. We generate the same code from the following, which uses the shuffle sequence the vectorizers emitted before the reduction intrinsics existed.

declare i64 @llvm.experimental.vector.reduce.or.v2i64(<2 x i64>)
declare void @TrapFunc(i64)

define void @parseHeaders(i64* %ptr) {
  %vptr = bitcast i64* %ptr to <2 x i64>*
  %vload = load <2 x i64>, <2 x i64>* %vptr, align 8

  ; Shuffle-based OR reduction: lane 1 is moved down into lane 0, ORed with
  ; the original vector, and lane 0 of the result is extracted.
  %b = shufflevector <2 x i64> %vload, <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
  %c = or <2 x i64> %vload, %b
  %vreduce = extractelement <2 x i64> %c, i32 0

  %vcheck = icmp eq i64 %vreduce, 0
  br i1 %vcheck, label %ret, label %trap
trap:
  %v2 = extractelement <2 x i64> %vload, i32 1
  call void @TrapFunc(i64 %v2)
  ret void
ret:
  ret void
}

~Craig

Amara Emerson via llvm-dev

Apr 9, 2020, 1:21:33 PM
to Craig Topper, LLVM Developers' List
Has x86 switched to the intrinsics now?

Craig Topper via llvm-dev

Apr 9, 2020, 1:29:31 PM
to Amara Emerson, LLVM Developers' List
No, we still use the shuffle expansion, which is why the issue isn't unique to the intrinsic.

~Craig

David Green via llvm-dev

Apr 15, 2020, 8:30:13 AM
to Amara Emerson, Nikita Popov, LLVM Developers' List, nd
Hello

We (ARM / MVE) would also at some point want to add the concept of masked reductions. This could either be done by extending the existing intrinsics, much like how masked gather/scatter works, or with an entirely new set of intrinsics. I'm not sure which way is better in the long run, but the masked variants probably have a superset of the functionality of the non-masked ones, considering the mask can be all ones.
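
Purely to illustrate the first option, a hypothetical masked form that appends a mask operand to the existing intrinsic, analogous to masked gather/scatter (the name and signature below are made up):

; Hypothetical masked variant: lanes where the mask is false would not
; participate in the reduction.
declare i32 @llvm.experimental.vector.reduce.add.masked.v4i32(<4 x i32>, <4 x i1>)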

I believe other vector architectures such as SVE and the VE target might make use of these too, but perhaps with slightly different semantics around active vector lengths.
Dave

Simon Moll via llvm-dev

Apr 15, 2020, 9:03:02 AM
to David Green, Amara Emerson, Nikita Popov, LLVM Developers' List, nd
Hi,

Yes, we absolutely would want masked reductions. Those are already planned as part of the VP intrinsics: https://reviews.llvm.org/D57504

I'd very much appreciate it if we could cooperate on this and implement masked reduction intrinsics in the llvm.vp.* namespace, following the schema of the VP intrinsics that are already upstream (https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).

Btw, I am working on the VP to non-VP expansion pass. If `llvm.vp.reduce.*` intrinsics should replace the existing ones, then that's the pass where their expansion would be implemented as well.
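
As a rough sketch of what that expansion could look like for an add reduction (assuming a predicated form that takes a start value and a mask, and ignoring the explicit vector length for simplicity), inactive lanes are replaced with the operation's identity element and the existing unmasked reduction is reused:

declare i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32>)

define i32 @expand_masked_add_reduce(i32 %start, <4 x i32> %v, <4 x i1> %mask) {
  ; Substitute the identity element (0 for add) into masked-off lanes,
  ; then fall back to the existing unmasked reduction intrinsic.
  %active = select <4 x i1> %mask, <4 x i32> %v, <4 x i32> zeroinitializer
  %red = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> %active)
  %res = add i32 %start, %red
  ret i32 %res
}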

Cheers
- Simon

Sanjay Patel via llvm-dev

Jun 16, 2020, 2:39:10 PM
to Craig Topper, LLVM Developers' List
We switched over to producing the intrinsics for x86 with:
...I'm not aware of any regressions yet.


So that leaves the problem with fmin/fmax when no fast-math-flags are specified. We need to update the LangRef with whatever the expected behavior is for NaN and -0.0.
x86 codegen will probably be poor regardless of whether we choose "llvm.maxnum" or "llvm.maximum" semantics.
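
For reference, the two candidate semantics differ exactly on those special cases, as documented for the scalar intrinsics in the LangRef:

; llvm.maxnum(NaN, 42.0)   --> 42.0        (a quiet NaN is ignored, libm fmax style)
; llvm.maximum(NaN, 42.0)  --> NaN         (NaN propagates)
; llvm.maxnum(+0.0, -0.0)  --> either zero (sign of zero not distinguished)
; llvm.maximum(+0.0, -0.0) --> +0.0        (-0.0 is treated as less than +0.0)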

Simon Pilgrim via llvm-dev

Jun 17, 2020, 8:53:35 AM
to llvm...@lists.llvm.org

A minor point, but I think we also need to describe the order of floating-point operations more explicitly in the LangRef:

"If the intrinsic call has the ‘reassoc’ or ‘fast’ flags set, then the reduction will not preserve the associativity of an equivalent scalarized counterpart. Otherwise the reduction will be ordered, thus implying that the operation respects the associativity of a scalarized reduction."

Please could we add some pseudocode to show exactly how the intrinsic will be re-expanded for ordered cases?
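
Something along these lines, perhaps, for the fadd reduction (which takes a scalar start operand); this is only a sketch of the ordered case, written with plain fadds for clarity:

define float @ordered_fadd_reduce(float %start, <4 x float> %v) {
  ; Ordered (no reassoc/fast) case: strictly sequential, start value first,
  ; then lane 0 through lane 3 in order.
  %e0 = extractelement <4 x float> %v, i32 0
  %e1 = extractelement <4 x float> %v, i32 1
  %e2 = extractelement <4 x float> %v, i32 2
  %e3 = extractelement <4 x float> %v, i32 3
  %a0 = fadd float %start, %e0
  %a1 = fadd float %a0, %e1
  %a2 = fadd float %a1, %e2
  %a3 = fadd float %a2, %e3
  ret float %a3
}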

Amara Emerson via llvm-dev

Jun 17, 2020, 2:15:52 PM
to Simon Pilgrim, llvm...@lists.llvm.org
Proposed clarification here: https://reviews.llvm.org/D82034

Sanjay Patel via llvm-dev

Sep 9, 2020, 12:37:49 PM
to Amara Emerson, llvm-dev
Proposal to specify semantics for the FP min/max reductions:
I'm not sure how we got to the current state of codegen for those, but it doesn't seem consistent or correct as-is, so I've proposed updates there too.

Amara Emerson via llvm-dev

Oct 2, 2020, 2:00:00 PM
to Sanjay Patel, Craig Topper, llvm-dev
[re-sending to list with new email]

Ping. As far as I can tell the remaining issues are solved?

If so, we’ll need to duplicate the intrinsics without the “experimental” prefix, add support in the IR autoupgrader, and fix up the tests.
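
To illustrate the upgrade, assuming the new names simply drop the “experimental.” prefix (the exact final spelling is whatever the rename patch settles on), the autoupgrader would rewrite old declarations and calls like so:

; Old name (auto-upgraded by the IR/bitcode reader):
declare i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32>)
; Assumed new name after the rename:
declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)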

Amara

Nikita Popov via llvm-dev

Oct 2, 2020, 2:35:53 PM
to Amara Emerson, llvm-dev
On Fri, Oct 2, 2020 at 8:00 PM Amara Emerson via llvm-dev <llvm...@lists.llvm.org> wrote:
[re-sending to list with new email]

Ping. As far as I can tell the remaining issues are solved?

If so, we’ll need to duplicate the intrinsics without the “experimental”, and add support in the IR autoupgrader and fix up the tests.

Amara

All of my concerns have been addressed, so +1 on going ahead with the rename.

Regards,
Nikita

Amara Emerson via llvm-dev

Oct 2, 2020, 10:00:21 PM
to Sanjay Patel, Craig Topper, llvm-dev
Ping. As far as I can tell the remaining issues are solved?

If so, we’ll need to duplicate the intrinsics without the “experimental”, and add support in the IR autoupgrader and fix up the tests.

Amara

Amara Emerson via llvm-dev

Oct 3, 2020, 3:29:50 PM
to Sanjay Patel, Craig Topper, Nikita Popov, Chris Lattner, llvm-dev
Sorry, this must have been stuck in the ML pipes and so got sent twice.

Anyway, the patch to rename the intrinsics, update the tests, and auto-upgrade the old names is here: https://reviews.llvm.org/D88787

Thanks,
Amara