[LLVMdev] Question about FMA formation

Shuxin Yang

Dec 12, 2012, 6:40:33 PM
to llv...@cs.uiuc.edu
Hi, Dear All:

I'm going to implement FMA formation. On some architectures, "FMA a, b,
c" is more precise than "a * b + c". I'm wondering if FMA could be less
precise. In the former case (FMA is more precise), can we enable FMA
formation despite a restrictive FP mode?

Thanks
Shuxin
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Eli Friedman

Dec 12, 2012, 6:59:02 PM
to Shuxin Yang, llv...@cs.uiuc.edu
On Wed, Dec 12, 2012 at 3:40 PM, Shuxin Yang <shuxi...@gmail.com> wrote:
> Hi, Dear All:
>
> I'm going to implement FMA formation. On some architectures, "FMA a, b, c"
> is more precise than
> "a * b + c".

If it isn't more accurate, it isn't an FMA, at least not in the
commonly used sense. (ARM has an instruction which does a multiply
and add which isn't more precise, but it would just be confusing to
refer to that as an FMA.)
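
A small numeric sketch of why a fused multiply-add can differ from a separate multiply and add, which is the reason a strict FP mode cannot silently swap one for the other. It models fusion as exact rational arithmetic rounded once. The names `unfused` and `fused` are illustrative only (not an LLVM or hardware API), and the model assumes finite IEEE double inputs.

```python
from fractions import Fraction


def unfused(a: float, b: float, c: float) -> float:
    # Two roundings: the product is rounded to a double, then the sum is.
    return a * b + c


def fused(a: float, b: float, c: float) -> float:
    # One rounding: evaluate a*b + c exactly as a rational, then round once.
    # This models the arithmetic only; it is not a real FMA instruction.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30   # exact product is 1 - 2**-60, which rounds to 1.0
c = -1.0

print(unfused(a, b, c))               # the rounded product cancels c completely
print(fused(a, b, c) == -(2.0**-60))  # the fused form keeps the low-order term
```

The two results differ in the last step only because the intermediate product is or is not rounded, so the fused form is more accurate here but not bit-identical to the unfused one.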

> In the former
> case, can we enable FMA
> formation despite restrictive FP mode?

No. There have already been very long discussions about fma; try
searching the llvmdev archives.

-Eli

Michael Ilseman

Dec 12, 2012, 7:11:29 PM
to Shuxin Yang, llv...@cs.uiuc.edu
On Dec 12, 2012, at 3:40 PM, Shuxin Yang <shuxi...@gmail.com> wrote:

> Hi, Dear All:
>
> I'm going to implement FMA formation. On some architectures, "FMA a, b, c" is more precise than
> "a * b + c". I'm wondering if FMA could be less precise. In the former case, can we enable FMA
> formation despite restrictive FP mode?


I believe that a pass to form fmuladd[1] intrinsic calls would be very useful! The fmuladd intrinsic is defined such that its formation should be isolated from worries about strictness. It simply means "a * b + c" and leaves the decision of whether or not to fuse up to the code generator. Of course, you would probably only run your pass if you wanted the code generator to fuse, but the pass itself should be valid.

Someone please correct me if I misunderstand this intrinsic. 

Lang Hames

Dec 12, 2012, 7:43:26 PM
to Michael Ilseman, LLVM Developers Mailing List
A little background:

The fmuladd intrinsic was introduced to support the FP_CONTRACT pragma in C. llvm.fmuladd.* is generated by clang when it sees an expression of the form 'a * b + c' within a single source statement.

If you want to opportunistically form FMA target instructions, my inclination would be to skip llvm.fmuladd.* and just form them from a*b+c expressions at isel time. I don't see any fundamental problem with forming llvm.fmuladd.* to model FMA formation opportunities in an IR pass, though.

- Lang.

Shuxin Yang

Dec 12, 2012, 7:49:35 PM
to Lang Hames, LLVM Developers Mailing List
Hi, Eli, Mike and Lang:

   Thank you all for the input. Here is one example that might be difficult for isel:
  a*b + c*d + e => a*b + (c*d + e).
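
A quick numeric illustration of why a regrouping like this needs permission to reassociate: it changes which intermediate sums get rounded. The values below are chosen purely for illustration, and no fusing is involved, only plain double arithmetic.

```python
a, b = 1e8, 1e8     # a*b is exactly 1e16
c, d = -1e8, 1e8    # c*d is exactly -1e16
e = 0.5

left_to_right = (a * b + c * d) + e   # source order: (a*b + c*d) + e
regrouped = a * b + (c * d + e)       # regrouped:    a*b + (c*d + e)

# In the regrouped form, e is absorbed when it is added to -1e16
# (doubles near 1e16 are spaced 2 apart), so the two results differ.
print(left_to_right, regrouped)
```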

Thanks
Shuxin

Michael Ilseman

Dec 12, 2012, 7:51:53 PM
to Lang Hames, LLVM Developers Mailing List
On Dec 12, 2012, at 4:43 PM, Lang Hames <lha...@gmail.com> wrote:

> A little background:
>
> The fmuladd intrinsic was introduced to support the FP_CONTRACT pragma in C. llvm.fmuladd.* is generated by clang when it sees an expression of the form 'a * b + c' within a single source statement.
>
> If you want to opportunistically form FMA target instructions my inclination would be to skip llvm.fmuladd.* and just form them from a*b+c expressions at isel time. I don't see any fundamental problem with forming llvm.fmuladd.* to model FMA formation opportunities in an IR pass though.


I see. Shuxin, do you know if it's pretty simple to match FMA style patterns? Is there any advantage to forming them in the IR, e.g. does it allow you to do a post-pass combining or optimization?

One major user of FMA formation at the IR level is fast-isel, which could just match those patterns itself if they're simple enough and there's not much subsequent optimization to be had.

Michael Ilseman

Dec 12, 2012, 7:54:15 PM
to Shuxin Yang, LLVM Developers Mailing List
On Dec 12, 2012, at 4:49 PM, Shuxin Yang <shuxi...@gmail.com> wrote:

> Hi, Eli, Mike and Lang:
>
> Thank you all for the input. This is one e.g which might be difficult for isel:
>   a*b + c*d + e => a*b + (c*d + e).


You hit send right when I did!
For your example, do you mean that it's grouped like:
(fadd (fadd (fmul a b) (fmul c d)) e)

How would your pass go about handling these patterns and is that something that would be too complicated for fast-isel to do on the fly?

Eric Christopher

Dec 12, 2012, 8:11:21 PM
to Michael Ilseman, LLVM Developers Mailing List



> You hit send right when I did!
> For your example, do you mean that it's grouped like:
> (fadd (fadd (fmul a b) (fmul c d)) e)
>
> How would your pass go about handling these patterns and is that something that would be too complicated for fast-isel to do on the fly?


Depends on how they're grouped, but if the formation happens prior to codegen then fast-isel will just handle whatever new instruction you've got. An example of IR would be useful though :)

-eric

Michael Ilseman

Dec 12, 2012, 8:14:54 PM
to Eric Christopher, LLVM Developers Mailing List
Right now we're leaning towards having a re-association helper in codegen-prepare that will re-associate expressions (if allowed). This would allow fast-isel to spot FMA opportunities more easily and form better code.

Eric Christopher

Dec 12, 2012, 8:20:17 PM
to Michael Ilseman, LLVM Developers Mailing List
Why not just form them via a fast IR-level pass and have patterns match in fast isel, instead of trying to form code there? Or are we saying the same thing? (Your wording about fast isel "spotting" opportunities and "form better code" made me think you meant to do optimizations within the fast isel pass.)

-eric

Michael Ilseman

Dec 12, 2012, 11:14:10 PM
to Eric Christopher, LLVM Developers Mailing List


On Dec 12, 2012, at 5:20 PM, Eric Christopher <echr...@gmail.com> wrote:

> Why not just form them via a fast IR level pass and just have patterns match in fast isel instead of trying to form code? Or are we saying the same thing? (Your words of "fast isel spot"ting and "form better code" caused me to think you mean to do optimizations within the fast isel pass).


Sorry, we've kind of been jumping around a bit. I'll try to expound on what's being debated: We have a few options ahead of us as far as benefitting fast-isel is concerned.

We can write a pass to form fmuladds. The intent is to run it very late, perhaps before or as part of codegen-prepare. The downside is that it somewhat goes against the point of fast-isel. Fast-isel allows us to skip extra representations of the program, and replacing IR with intrinsic calls is similar to having an extra representation, albeit only for part of the program.

However, the basic task of spotting an fadd of an fmul is simple enough that fast-isel could just emit the FMA equivalent if it likes. This has the benefit that we avoid the extra representation, but the downside that it makes fast-isel a little more complicated and that it handles only simple patterns.

Shuxin was showing some more complicated patterns that required re-association to match (fast-math flags permitting). For those, we're considering whether a re-associate-for-FMA facility in codegen-prepare would solve that problem. Thus, we can re-associate in codegen-prepare and emit FMA in fast-isel.
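
A rough sketch of what such a re-associate-for-FMA step might do, written over a toy expression tree rather than real LLVM IR. The nested-tuple encoding and the function name `reassociate_for_fma` are hypothetical, chosen to mirror the (fadd ...) notation used in this thread, and the rewrite is only sound when reassociation is permitted.

```python
def is_op(node, name):
    # A node is either a leaf (a string) or a tuple (opcode, lhs, rhs).
    return isinstance(node, tuple) and node[0] == name


def reassociate_for_fma(node):
    """Rewrite (fadd (fadd X Y) Z) into (fadd X (fadd Y Z)) when X and Y are fmuls.

    Assumes the caller has already checked that reassociation is allowed.
    """
    if not isinstance(node, tuple):
        return node  # leaf operand
    op = node[0]
    lhs = reassociate_for_fma(node[1])
    rhs = reassociate_for_fma(node[2])
    if (op == "fadd" and is_op(lhs, "fadd")
            and is_op(lhs[1], "fmul") and is_op(lhs[2], "fmul")):
        return ("fadd", lhs[1], ("fadd", lhs[2], rhs))
    return (op, lhs, rhs)


# (a*b + c*d) + e, as grouped earlier in the thread.
expr = ("fadd", ("fadd", ("fmul", "a", "b"), ("fmul", "c", "d")), "e")
print(reassociate_for_fma(expr))
```

After the rewrite, every fadd has an fmul as a direct operand, which is the simple shape a pattern matcher would then look for.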

Lang Hames

Dec 12, 2012, 11:34:14 PM
to Michael Ilseman, LLVM Developers Mailing List
Hi Michael, Shuxin,
 
> Shuxin was showing some more complicated patterns that required re-association to match (fast-math flags permitting). For those, we're considering if having a re-associate-for-FMA functionality in codegen-prepare would solve that problem. Thus, we can re-associate in codegen-prepare and emit FMA in fast-isel.

Yep. I misread the association on Shuxin's example, but even ((a*b) + (c*d)) + e would match to three instructions:
(fadd (fma a b (fmul c d)) e).
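
The match above can be reproduced with a toy bottom-up matcher. This is only a sketch over nested tuples, not the SelectionDAG or fast-isel API; `form_fma` and `count_ops` are hypothetical names, and the sketch assumes fusing is already known to be permitted.

```python
def is_op(node, name):
    # A node is either a leaf (a string) or a tuple (opcode, operands...).
    return isinstance(node, tuple) and node[0] == name


def form_fma(node):
    """Bottom-up: turn (fadd (fmul x y) z) or (fadd z (fmul x y)) into (fma x y z)."""
    if not isinstance(node, tuple):
        return node  # leaf operand
    op, *args = node
    args = [form_fma(arg) for arg in args]
    if op == "fadd":
        lhs, rhs = args
        if is_op(lhs, "fmul"):
            return ("fma", lhs[1], lhs[2], rhs)
        if is_op(rhs, "fmul"):
            return ("fma", rhs[1], rhs[2], lhs)
    return (op, *args)


def count_ops(node):
    # Number of operations (non-leaf nodes) in the tree.
    if not isinstance(node, tuple):
        return 0
    return 1 + sum(count_ops(arg) for arg in node[1:])


as_written = ("fadd", ("fadd", ("fmul", "a", "b"), ("fmul", "c", "d")), "e")
regrouped = ("fadd", ("fmul", "a", "b"), ("fadd", ("fmul", "c", "d"), "e"))

print(form_fma(as_written), count_ops(form_fma(as_written)))
print(form_fma(regrouped), count_ops(form_fma(regrouped)))
```

On the expression as written, the matcher leaves one fmul and one fadd around a single fma; on the regrouped form a*b + (c*d + e), both multiplies fuse.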

If there are hairier examples that really require reassociation my vote would be for this last scheme: An FMA-friendly reassociation pass run before isel that exposes simple patterns for isel to match.
 
- Lang.

Eric Christopher

Dec 13, 2012, 1:24:31 AM
to Lang Hames, LLVM Developers Mailing List
Agreed. I don't think fast isel should be attempting to form any new patterns.

-eric 