Sorry, we've been jumping around a bit. I'll try to lay out what's being debated: we have a few options for benefiting fast-isel.
We can write a pass to form fmuladds. The intent would be to run it very late, perhaps before or as part of codegen-prepare. The downside is that it somewhat goes against the point of fast-isel: fast-isel lets us skip extra representations of the program, and replacing IR with intrinsic calls is similar to having an extra representation, albeit only for part of the program.
However, the basic task of spotting an fadd of an fmul is simple enough that fast-isel could just emit the FMA equivalent if it likes. This has the benefit of avoiding the extra representation, but the downside is that it makes fast-isel a little more complicated and it handles only simple patterns.
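To make "spotting an fadd of an fmul" concrete, here is a rough sketch of the check on a toy expression node. Everything in it (Node, FmaOperands, matchFAddOfFMul, the numUses field) is hypothetical and is not LLVM's API. The single-use restriction is my own assumption, so that fusing does not leave a second copy of the multiply behind, and the sketch assumes it is already known that fusing is permitted.

```cpp
#include <cassert>

// Hypothetical toy IR, for illustration only; not LLVM's Instruction class.
enum class Op { Leaf, FMul, FAdd };

struct Node {
    Op op;
    const Node* lhs;   // null for leaves
    const Node* rhs;   // null for leaves
    int numUses;       // how many users consume this value
};

// Operands of the fused multiply-add that would be emitted: mulLhs * mulRhs + addend.
struct FmaOperands {
    const Node* mulLhs;
    const Node* mulRhs;
    const Node* addend;
};

// Returns true and fills `out` when `n` is an fadd with an fmul operand that
// has exactly one use (assumed heuristic). Either operand of the fadd may be
// the fmul, since fadd is commutative.
bool matchFAddOfFMul(const Node& n, FmaOperands& out) {
    if (n.op != Op::FAdd) return false;
    const Node* ops[2] = {n.lhs, n.rhs};
    for (int i = 0; i < 2; ++i) {
        const Node* mul = ops[i];
        if (mul && mul->op == Op::FMul && mul->numUses == 1) {
            assert(mul->lhs && mul->rhs && "fmul needs two operands");
            out = {mul->lhs, mul->rhs, ops[1 - i]};
            return true;
        }
    }
    return false;
}
```

A check of roughly this size is all that fast-isel would have to carry for the simple case; anything that needs operands moved around first falls outside it.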
Shuxin was showing some more complicated patterns that require re-association to match (fast-math flags permitting). For those, we're considering whether adding re-associate-for-FMA functionality to codegen-prepare would solve the problem. That way, we could re-associate in codegen-prepare and emit FMA in fast-isel.
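As an illustration of the kind of rewrite meant here (this is a made-up pattern, not one of Shuxin's), take (a*b + c*d) + e. As written, only the inner add has a multiply operand, so a simple fadd-of-fmul match fuses one multiply and leaves a separate multiply and add. Re-associated to a*b + (c*d + e), both adds have a multiply operand. The sketch below uses a hypothetical toy node type, with a `reassoc` field standing in for a fast-math flag that permits re-association; none of the names are LLVM's, and it assumes the inner add has no other users.

```cpp
#include <memory>
#include <vector>

// Hypothetical toy IR, for illustration only; not LLVM's classes.
enum class Op { Leaf, FMul, FAdd };

struct Node {
    Op op;
    Node* lhs;
    Node* rhs;
    bool reassoc;   // stand-in for a fast-math flag allowing re-association
};

// Owns every node so raw pointers stay valid for the arena's lifetime.
struct Arena {
    std::vector<std::unique_ptr<Node>> nodes;
    Node* make(Op op, Node* l = nullptr, Node* r = nullptr, bool reassoc = false) {
        nodes.push_back(std::make_unique<Node>(Node{op, l, r, reassoc}));
        return nodes.back().get();
    }
};

static bool isMul(const Node* n) { return n && n->op == Op::FMul; }

// Rewrites (m1 + m2) + e into m1 + (m2 + e) when both adds permit
// re-association and m1, m2 are multiplies. Returns the new root, or the
// original root when the pattern does not apply.
Node* reassociateForFMA(Arena& arena, Node* root) {
    if (root->op != Op::FAdd || !root->reassoc) return root;
    Node* inner = root->lhs;
    Node* e = root->rhs;
    if (!inner || inner->op != Op::FAdd || !inner->reassoc) return root;
    if (!isMul(inner->lhs) || !isMul(inner->rhs)) return root;
    if (isMul(e)) return root;  // outer add can already fuse; nothing to gain
    Node* newInner = arena.make(Op::FAdd, inner->rhs, e, true);
    return arena.make(Op::FAdd, inner->lhs, newInner, true);
}

// Counts fadd nodes with at least one fmul operand, i.e. the adds that a
// simple "fadd of fmul" match could turn into an FMA.
int countFusableAdds(const Node* n) {
    if (!n || n->op == Op::Leaf) return 0;
    int here = (n->op == Op::FAdd && (isMul(n->lhs) || isMul(n->rhs))) ? 1 : 0;
    return here + countFusableAdds(n->lhs) + countFusableAdds(n->rhs);
}
```

The point of the split is that the rewrite only reshapes the tree; the later match stays as simple as before and just finds more adds it can fuse.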