Hi all,
I've been experimenting with SSA rules for fused multiply-add (FMA) operations on s390x, and I'd like to hear people's opinions on having the compiler emit them. I found some earlier discussion here:
https://groups.google.com/d/topic/golang-dev/qvOqcmAkKnA/discussion
The consensus there seems to be that fusing a multiply and an add should be allowed unless it is blocked by an explicit cast or assignment. For example, a fused multiply-add could be used for x*y±z, but not for float64(x*y)±z or for t := x*y; t += z. I think we'd need to add a way for the SSA backend to explicitly block the optimization in these cases (I haven't done this yet).
Up until now I hadn't really considered using them because of the changes they might make to program output, but I'm wondering if the benefits (improved accuracy, faster operation) outweigh such costs.
My prototype passes ./all.bash and I haven't found any evidence yet that the tests would need to be modified to accommodate this optimization.
Any thoughts? I'd be happy to attempt to formalize the rules in a proposal for 1.9. I think the only architectures with fused multiply-add instructions in their base instruction sets are arm64, ppc64 and s390x.
Thanks,
Michael