FMA instruction support in swiftshader's llvm


Shalini Salomi Bodapati


Oct 23, 2020, 5:26:10 AM

to swiftshader

Hi All,

Currently, the multiply-add operation in ShaderCore (ShaderCore::mad) is implemented as a separate multiplication followed by an addition, not as a fused multiply-add (FMA):

	void ShaderCore::mad(Vector4f &dst, const Vector4f &src0, const Vector4f &src1, const Vector4f &src2)
	{
		dst.x = src0.x * src1.x + src2.x;
		dst.y = src0.y * src1.y + src2.y;
		dst.z = src0.z * src1.z + src2.z;
		dst.w = src0.w * src1.w + src2.w;
	}
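Here is a plain C++ sketch of the difference between the current code and a fused version. It is a hypothetical stand-in, not SwiftShader/Reactor code. The Vec4 struct and the mad_unfused, mad_fused and mul_then_add names are illustrative. std::fma from <cmath> plays the role that a vfmadd instruction would play in the JIT-generated code.

```cpp
#include <cmath>

// Hypothetical plain-float stand-in for a four-component shader register
// (not SwiftShader's Reactor Vector4f).
struct Vec4 {
    float x, y, z, w;
};

// Separate multiply, then add: the product is rounded to float before the add.
// Storing it through a volatile keeps the compiler from contracting
// a * b + c into an FMA on its own (e.g. under -ffp-contract=fast).
inline float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

// Same shape as ShaderCore::mad: dst = src0 * src1 + src2, per component.
Vec4 mad_unfused(const Vec4 &src0, const Vec4 &src1, const Vec4 &src2) {
    return {mul_then_add(src0.x, src1.x, src2.x),
            mul_then_add(src0.y, src1.y, src2.y),
            mul_then_add(src0.z, src1.z, src2.z),
            mul_then_add(src0.w, src1.w, src2.w)};
}

// Fused variant: std::fma computes a * b + c with a single rounding.
Vec4 mad_fused(const Vec4 &src0, const Vec4 &src1, const Vec4 &src2) {
    return {std::fma(src0.x, src1.x, src2.x),
            std::fma(src0.y, src1.y, src2.y),
            std::fma(src0.z, src1.z, src2.z),
            std::fma(src0.w, src1.w, src2.w)};
}
```

On exactly representable inputs such as (1, 2, 3, 4) * 2 + 1, both versions return (3, 5, 7, 9). They only diverge when the intermediate product has to be rounded.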

I would like to use vfmadd instructions for this, but neither LLVM 7.0 nor LLVM 10.0 seems to provide an intrinsic for it (llvm::Intrinsic::x86_fma_vfmadd_ps is not available in swiftshader/third_party/llvm-7.0/llvm/include/llvm/IR/IntrinsicsX86.td).

Could anyone help me use FMA instructions for mad without the LLVM intrinsic?

Thanks in Advance!

Ben Clayton


Oct 23, 2020, 6:51:25 AM

to Shalini Salomi Bodapati, swiftshader

Hi Shalini,

> I would want to use vfmadd instructions to achieve this. But I see that neither llvm 7.0 nor llvm 10.0 is providing intrinsic for this.

There is the llvm.fma.* set of IR intrinsics, and you'll find the llvm::Intrinsic::fma enumerator declared via the llvm/IR/IntrinsicEnums.inc file.

That said, it appears that LLVM can automatically transform a vector multiply and add into an FMA, as long as fast-math is enabled and the target supports the necessary instructions:

https://godbolt.org/z/8aoqE6

clang10 -g0 -O2 -ffast-math -march=skylake

typedef float vec4 __attribute__((ext_vector_type(4)));

vec4 fma(vec4 in[]) {
	vec4 a = in[0];
	vec4 b = in[1];
	vec4 c = in[2];
	return a * b + c;
}

fma(float __vector(4)*):
	vmovaps xmm1, xmmword ptr [rdi]
	vmovaps xmm0, xmmword ptr [rdi + 16]
	vfmadd213ps xmm0, xmm1, xmmword ptr [rdi + 32]
	ret

We can confirm that the FMA is formed in the backend, because the optimized IR still contains a separate fmul and fadd:

https://godbolt.org/z/71M6vx

clang10 -emit-llvm -g0 -O2 -ffast-math -march=skylake

define dso_local <4 x float> @_Z3fmaPDv4_f(<4 x float>* nocapture readonly %0) local_unnamed_addr #0 {
	%2 = load <4 x float>, <4 x float>* %0, align 16, !tbaa !2
	%3 = getelementptr inbounds <4 x float>, <4 x float>* %0, i64 1
	%4 = load <4 x float>, <4 x float>* %3, align 16, !tbaa !2
	%5 = getelementptr inbounds <4 x float>, <4 x float>* %0, i64 2
	%6 = load <4 x float>, <4 x float>* %5, align 16, !tbaa !2
	%7 = fmul fast <4 x float> %4, %2
	%8 = fadd fast <4 x float> %7, %6
	ret <4 x float> %8
}

Compiling that IR on its own produces the same machine code:

https://godbolt.org/z/ncvc91

clang10 -x ir -O2

fma(float __vector(4)*):
	vmovaps xmm1, xmmword ptr [rdi]
	vmovaps xmm0, xmmword ptr [rdi + 16]
	vfmadd213ps xmm0, xmm1, xmmword ptr [rdi + 32]
	ret

Assuming that the call to detectHost() is detecting your system's support for FMA instructions, I guess the other reason you're not getting FMAs is the lack of fast-math?

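It appears that individual floating-point IR instructions can be annotated with fast-math flags (llvm::FastMathFlags). Among them, "contract" is the flag LLVM documents as allowing a multiply and an add to be fused.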

Maybe as an experiment you could enable these for those instructions in ShaderCore::mad() and see if that gives you the expected output?

Be aware that FMAs will likely give you subtly different end results, since the fused operation rounds once instead of twice. Your mileage may vary.
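This small scalar sketch shows why the results can differ. It is plain C++, not SwiftShader code, and the helper names are illustrative. A fused multiply-add rounds once, while a separate multiply and add round twice. With a = 1 + 2^-23 and b = 1 - 2^-23, the exact product 1 - 2^-46 rounds to 1.0f on its own, so adding -1 leaves 0. The fused version keeps the -2^-46 term.

```cpp
#include <cmath>

// Multiply and add as two separately rounded float operations.
// The volatile store keeps the compiler from fusing them into an FMA itself.
float separate_mul_add(float a, float b, float c) {
    volatile float product = a * b;  // first rounding: product to float
    return product + c;              // second rounding: sum to float
}

// Fused multiply-add: the exact value of a * b + c, rounded once.
float fused_mul_add(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

With a = 1.0f + std::ldexp(1.0f, -23), b = 1.0f - std::ldexp(1.0f, -23) and c = -1.0f, separate_mul_add returns 0.0f and fused_mul_add returns -2^-46 (about -1.42e-14). For inputs whose product is exactly representable, such as 2 * 3 + 1, both return the same value.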

Cheers,

Ben
