ldc SSE intrinsic support

Mike Farnsworth

Nov 6, 2009, 6:06:02 PM
to LDC - the LLVM D compiler
What support, if any, does ldc have for SSE intrinsics, such as those
found in the *mmintrin.h headers shipped with gcc, VC++, etc.? The
llvm-gcc frontend supports them, and I believe clang does as well. I
could only find a few pragmas for LLVM intrinsics in the ldc part of
the std lib.

If this is something that could (relatively) easily be added, let me
know what to do and I'll see if I can muster up a patch sometime. It
may also be that there needs to be a little extra knowledge about some
__m128- and __m64-like types in order to have the llvm backend
actually prefer putting them in SSE registers?

Thanks,
Mike

Christian Kamm

Nov 7, 2009, 2:49:15 AM
to ldc...@googlegroups.com

Hi Mike,

LDC can bind LLVM intrinsics to D functions if their signature is expressible
in D. For example, for memcpy:

pragma(intrinsic, "llvm.memcpy.i32")
void llvm_memcpy_i32(void* dst, void* src, uint len, uint alignment);

As far as I remember, the problem with the SSE intrinsics is that there's no D
type that matches llvm's vector types
(http://www.llvm.org/docs/LangRef.html#t_vector). And that means getting
explicit access to the intrinsic from D code will not be easy.

I think Tomas looked into this at some point, maybe he can offer some advice
to get you started.

Note that the LLVM optimizer should already make use of the intrinsics.

Regards,
Christian

Tomas Lindquist Olsen

Nov 7, 2009, 10:35:25 AM
to ldc...@googlegroups.com

The problem was indeed introducing the vector types into the D type system.
A fixed-size array seems like a perfect fit, but in D1 fixed-size arrays
are treated as reference types (and are not returnable).
I never found the best way to go about it, and adding a completely
new type to the frontend seemed like a lot of work.

Tomas

Mike Farnsworth

Nov 9, 2009, 2:18:00 AM
to LDC - the LLVM D compiler
Just for a bit more info, I replied to someone on the main D newsgroup
about what I've been doing thus far:

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=100436

> On Sat, Nov 7, 2009 at 8:49 AM, Christian Kamm <k...@incasoftware.de> wrote:
[snip]
> > As far as I remember, the problem with the SSE intrinsics is that there's no D
> > type that matches llvm's vector types
> > (http://www.llvm.org/docs/LangRef.html#t_vector). And that means getting
> > explicit access to the intrinsic from D code will not be easy.
>
> > I think Tomas looked into this at some point, maybe he can offer some advice
> > to get you started.
>
> > Note that the LLVM optimizer should already make use of the intrinsics.

Yeah, but do the LLVM intrinsics for vector types actually cover the
full range of Intel intrinsics for SSE? I'll bet not, but the majority
of the operations I need are probably already taken care of by those
LLVM vector intrinsics. I wonder how clang and llvm-gcc forward those
extra intrinsics on?

On Nov 7, 7:35 am, Tomas Lindquist Olsen <tomas.l.ol...@gmail.com>
wrote:
> The problem was indeed introducing the vector types into the D type system.
> Fixed-size array seems like a perfect fit, but in D1 they are treated
> as reference types (and are not returnable).
> I never found the best way to go about it... and adding a completely
> new type into the frontend seemed like a lot of work

D2 will pass the static arrays around by value, and they will be
returnable. How hard is it to add a switch to enable just that
behavior in ldc even for D1? I'd imagine mucking with the frontend
like that could be a pain, but Walter does have that recent change
that you can peek at in dmd 2.036 for reference.
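
To make the by-value behavior concrete, here is a minimal sketch. It assumes a D2 compiler in which static arrays are value types, as described above; the function name `scaled` is hypothetical and chosen only for illustration.

```d
// Sketch only: assumes D2 semantics, where float[4] is a value type.
// `scaled` is a hypothetical helper, not part of any library.
float[4] scaled(float[4] v, float s)
{
    v[] *= s;   // array operation on the callee's private copy
    return v;   // static arrays are returnable by value in D2
}

void main()
{
    float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] b = scaled(a, 2.0f);

    assert(a == [1.0f, 2.0f, 3.0f, 4.0f]); // caller's array is unchanged
    assert(b == [2.0f, 4.0f, 6.0f, 8.0f]);
}
```

Under D1 semantics the same signature would not be accepted, since static arrays could not be returned.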

What about introducing the types as just a struct with a float[4] in
it? Is there something in the x86-64 ABI that makes it a bad idea to
stuff that in an xmm register on return? Perhaps there's a simple
flag that could be set on the struct type that notes it is intended
for SIMD registers?
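
The struct idea could look roughly like the sketch below. The names `Float4` and `add` are hypothetical, and nothing here tells the backend to use SSE registers; it only shows the type shape being discussed.

```d
// Sketch only: a plain struct wrapping four floats.
// `Float4` and `add` are hypothetical names for illustration.
struct Float4
{
    float[4] v;
}

// Element-wise addition using D array operations.
Float4 add(Float4 a, Float4 b)
{
    Float4 r;
    r.v[] = a.v[] + b.v[];
    return r;
}

void main()
{
    auto a = Float4([1.0f, 2.0f, 3.0f, 4.0f]);
    auto b = Float4([10.0f, 20.0f, 30.0f, 40.0f]);
    auto c = add(a, b);
    assert(c.v == [11.0f, 22.0f, 33.0f, 44.0f]);
}
```

For what it's worth, the System V x86-64 calling convention classifies a 16-byte struct of four floats as two SSE eightbytes, so it is returned in two xmm registers (two floats each) rather than as one packed value. Whether a particular compiler follows that classification, and what happens after inlining, would need to be checked separately.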

I should probably stop making suggestions because I really have no
idea how hard any of that would be, but if you guys want I could make
my own attempt at implementing parts of this in ldc and report back.
I haven't got gobs of free time, but I could kick it around a bit if
you point me to the parts of the code that are likely relevant.

On a philosophical note, I also don't think adding a completely new
type to the frontend is a good idea. Rather, I imagine you would want
to do the following (in D2, really):

* Mark all static arrays of basic types (byte, char flavors, short
flavors, int flavors, float, double) automatically as the underlying
LLVM vector types. LLVM has a target-dependent limit on how many
elements a vector type can hold, so when a by-value return exceeds
that limit, just default to the usual (on-the-stack?) path for return
values.

* This part is probably harder, but make sure you can detect structs
composed internally of those same static array types (and no more?),
because then those can get the same treatment.

End result is probably that LLVM codegen will pass a ton of that stuff
around in registers, and when inlining you'll get a bunch of movaps
and movups instructions to go away.

-Mike
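
For comparison, here is a sketch of element-wise arithmetic on a four-float vector type. It assumes a later D2 toolchain that ships the `core.simd` module and a target where its `float4` type is available; neither is something this thread relies on, and `mulAdd` is a hypothetical helper name.

```d
// Sketch only: assumes a D2 toolchain that provides core.simd
// and a target (e.g. x86-64) where float4 is supported.
import core.simd;

// `mulAdd` is a hypothetical helper name.
float4 mulAdd(float4 a, float4 b, float4 c)
{
    return a * b + c;   // element-wise on all four lanes
}

void main()
{
    float4 a = [1.0f, 2.0f, 3.0f, 4.0f];
    float4 b = [2.0f, 2.0f, 2.0f, 2.0f];
    float4 c = [0.5f, 0.5f, 0.5f, 0.5f];

    float4 r = mulAdd(a, b, c);
    assert(r.array == [2.5f, 4.5f, 6.5f, 8.5f]);
}
```

Note that `float4` here is a distinct vector type built from `float[4]`, not a plain static array, so it differs from the automatic marking proposed in the bullets above.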

Tomas Lindquist Olsen

Nov 9, 2009, 5:40:43 AM
to ldc...@googlegroups.com
On Mon, Nov 9, 2009 at 8:18 AM, Mike Farnsworth
<mike.fa...@gmail.com> wrote:
>
> Just for a bit more info, I replied to someone on the main D newsgroup
> about what I've been doing thus far:
>
> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=100436
>
>> On Sat, Nov 7, 2009 at 8:49 AM, Christian Kamm <k...@incasoftware.de> wrote:
> [snip]
>> > As far as I remember, the problem with the SSE intrinsics is that there's no D
>> > type that matches llvm's vector types
>> > (http://www.llvm.org/docs/LangRef.html#t_vector). And that means getting
>> > explicit access to the intrinsic from D code will not be easy.
>>
>> > I think Tomas looked into this at some point, maybe he can offer some advice
>> > to get you started.
>>
>> > Note that the LLVM optimizer should already make use of the intrinsics.
>
> Yeah, but do the LLVM intrinsics for vector types actually cover all
> of the range of Intel intrinsics for SSE?  I'll bet not, but probably
> the majority of the operations I need are already taken care of by
> those LLVM vector intrinsics.  I wonder how they forward on those
> extra intrinsics in clang and llvm-gcc?
>

LLVM provides these x86 SIMD intrinsics by default:

"llvm.x86.mmx.emms",
"llvm.x86.mmx.femms",
"llvm.x86.mmx.maskmovq",
"llvm.x86.mmx.movnt.dq",
"llvm.x86.mmx.packssdw",
"llvm.x86.mmx.packsswb",
"llvm.x86.mmx.packuswb",
"llvm.x86.mmx.padds.b",
"llvm.x86.mmx.padds.w",
"llvm.x86.mmx.paddus.b",
"llvm.x86.mmx.paddus.w",
"llvm.x86.mmx.pavg.b",
"llvm.x86.mmx.pavg.w",
"llvm.x86.mmx.pcmpeq.b",
"llvm.x86.mmx.pcmpeq.d",
"llvm.x86.mmx.pcmpeq.w",
"llvm.x86.mmx.pcmpgt.b",
"llvm.x86.mmx.pcmpgt.d",
"llvm.x86.mmx.pcmpgt.w",
"llvm.x86.mmx.pmadd.wd",
"llvm.x86.mmx.pmaxs.w",
"llvm.x86.mmx.pmaxu.b",
"llvm.x86.mmx.pmins.w",
"llvm.x86.mmx.pminu.b",
"llvm.x86.mmx.pmovmskb",
"llvm.x86.mmx.pmulh.w",
"llvm.x86.mmx.pmulhu.w",
"llvm.x86.mmx.pmulu.dq",
"llvm.x86.mmx.psad.bw",
"llvm.x86.mmx.psll.d",
"llvm.x86.mmx.psll.q",
"llvm.x86.mmx.psll.w",
"llvm.x86.mmx.pslli.d",
"llvm.x86.mmx.pslli.q",
"llvm.x86.mmx.pslli.w",
"llvm.x86.mmx.psra.d",
"llvm.x86.mmx.psra.w",
"llvm.x86.mmx.psrai.d",
"llvm.x86.mmx.psrai.w",
"llvm.x86.mmx.psrl.d",
"llvm.x86.mmx.psrl.q",
"llvm.x86.mmx.psrl.w",
"llvm.x86.mmx.psrli.d",
"llvm.x86.mmx.psrli.q",
"llvm.x86.mmx.psrli.w",
"llvm.x86.mmx.psubs.b",
"llvm.x86.mmx.psubs.w",
"llvm.x86.mmx.psubus.b",
"llvm.x86.mmx.psubus.w",
"llvm.x86.sse2.add.sd",
"llvm.x86.sse2.clflush",
"llvm.x86.sse2.cmp.pd",
"llvm.x86.sse2.cmp.sd",
"llvm.x86.sse2.comieq.sd",
"llvm.x86.sse2.comige.sd",
"llvm.x86.sse2.comigt.sd",
"llvm.x86.sse2.comile.sd",
"llvm.x86.sse2.comilt.sd",
"llvm.x86.sse2.comineq.sd",
"llvm.x86.sse2.cvtdq2pd",
"llvm.x86.sse2.cvtdq2ps",
"llvm.x86.sse2.cvtpd2dq",
"llvm.x86.sse2.cvtpd2ps",
"llvm.x86.sse2.cvtps2dq",
"llvm.x86.sse2.cvtps2pd",
"llvm.x86.sse2.cvtsd2si",
"llvm.x86.sse2.cvtsd2si64",
"llvm.x86.sse2.cvtsd2ss",
"llvm.x86.sse2.cvtsi2sd",
"llvm.x86.sse2.cvtsi642sd",
"llvm.x86.sse2.cvtss2sd",
"llvm.x86.sse2.cvttpd2dq",
"llvm.x86.sse2.cvttps2dq",
"llvm.x86.sse2.cvttsd2si",
"llvm.x86.sse2.cvttsd2si64",
"llvm.x86.sse2.div.sd",
"llvm.x86.sse2.lfence",
"llvm.x86.sse2.loadu.dq",
"llvm.x86.sse2.loadu.pd",
"llvm.x86.sse2.maskmov.dqu",
"llvm.x86.sse2.max.pd",
"llvm.x86.sse2.max.sd",
"llvm.x86.sse2.mfence",
"llvm.x86.sse2.min.pd",
"llvm.x86.sse2.min.sd",
"llvm.x86.sse2.movmsk.pd",
"llvm.x86.sse2.movnt.dq",
"llvm.x86.sse2.movnt.i",
"llvm.x86.sse2.movnt.pd",
"llvm.x86.sse2.mul.sd",
"llvm.x86.sse2.packssdw.128",
"llvm.x86.sse2.packsswb.128",
"llvm.x86.sse2.packuswb.128",
"llvm.x86.sse2.padds.b",
"llvm.x86.sse2.padds.w",
"llvm.x86.sse2.paddus.b",
"llvm.x86.sse2.paddus.w",
"llvm.x86.sse2.pavg.b",
"llvm.x86.sse2.pavg.w",
"llvm.x86.sse2.pcmpeq.b",
"llvm.x86.sse2.pcmpeq.d",
"llvm.x86.sse2.pcmpeq.w",
"llvm.x86.sse2.pcmpgt.b",
"llvm.x86.sse2.pcmpgt.d",
"llvm.x86.sse2.pcmpgt.w",
"llvm.x86.sse2.pmadd.wd",
"llvm.x86.sse2.pmaxs.w",
"llvm.x86.sse2.pmaxu.b",
"llvm.x86.sse2.pmins.w",
"llvm.x86.sse2.pminu.b",
"llvm.x86.sse2.pmovmskb.128",
"llvm.x86.sse2.pmulh.w",
"llvm.x86.sse2.pmulhu.w",
"llvm.x86.sse2.pmulu.dq",
"llvm.x86.sse2.psad.bw",
"llvm.x86.sse2.psll.d",
"llvm.x86.sse2.psll.dq",
"llvm.x86.sse2.psll.dq.bs",
"llvm.x86.sse2.psll.q",
"llvm.x86.sse2.psll.w",
"llvm.x86.sse2.pslli.d",
"llvm.x86.sse2.pslli.q",
"llvm.x86.sse2.pslli.w",
"llvm.x86.sse2.psra.d",
"llvm.x86.sse2.psra.w",
"llvm.x86.sse2.psrai.d",
"llvm.x86.sse2.psrai.w",
"llvm.x86.sse2.psrl.d",
"llvm.x86.sse2.psrl.dq",
"llvm.x86.sse2.psrl.dq.bs",
"llvm.x86.sse2.psrl.q",
"llvm.x86.sse2.psrl.w",
"llvm.x86.sse2.psrli.d",
"llvm.x86.sse2.psrli.q",
"llvm.x86.sse2.psrli.w",
"llvm.x86.sse2.psubs.b",
"llvm.x86.sse2.psubs.w",
"llvm.x86.sse2.psubus.b",
"llvm.x86.sse2.psubus.w",
"llvm.x86.sse2.sqrt.pd",
"llvm.x86.sse2.sqrt.sd",
"llvm.x86.sse2.storel.dq",
"llvm.x86.sse2.storeu.dq",
"llvm.x86.sse2.storeu.pd",
"llvm.x86.sse2.sub.sd",
"llvm.x86.sse2.ucomieq.sd",
"llvm.x86.sse2.ucomige.sd",
"llvm.x86.sse2.ucomigt.sd",
"llvm.x86.sse2.ucomile.sd",
"llvm.x86.sse2.ucomilt.sd",
"llvm.x86.sse2.ucomineq.sd",
"llvm.x86.sse3.addsub.pd",
"llvm.x86.sse3.addsub.ps",
"llvm.x86.sse3.hadd.pd",
"llvm.x86.sse3.hadd.ps",
"llvm.x86.sse3.hsub.pd",
"llvm.x86.sse3.hsub.ps",
"llvm.x86.sse3.ldu.dq",
"llvm.x86.sse3.monitor",
"llvm.x86.sse3.mwait",
"llvm.x86.sse41.blendpd",
"llvm.x86.sse41.blendps",
"llvm.x86.sse41.blendvpd",
"llvm.x86.sse41.blendvps",
"llvm.x86.sse41.dppd",
"llvm.x86.sse41.dpps",
"llvm.x86.sse41.extractps",
"llvm.x86.sse41.insertps",
"llvm.x86.sse41.movntdqa",
"llvm.x86.sse41.mpsadbw",
"llvm.x86.sse41.packusdw",
"llvm.x86.sse41.pblendvb",
"llvm.x86.sse41.pblendw",
"llvm.x86.sse41.pcmpeqq",
"llvm.x86.sse41.pextrb",
"llvm.x86.sse41.pextrd",
"llvm.x86.sse41.pextrq",
"llvm.x86.sse41.phminposuw",
"llvm.x86.sse41.pmaxsb",
"llvm.x86.sse41.pmaxsd",
"llvm.x86.sse41.pmaxud",
"llvm.x86.sse41.pmaxuw",
"llvm.x86.sse41.pminsb",
"llvm.x86.sse41.pminsd",
"llvm.x86.sse41.pminud",
"llvm.x86.sse41.pminuw",
"llvm.x86.sse41.pmovsxbd",
"llvm.x86.sse41.pmovsxbq",
"llvm.x86.sse41.pmovsxbw",
"llvm.x86.sse41.pmovsxdq",
"llvm.x86.sse41.pmovsxwd",
"llvm.x86.sse41.pmovsxwq",
"llvm.x86.sse41.pmovzxbd",
"llvm.x86.sse41.pmovzxbq",
"llvm.x86.sse41.pmovzxbw",
"llvm.x86.sse41.pmovzxdq",
"llvm.x86.sse41.pmovzxwd",
"llvm.x86.sse41.pmovzxwq",
"llvm.x86.sse41.pmuldq",
"llvm.x86.sse41.pmulld",
"llvm.x86.sse41.ptestc",
"llvm.x86.sse41.ptestnzc",
"llvm.x86.sse41.ptestz",
"llvm.x86.sse41.round.pd",
"llvm.x86.sse41.round.ps",
"llvm.x86.sse41.round.sd",
"llvm.x86.sse41.round.ss",
"llvm.x86.sse42.crc32.16",
"llvm.x86.sse42.crc32.32",
"llvm.x86.sse42.crc32.64",
"llvm.x86.sse42.crc32.8",
"llvm.x86.sse42.pcmpestri128",
"llvm.x86.sse42.pcmpestria128",
"llvm.x86.sse42.pcmpestric128",
"llvm.x86.sse42.pcmpestrio128",
"llvm.x86.sse42.pcmpestris128",
"llvm.x86.sse42.pcmpestriz128",
"llvm.x86.sse42.pcmpestrm128",
"llvm.x86.sse42.pcmpgtq",
"llvm.x86.sse42.pcmpistri128",
"llvm.x86.sse42.pcmpistria128",
"llvm.x86.sse42.pcmpistric128",
"llvm.x86.sse42.pcmpistrio128",
"llvm.x86.sse42.pcmpistris128",
"llvm.x86.sse42.pcmpistriz128",
"llvm.x86.sse42.pcmpistrm128",
"llvm.x86.sse.add.ss",
"llvm.x86.sse.cmp.ps",
"llvm.x86.sse.cmp.ss",
"llvm.x86.sse.comieq.ss",
"llvm.x86.sse.comige.ss",
"llvm.x86.sse.comigt.ss",
"llvm.x86.sse.comile.ss",
"llvm.x86.sse.comilt.ss",
"llvm.x86.sse.comineq.ss",
"llvm.x86.sse.cvtpd2pi",
"llvm.x86.sse.cvtpi2pd",
"llvm.x86.sse.cvtpi2ps",
"llvm.x86.sse.cvtps2pi",
"llvm.x86.sse.cvtsi2ss",
"llvm.x86.sse.cvtsi642ss",
"llvm.x86.sse.cvtss2si",
"llvm.x86.sse.cvtss2si64",
"llvm.x86.sse.cvttpd2pi",
"llvm.x86.sse.cvttps2pi",
"llvm.x86.sse.cvttss2si",
"llvm.x86.sse.cvttss2si64",
"llvm.x86.sse.div.ss",
"llvm.x86.sse.ldmxcsr",
"llvm.x86.sse.loadu.ps",
"llvm.x86.sse.max.ps",
"llvm.x86.sse.max.ss",
"llvm.x86.sse.min.ps",
"llvm.x86.sse.min.ss",
"llvm.x86.sse.movmsk.ps",
"llvm.x86.sse.movnt.ps",
"llvm.x86.sse.mul.ss",
"llvm.x86.sse.rcp.ps",
"llvm.x86.sse.rcp.ss",
"llvm.x86.sse.rsqrt.ps",
"llvm.x86.sse.rsqrt.ss",
"llvm.x86.sse.sfence",
"llvm.x86.sse.sqrt.ps",
"llvm.x86.sse.sqrt.ss",
"llvm.x86.sse.stmxcsr",
"llvm.x86.sse.storeu.ps",
"llvm.x86.sse.sub.ss",
"llvm.x86.sse.ucomieq.ss",
"llvm.x86.sse.ucomige.ss",
"llvm.x86.sse.ucomigt.ss",
"llvm.x86.sse.ucomile.ss",
"llvm.x86.sse.ucomilt.ss",
"llvm.x86.sse.ucomineq.ss",
"llvm.x86.ssse3.pabs.b",
"llvm.x86.ssse3.pabs.b.128",
"llvm.x86.ssse3.pabs.d",
"llvm.x86.ssse3.pabs.d.128",
"llvm.x86.ssse3.pabs.w",
"llvm.x86.ssse3.pabs.w.128",
"llvm.x86.ssse3.palign.r",
"llvm.x86.ssse3.palign.r.128",
"llvm.x86.ssse3.phadd.d",
"llvm.x86.ssse3.phadd.d.128",
"llvm.x86.ssse3.phadd.sw",
"llvm.x86.ssse3.phadd.sw.128",
"llvm.x86.ssse3.phadd.w",
"llvm.x86.ssse3.phadd.w.128",
"llvm.x86.ssse3.phsub.d",
"llvm.x86.ssse3.phsub.d.128",
"llvm.x86.ssse3.phsub.sw",
"llvm.x86.ssse3.phsub.sw.128",
"llvm.x86.ssse3.phsub.w",
"llvm.x86.ssse3.phsub.w.128",
"llvm.x86.ssse3.pmadd.ub.sw",
"llvm.x86.ssse3.pmadd.ub.sw.128",
"llvm.x86.ssse3.pmul.hr.sw",
"llvm.x86.ssse3.pmul.hr.sw.128",
"llvm.x86.ssse3.pshuf.b",
"llvm.x86.ssse3.pshuf.b.128",
"llvm.x86.ssse3.psign.b",
"llvm.x86.ssse3.psign.b.128",
"llvm.x86.ssse3.psign.d",
"llvm.x86.ssse3.psign.d.128",
"llvm.x86.ssse3.psign.w",
"llvm.x86.ssse3.psign.w.128",
