While working on TurkeyMan's std.simd I noticed that some things are still impossible to implement efficiently using LDC. One example is the equivalent of the _mm_cmpgt_epi32 intrinsic (http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/intref_cls/common/intref_sse2_int_comparison.htm). LLVM does not provide the GCC builtin __builtin_ia32_pcmpgtd128, which is needed to implement such a function. Clang implements it using a comparison operator on integer vectors, which compiles to an LLVM icmp instruction followed by a sext. There is no way to express this in D.
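For reference, the clang approach can be sketched in C with the GCC/clang vector extensions. The names v4si and cmpgt_epi32 are only illustrative here and are not taken from clang's headers; the point is that a plain comparison on integer vectors yields a per-lane mask.

```c
#include <stdint.h>

/* Illustrative name: a vector of four 32-bit signed integers (16 bytes). */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical helper: the comparison operator works lane by lane and
   produces -1 (all bits set) where a > b and 0 elsewhere, which is the
   same mask an icmp sgt followed by a sext would produce. */
static inline v4si cmpgt_epi32(v4si a, v4si b)
{
    return a > b;
}
```

The result is a usable mask in C because the vector comparison is a compiler extension with exactly that meaning.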
We can't do what clang does here because vectors in LDC are part of the language and not a compiler extension. One way to solve this would be to add another pragma, but I don't think that's a good option in this case. Such an intrinsic would cover only a very specific use case, and there are probably other cases where one would face similar problems. There will probably be more of them in the future, when LDC supports more platforms. It also seems that LLVM often removes intrinsics when what they do can be expressed in other ways. So I think that adding a pragma for each operation that is hard to express efficiently in D (and for which there is no intrinsic in LLVM) could eventually lead to quite a lot of pragmas. A more general alternative would be inline LLVM IR, which could look something like this:
    import core.simd;

    pragma(llvm_inline_ir)
        R inlineIR(string s, R, P...)(P);

    void foo(int4 a, int4 b)
    {
        // Each lane of gtMask is all ones where a > b and zero otherwise.
        auto gtMask = inlineIR!(`
            %cmp = icmp sgt <4 x i32> %0, %1
            %r = sext <4 x i1> %cmp to <4 x i32>
            ret <4 x i32> %r`,
            int4)(a, b);
    }
Do you think adding support for inline LLVM IR is a good idea?
Regards,
Jernej