LLVM: trying to use x86 pext intrinsic


Stefan Karpinski

Nov 23, 2014, 11:23:33 PM
to Julia Dev
I'm trying to use the x86 PEXT intrinsic from LLVM (see patch below), but I get the following error when I call it:

julia> using Core.Intrinsics

julia> pext(x::Uint32, y::Uint32) = box(Uint32, pext32(unbox(Uint32, x), unbox(Uint32, y)))
pext (generic function with 1 method)

julia> pext(0x8daf8af4, 0b00000111_00111111_00111111_00111111)
LLVM ERROR: Program used external function 'llvm.x86.bmi.pext.32.i32' which could not be resolved!

This seems like an issue with how LLVM is linked. The Intrinsic::x86_bmi_pext_32 intrinsic does seem to exist; does anyone have any idea what I have to do to allow LLVM to find it?

By way of motivation, this instruction may allow freakishly fast UTF-8 decoding if it does what I think it does and I can get it to work.
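
For readers unfamiliar with the instruction: PEXT (parallel bit extract) takes a source word and a mask, selects the source bits at the positions where the mask has a 1, and packs them contiguously into the low bits of the result. A minimal portable C++ sketch of that behaviour is below. The name `pext32_soft` is hypothetical and exists only for illustration; it is a bit-by-bit reference emulation, not the hardware implementation and not anything in Julia or LLVM.

```cpp
#include <cstdint>

// Reference emulation of 32-bit PEXT (hypothetical helper, for illustration).
// Walks the mask from its lowest set bit upward; each selected source bit
// is written to the next free low-order bit of the result.
uint32_t pext32_soft(uint32_t src, uint32_t mask) {
    uint32_t result = 0;
    uint32_t out_bit = 1;                       // next destination bit
    while (mask != 0) {
        uint32_t lowest = mask & (~mask + 1u);  // isolate lowest set mask bit
        if (src & lowest) {
            result |= out_bit;
        }
        out_bit <<= 1;
        mask &= mask - 1u;                      // clear that mask bit
    }
    return result;
}
```

With the mask `0x073f3f3f`, this keeps the low 3 bits of the top byte and the low 6 bits of each of the other three bytes, which is exactly the payload layout of a 4-byte UTF-8 sequence once the bytes are in big-endian order.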

----------------

diff --git a/src/intrinsics.cpp b/src/intrinsics.cpp
index e4efcdd..7c16c5f 100644
--- a/src/intrinsics.cpp
+++ b/src/intrinsics.cpp
@@ -38,6 +38,7 @@ namespace JL_I {
         sqrt_llvm, powi_llvm,
         // byte vectors
         bytevec_ref, bytevec_ref32, bytevec_utf8_ref,
+        pext32,
         // pointer access
         pointerref, pointerset, pointertoref,
         // c interface
@@ -1009,6 +1010,14 @@ static Value *emit_intrinsic(intrinsic f, jl_value_t **args, size_t nargs,
     Value *den;
     Value *typemin;
     switch (f) {
+    HANDLE(pext32,2) {
+        return builder.CreateCall2(
+            Intrinsic::getDeclaration(
+                jl_Module,
+                Intrinsic::x86_bmi_pext_32,
+                ArrayRef<Type*>(T_uint32)
+        ), JL_INT(x), JL_INT(y));
+    }
     HANDLE(bytevec_ref,2) {
         Value *b = JL_INT(x);
         Value *i = builder.CreateSub(JL_INT(y), ConstantInt::get(T_size, 1));
@@ -1793,6 +1802,7 @@ extern "C" void jl_init_intrinsic_functions(void)
     ADD_I(flipsign_int); ADD_I(select_value); ADD_I(sqrt_llvm);
     ADD_I(powi_llvm);
     ADD_I(bytevec_ref); ADD_I(bytevec_ref32); ADD_I(bytevec_utf8_ref);
+    ADD_I(pext32);
     ADD_I(pointerref); ADD_I(pointerset); ADD_I(pointertoref);
     ADD_I(checked_sadd); ADD_I(checked_uadd);
     ADD_I(checked_ssub); ADD_I(checked_usub);

Simon Kornblith

Nov 24, 2014, 12:14:22 AM
to juli...@googlegroups.com
Maybe try just Intrinsic::getDeclaration(jl_Module, Intrinsic::x86_bmi_pext_32)?

Stefan Karpinski

Nov 24, 2014, 1:03:59 AM
to Julia Dev
That gives me a different error:

julia> pext(0x8daf8af4, 0b00000111_00111111_00111111_00111111)
LLVM ERROR: Cannot select: intrinsic %llvm.x86.bmi.pext.32

Seems like an improvement?

Keno Fischer

Nov 24, 2014, 1:37:44 AM
to juli...@googlegroups.com
Do you have a Haswell CPU? If not, that would explain it.

Stefan Karpinski

Nov 24, 2014, 1:47:32 AM
to Julia Dev
Ah, right. No, it seems like I have an Ivy Bridge. For some reason I was thinking this instruction was a few years older than that.

Simon Kornblith

Nov 24, 2014, 11:54:50 AM
to juli...@googlegroups.com
I think we also disable BMI2 instructions when Julia is compiled against LLVM 3.3 due to bugs, so this might not work on Haswell either unless you use a newer LLVM.
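
PEXT belongs to the BMI2 extension, which on Intel first shipped with Haswell, so whether the host CPU reports BMI2 is worth checking before blaming the LLVM side. A rough C++ sketch of such a check is below. It assumes GCC or a reasonably recent Clang on x86, where the `__builtin_cpu_supports` builtin is available; `host_has_bmi2` is a hypothetical helper name, and on any other toolchain or architecture the sketch simply reports false.

```cpp
// Returns true when the host CPU reports BMI2 support (PEXT/PDEP).
// Assumption: GCC or Clang targeting x86; anything else falls through
// to "false" rather than attempting detection.
bool host_has_bmi2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                        // initialise CPU feature data
    return __builtin_cpu_supports("bmi2") != 0;
#else
    return false;
#endif
}
```

On an Ivy Bridge machine this should return false, which matches the "Cannot select" failure above: the backend has no BMI2 instruction to lower the intrinsic to.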

Stefan Karpinski

Nov 24, 2014, 12:16:24 PM
to juli...@googlegroups.com
Yeah, I tried it with LLVM 3.5 and that didn't work either – obviously since I don't have a Haswell CPU. Man, I really want to be able to use this. I wonder how I can try this out on a Haswell machine...

Simon Byrne

Nov 24, 2014, 5:52:04 PM
to juli...@googlegroups.com
Try getting in touch with Xianyi on the openblas-dev list. They have a Haswell machine (among others) which they've let me use in the past.

simon

Elliot Saba

Nov 24, 2014, 6:09:17 PM
to Julia Dev
I also have a Haswell OS X machine that I can give you access to. Just email me your SSH public key.
-E

Stefan Karpinski

Nov 26, 2014, 7:39:32 PM
to Julia Dev
Thanks to Elliot, I've been able to try this out and you can indeed get crazy efficient code with this instruction:

julia> using Core.Intrinsics

julia> pext(x::Uint32, y::Uint32) =
           box(Uint32, pext32(unbox(Uint32, x), unbox(Uint32, y)))
pext (generic function with 1 method)

julia> pext(bswap(0x8daf8af4), 0x073f3f3f)
0x0010abcd

julia> function decu8(x::UInt32)
           b = x % UInt8
           m = ifelse(b >>> 7 ==  0, 0x7f000000,
               ifelse(b >>> 5 ==  6, 0x1f3f0000,
               ifelse(b >>> 4 == 14, 0x0f3f3f00, 0x073f3f3f)))
           pext(bswap(x), m)
       end
decu8 (generic function with 1 method)

julia> decu8(reinterpret(Uint32,"\U10abcd".data)[1])
0x0010abcd

julia> decu8(reinterpret(Uint32,"\u2203!".data)[1])
0x00002203

julia> decu8(reinterpret(Uint32,"\u123!!".data)[1])
0x00000123

julia> decu8(reinterpret(Uint32,"\u13!!!".data)[1])
0x00000013

julia> @code_llvm decu8(UInt32(123))

define i32 @julia_decu842534(i32) {
top:
  %1 = trunc i32 %0 to i8, !dbg !8, !julia_type !10
  %2 = icmp slt i8 %1, 0, !dbg !11
  %.mask = and i8 %1, -32, !dbg !11
  %3 = icmp ne i8 %.mask, -64, !dbg !11
  %.mask5 = and i8 %1, -16, !dbg !11
  %4 = icmp ne i8 %.mask5, -32, !dbg !11
  %5 = select i1 %4, i32 121585471, i32 255803136, !dbg !11
  %6 = select i1 %3, i32 %5, i32 524222464, !dbg !11
  %7 = select i1 %2, i32 %6, i32 2130706432, !dbg !11
  %8 = call i32 @llvm.bswap.i32(i32 %0), !dbg !12, !julia_type !13
  %9 = call i32 @llvm.x86.bmi.pext.32(i32 %8, i32 %7), !dbg !12, !julia_type !13
  ret i32 %9, !dbg !12
}

That's some really sleek LLVM code for UTF-8 character decoding. Unfortunately, I can't see the native code because this required LLVM 3.5 and our @code_native macro seems to be broken on that version of LLVM. I also haven't been able to do a realistic time comparison because that requires rigging up quite a bit more stuff, but this is a pretty cool instruction in any case.
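
For anyone without BMI2 hardware who wants to follow the logic of `decu8`, here is a rough C++ port that swaps the PEXT instruction for a portable bit loop. The names `pext32_ref`, `bswap32`, and `decode_utf8_word` are hypothetical and chosen for illustration. Like the Julia version, it assumes the four bytes were loaded as a little-endian 32-bit word and does no validation of continuation bytes, so it is a sketch of the mask-selection idea rather than a complete decoder.

```cpp
#include <cstdint>

// Portable stand-in for the PEXT instruction (illustrative only).
static uint32_t pext32_ref(uint32_t src, uint32_t mask) {
    uint32_t result = 0;
    uint32_t out_bit = 1;
    while (mask != 0) {
        uint32_t lowest = mask & (~mask + 1u);  // lowest set mask bit
        if (src & lowest) result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1u;                      // clear it
    }
    return result;
}

// Reverse the byte order of a 32-bit word.
static uint32_t bswap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Decode the first UTF-8 character from a little-endian 32-bit load.
// The lead byte (low byte of x) selects which payload bits to extract.
uint32_t decode_utf8_word(uint32_t x) {
    uint8_t b = static_cast<uint8_t>(x);
    uint32_t m = (b >> 7) == 0  ? 0x7F000000u   // 0xxxxxxx: 1 byte
               : (b >> 5) == 6  ? 0x1F3F0000u   // 110xxxxx: 2 bytes
               : (b >> 4) == 14 ? 0x0F3F3F00u   // 1110xxxx: 3 bytes
               :                  0x073F3F3Fu;  // otherwise: 4 bytes
    return pext32_ref(bswap32(x), m);
}
```

The four masks are the same constants that show up in the `select` chain of the LLVM IR above (2130706432, 524222464, 255803136, 121585471 in decimal).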