Re: Struggling to build via generic LLVM bitcode

Dmitry Babokin

Apr 10, 2013, 5:07:30 AM
to ispc-...@googlegroups.com
Hi Scott,

Using generic-* targets with --emit-llvm is not really a supported model. ISPC has a number of intrinsics that are defined in a target-specific library. For targets that support object generation (e.g. sse4, avx, etc.), this library is linked into the object file or LLVM bitcode. For generic targets (the targets that assume C++ output), a header file is supplied instead, which provides the implementation of these intrinsics (see $ISPC_DIR/examples/intrinsics).

You are using the generic-32 target but are actually trying to generate an object file, so these intrinsics are missing.

Is there a reason you are using the generic-32 target? What hardware are you targeting?

-Dmitry.



On Wed, Apr 10, 2013 at 4:29 AM, Scott Pakin <scot...@pakin.org> wrote:
I'd like to be able to compile ISPC programs to generic LLVM bitcode so I can instrument the code using a bitcode-based performance-analysis tool I'm developing (Byfl, in case anyone cares).  However, when trying the mandelbrot test program I'm winding up with a bunch of undefined __movmsk symbols that I don't know how to handle:

    $ ispc --version
    Intel(r) SPMD Program Compiler (ispc), 1.3.0 (build commit b69d783e096e2294 @ 20120628, LLVM 3.2)
    $ mkdir objs
    $ ispc -O2 --arch=x86-64 --target=generic-32 --emit-llvm mandelbrot.ispc -o objs/mandelbrot_ispc.bc -h objs/mandelbrot_ispc.h
    WARNING: Linking two modules of different data layouts!
    $ g++ mandelbrot.cpp -Iobjs/ -O2 -m64 -c -o objs/mandelbrot.o
    $ g++ mandelbrot_serial.cpp -Iobjs/ -O2 -m64 -c -o objs/mandelbrot_serial.o
    $ g++ ../tasksys.cpp -Iobjs/ -O2 -m64 -c -o objs/tasksys.o
    ../tasksys.cpp: In function ‘void InitTaskSystem()’:
    ../tasksys.cpp:734:90: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    $ clang++ -Iobjs/ -O2 -m64 -o mandelbrot objs/mandelbrot.o objs/mandelbrot_serial.o objs/tasksys.o objs/mandelbrot_ispc.bc -lm -lpthread -lstdc++
    /tmp/mandelbrot_ispc-IWtDvg.o: In function `mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E_':
    objs/mandelbrot_ispc.bc:(.text+0x93f): undefined reference to `__movmsk'
    objs/mandelbrot_ispc.bc:(.text+0x1ea0): undefined reference to `__movmsk'
    objs/mandelbrot_ispc.bc:(.text+0x2de7): undefined reference to `__movmsk'
    objs/mandelbrot_ispc.bc:(.text+0x4527): undefined reference to `__movmsk'
    objs/mandelbrot_ispc.bc:(.text+0x4c02): undefined reference to `__masked_store_i32'
    objs/mandelbrot_ispc.bc:(.text+0x5af0): undefined reference to `__movmsk'
    objs/mandelbrot_ispc.bc:(.text+0x7238): undefined reference to `__movmsk'
    objs/mandelbrot_ispc.bc:(.text+0x7913): undefined reference to `__masked_store_i32'
    /tmp/mandelbrot_ispc-IWtDvg.o: In function `mandelbrot_ispc':
    objs/mandelbrot_ispc.bc:(.text+0x8294): undefined reference to `__movmsk'
    objs/mandelbrot_ispc.bc:(.text+0x97ea): undefined reference to `__movmsk'
    objs/mandelbrot_ispc.bc:(.text+0xa6d6): undefined reference to `__movmsk'
    objs/mandelbrot_ispc.bc:(.text+0xbef4): undefined reference to `__movmsk'
    objs/mandelbrot_ispc.bc:(.text+0xc5df): undefined reference to `__masked_store_i32'
    clang: error: linker command failed with exit code 1 (use -v to see invocation)

Any hints?  Is this even a supported workflow?

Thanks,
—Scott


Scott Pakin

Apr 10, 2013, 12:13:48 PM
to ispc-...@googlegroups.com
Dmitry,


On Wednesday, April 10, 2013 3:07:30 AM UTC-6, Dmitry Babokin wrote:
> Using generic-* targets with --emit-llvm is not really a supported model. ISPC has a number of intrinsics that are defined in a target-specific library. For targets that support object generation (e.g. sse4, avx, etc.), this library is linked into the object file or LLVM bitcode. For generic targets (the targets that assume C++ output), a header file is supplied instead, which provides the implementation of these intrinsics (see $ISPC_DIR/examples/intrinsics).

Okay, but how do I point ISPC to one of those header files?  They contain only definitions so I can't separately compile them, and #include'ing them from mandelbrot.ispc doesn't work.  Are you saying those are something I'd use if I were to build a custom ISPC compiler from source?
 
> Is there a reason you are using the generic-32 target? What hardware are you targeting?

Well, "generic" hardware. ;-)  I was hoping to have all the vector operations expressed in terms of LLVM vector types rather than any particular x86 instruction set—and using the widest vectors supported by ISPC, which ispc --help indicates is generic-32.  That way, my performance-analysis tool can reason about the application+compiler's view of vector usage as distinct from the target hardware's view.

Thanks for the reply,
—Scott

Dmitry Babokin

Apr 10, 2013, 4:34:17 PM
to ispc-...@googlegroups.com
Scott,

You can supply a header file for a generic target with --c++-include-file=<name>, but it won't work with --emit-llvm. Generic targets are designed for C++ output. Could you try --emit-c++ and then use clang to obtain the desired bitcode file? It may work for you.

I'm not sure whether a completely "generic" vector representation of the code (i.e. one without assumptions about the target hardware) has any practical value. What kind of conclusions are you hoping to draw from analyzing this generic code?

-Dmitry




Scott Pakin

Apr 10, 2013, 5:13:15 PM
to ispc-...@googlegroups.com
Dmitry,


On Wednesday, April 10, 2013 2:34:17 PM UTC-6, Dmitry Babokin wrote:
> You can supply a header file for a generic target with --c++-include-file=<name>, but it won't work with --emit-llvm. Generic targets are designed for C++ output. Could you try --emit-c++ and then use clang to obtain the desired bitcode file? It may work for you.

Hmm...Is this what you had in mind?

    $ ispc -O2 --arch=x86-64 --target=generic-16 --emit-c++ --c++-include-file=../intrinsics/generic-16.h mandelbrot.ispc -o objs/mandelbrot_ispc.cpp -h objs/mandelbrot_ispc.h
    WARNING: Linking two modules of different data layouts!
    $ clang++ -I. -emit-llvm objs/mandelbrot_ispc.cpp
    ...
    objs/mandelbrot_ispc.cpp:233:56: error: no matching function for call to '__smear_float'
      div_sub_x1_load_x0_load_width_load_to_float_smear_ = __smear_float(__vec16_f ( /* UNDEF */), (((float )((((float )(x1_ - x0_))) / (((float )(int32_t )width_))))));
                                                           ^~~~~~~~~~~~~
    ./../intrinsics/generic-16.h:670:1: note: candidate function template not viable: requires 1 argument, but 2 were provided
    SMEAR(__vec16_f, float, float)
    ^
    ./../intrinsics/generic-16.h:275:35: note: expanded from macro 'SMEAR'
    template <class RetVecType> VTYPE __smear_##NAME(STYPE);           \
                                      ^
    <scratch space>:126:1: note: expanded from here
    __smear_float
    ^
    ...
    fatal error: too many errors emitted, stopping now [-ferror-limit=]
    14 warnings and 20 errors generated.

What am I misunderstanding here?

> I'm not sure whether a completely "generic" vector representation of the code (i.e. one without assumptions about the target hardware) has any practical value. What kind of conclusions are you hoping to draw from analyzing this generic code?

Comparing the ratio of vectorizable to non-vectorizable operations against non-ISPC code, for example.  Ideally, I'd like infinite vector length for this so I can say things like, "75% of the instructions executed by such-and-such code are vector instructions".  Statements like that get biased by hardware vector-length limits when running traditional profilers or instrumenters; a programmer would more likely think of A=B+C, where A, B, and C are 1024-element vectors, as a single vector operation rather than 256 vector operations on machine X and 128 vector operations on machine Y, and that's how I'd like to report it.  Furthermore, when a vendor like Intel comes to my organization and asks, "If we were to double the hardware vector length for AVX N+1, would your code be able to take advantage of that?", being able to experiment with generic hardware enables us to answer such questions.

—Scott
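
To make the arithmetic in the post above concrete, here is a minimal C sketch of how a single 1024-element A=B+C splits into hardware vector operations, and how that shifts a "fraction of vector instructions" statistic. The function names are hypothetical and belong to neither Byfl nor ISPC, and the lane counts of 4 and 8 are assumptions chosen only because they reproduce the 256 and 128 figures quoted for machines X and Y.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of hardware vector operations needed to
 * process `elements` values when each operation handles `lanes` values. */
size_t hw_vector_ops(size_t elements, size_t lanes)
{
    assert(lanes > 0);
    return (elements + lanes - 1) / lanes;   /* ceiling division */
}

/* Hypothetical helper: fraction of all executed operations that are
 * vector operations (0.0 when nothing was executed). */
double vector_fraction(size_t vector_ops, size_t scalar_ops)
{
    size_t total = vector_ops + scalar_ops;
    return total == 0 ? 0.0 : (double)vector_ops / (double)total;
}
```

Under these assumptions, hw_vector_ops(1024, 4) gives 256 and hw_vector_ops(1024, 8) gives 128, while a width-independent count, hw_vector_ops(1024, 1024), treats the statement as one operation. If that statement ran alongside an assumed 100 scalar operations, vector_fraction would report roughly 0.72 on machine X and 0.56 on machine Y for the same source program, which is the hardware bias described above.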

Dmitry Babokin

Apr 11, 2013, 8:34:28 AM
to ispc-...@googlegroups.com
Scott,

You understood me correctly, but I'm surprised that it doesn't work for you; it does work for me. I'm using the head git revision (from the warning in your output, I suspect you are using a version from GitHub that is a couple of weeks old). I also assume that you are using mandelbrot from examples/mandelbrot.

    $ ispc -O2 --arch=x86-64 --target=generic-16 --emit-c++ --c++-include-file=../intrinsics/generic-16.h mandelbrot.ispc -o objs/mandelbrot_ispc.cpp -h objs/mandelbrot_ispc.h
    $ clang++ -I. objs/mandelbrot_ispc.cpp -Wno-int-to-pointer-cast -Wno-parentheses-equality -emit-llvm -c -o test.bc -O2
    $ llvm-dis test.bc
    $ grep __smear_float test.ll | grep define
    define void @_Z13__smear_floatI9__vec16_fES0_f(%struct.__vec16_f* noalias nocapture sret %agg.result, float %v) #1 {

It does work for me with LLVM 3.1, 3.2, and 3.3 (ToT). I'm using clang to build ispc (make clang).


Thanks for explaining the purpose of the experiment. This makes sense to me, but be aware that some vectorization may not happen in the real world because it's not profitable on particular hardware, and this may make your estimate inaccurate.

-Dmitry.





Scott Pakin

Apr 15, 2013, 8:00:48 PM
to ispc-...@googlegroups.com
Now that I can finally run ispc HEAD, I took another stab at generating LLVM bitcode with --emit-llvm that I can instrument, compile, link, and run.  (The --emit-c++ route loses all the vectorization I wanted to see :-( .)  When I compile the mandelbrot example to bitcode, it's missing only two symbols, __any and __masked_store_i32.  I tried defining them with the attached C file, which I derived from  ispc/examples/intrinsics/generic-16.h.  Unfortunately, mandelbrot hangs when I do so.  Am I simply doing something stupid, or is this whole approach bound to fail?


On Thursday, April 11, 2013 6:34:28 AM UTC-6, Dmitry Babokin wrote:
> Thanks for explaining the purpose of the experiment. This makes sense to me, but be aware that some vectorization may not happen in the real world because it's not profitable on particular hardware, and this may make your estimate inaccurate.

Yes, that's in fact one reason that my approach complements existing, hardware-centric tools.  Byfl can indicate where vectorization could happen (i.e., that there are no data dependencies) while other tools indicate where it did happen.  As you correctly point out, that may not be everywhere due to profitability metrics.

—Scott
generic-16.c

Dmitry Babokin

Apr 16, 2013, 12:41:49 PM
to ispc-...@googlegroups.com
Scott,

Generic target + --emit-llvm still looks tricky to me, though you only need to provide an ISPC standard library implementation to make it work. Mandelbrot needs only two functions, but you will run into many more with other examples. The good news is that some of them are already implemented, because the LLVM optimizer needs to see them for better optimization. You may notice that the ISPC build has the following command:

    m4 -Ibuiltins/ -DLLVM_VERSION=LLVM_3_3 builtins/target-generic-16.ll | python bitcode2cpp.py builtins/target-generic-16.ll > objs/builtins-target-generic-16.cpp

So if you do

    m4 -Ibuiltins/ -DLLVM_VERSION=LLVM_3_3 builtins/target-generic-16.ll > generic16.ll

you'll see what is passed to ISPC as the standard library for the generic-16 target. Note that many functions are only declared, not defined. But you can play with the generation scripts to actually enable or implement more.

The problem with your generic-16.c is that you are implementing the function with an incorrect signature. In LLVM terms it is:

    declare void @__masked_store_i32(<16 x i32>* nocapture, <16 x i32>, <16 x i1>) nounwind

Note that the first argument is a vector.

By the way, you can use gcc 4.7 now; the bug with gcc was hunted down and fixed.

-Dmitry.




Scott Pakin

Apr 18, 2013, 2:13:01 PM
to ispc-...@googlegroups.com
Dmitry,


On Tuesday, April 16, 2013 10:41:49 AM UTC-6, Dmitry Babokin wrote:
> Generic target + --emit-llvm still looks tricky to me, though you only need to provide an ISPC standard library implementation to make it work. Mandelbrot needs only two functions, but you will run into many more with other examples. The good news is that some of them are already implemented, because the LLVM optimizer needs to see them for better optimization. You may notice that the ISPC build has the following command:
>
>     m4 -Ibuiltins/ -DLLVM_VERSION=LLVM_3_3 builtins/target-generic-16.ll | python bitcode2cpp.py builtins/target-generic-16.ll > objs/builtins-target-generic-16.cpp
>
> So if you do
>
>     m4 -Ibuiltins/ -DLLVM_VERSION=LLVM_3_3 builtins/target-generic-16.ll > generic16.ll
>
> you'll see what is passed to ISPC as the standard library for the generic-16 target. Note that many functions are only declared, not defined. But you can play with the generation scripts to actually enable or implement more.

Thanks for the tips.
 
> The problem with your generic-16.c is that you are implementing the function with an incorrect signature. In LLVM terms it is:
>
>     declare void @__masked_store_i32(<16 x i32>* nocapture, <16 x i32>, <16 x i1>) nounwind
>
> Note that the first argument is a vector.

No, actually the first argument is a pointer to a vector.  My code accepts a void * and casts it correctly, so that wasn't the problem.  However, you at least pointed me to the right place to look for problems.  I looked at the generated x86 assembly code for an LLVM <16 x i1> datatype and found to my surprise that it's represented not as a uint16_t (which would make sense and is what ISPC's generic-16.h assumes), nor as a uint8_t[16], but (brace yourself) as a uint32_t[16].  Yikes!  I still need to verify that that's truly the case, but mandelbrot does build and run now, so I'm at least on the right track.

In case it helps anyone else, I've attached my revised generic-16.c file.  Here's how I used it to build mandelbrot:

    clang -g -O3 -m64 -c -o objs/generic-16.o generic-16.c
    ispc -O3 --arch=x86-64 --target=generic-16 --emit-llvm mandelbrot.ispc -o objs/mandelbrot_ispc.bc -h objs/mandelbrot_ispc.h
    clang++ -c -o objs/mandelbrot_ispc.o objs/mandelbrot_ispc.bc
    clang++ mandelbrot.cpp -Iobjs/ -g -g -O3 -m64 -c -o objs/mandelbrot.o
    clang++ mandelbrot_serial.cpp -Iobjs/ -g -g -O3 -m64 -c -o objs/mandelbrot_serial.o
    clang++ ../tasksys.cpp -Wno-int-to-pointer-cast -Iobjs/ -g -g -O3 -m64 -c -o objs/tasksys.o
    clang++ -Iobjs/ -g -g -O3 -m64 -o mandelbrot objs/mandelbrot.o objs/mandelbrot_serial.o objs/tasksys.o objs/mandelbrot_ispc.o objs/generic-16.o -lm -lpthread -lstdc++
 
> By the way, you can use gcc 4.7 now; the bug with gcc was hunted down and fixed.

Ah, cool.

 —Scott
generic-16.c
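
The attached generic-16.c is not reproduced in this archive. As a rough illustration of why the mask layout matters, the following C sketch contrasts a masked store that reads the mask as one bit per lane with one that reads it as one 32-bit word per lane. This is not the attached file and not the calling convention clang actually uses for <16 x i1>. All names are hypothetical, plain pointers stand in for the vector arguments, and "nonzero word means active lane" is an assumption.

```c
#include <stdint.h>

#define LANES 16

/* Masked store with the mask packed one bit per lane
 * (bit i set means lane i is active). */
void masked_store_i32_bitmask(int32_t *dst, const int32_t *val, uint16_t mask)
{
    for (int i = 0; i < LANES; i++)
        if (mask & (1u << i))
            dst[i] = val[i];
}

/* Masked store with the mask spread over one 32-bit word per lane
 * (assumption: any nonzero word means the lane is active). */
void masked_store_i32_lanes(int32_t *dst, const int32_t *val,
                            const uint32_t mask[LANES])
{
    for (int i = 0; i < LANES; i++)
        if (mask[i] != 0)
            dst[i] = val[i];
}

/* Returns 1 if any lane of a one-word-per-lane mask is active, else 0. */
int any_lanes(const uint32_t mask[LANES])
{
    for (int i = 0; i < LANES; i++)
        if (mask[i] != 0)
            return 1;
    return 0;
}
```

A helper written for one layout but handed the other would read the wrong bits as lane flags, which is one plausible way for a mask-controlled loop never to see its exit condition.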

Dmitry Babokin

Apr 22, 2013, 12:33:11 PM
to ispc-...@googlegroups.com
Scott,

Sorry for misleading you about the @__masked_store_i32 argument.

It's surprising that <16 x i1> is uint32_t[16], though I believe
it could happen: an LLVM backend used for code generation may
have its own definition for this type. For the generic-16 target (which
goes via C++ source code), <16 x i1> is defined as int16_t.

I'm not sure how to force clang to treat <16 x i1> differently.

Apart from the inefficiency, though, this should mean that your
approach may work (if the standard library is redefined).

-Dmitry.
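
For anyone bridging the two mask representations discussed in this thread, a minimal C sketch of converting between a 16-bit packed mask and a one-word-per-lane mask might look like the following. The function names are hypothetical, and the choice of all-ones for an active lane is an assumption rather than something the thread confirms about LLVM's code generation.

```c
#include <stdint.h>

/* Pack a one-word-per-lane mask into 16 bits (bit i corresponds to lane i).
 * Assumption: any nonzero word marks an active lane. */
uint16_t pack_mask16(const uint32_t lanes[16])
{
    uint16_t bits = 0;
    for (int i = 0; i < 16; i++)
        if (lanes[i] != 0)
            bits |= (uint16_t)(1u << i);
    return bits;
}

/* Expand a 16-bit mask into one word per lane
 * (assumption: all ones for active, zero for inactive). */
void unpack_mask16(uint16_t bits, uint32_t lanes[16])
{
    for (int i = 0; i < 16; i++)
        lanes[i] = (bits & (1u << i)) ? UINT32_MAX : 0u;
}
```

Converting at the boundary like this costs extra work per call, which matches the inefficiency mentioned above, but it lets helper code stay written against a single representation.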