[LLVMdev] Converting an i32 pointer to a vector of i32 (C array to LLVM vector)

Matthieu Dubet

Oct 11, 2013, 1:27:11 PM

to llv...@cs.uiuc.edu

Hi,

I'm creating a small function in LLVM that takes an i32* as a parameter (the function is called from C code).

However, I know that this pointer actually points to a C array of size 10 (int[10]).

How can I tell LLVM to treat this i32* as a <10 x i32>, and thus get the performance improvements from SIMD, etc.?

Thanks,
Matthieu

Renato Golin

Oct 11, 2013, 1:40:09 PM

to Matthieu Dubet, LLVM Dev

On 11 October 2013 18:27, Matthieu Dubet <maa...@gmail.com> wrote:

How can I tell LLVM to consider this i32* as an <10 x i32> (and thus get the performance improvements thanks to SIMD ..etc..) ?

Hi Matthieu,

You shouldn't need to do anything: the vectorizer should spot that for you, provided the machine you're compiling for supports vector instructions. Any vector operations you hard-code will tie the code to the targets covered by the intrinsics or inline asm you're using, which is not a good idea.

If your code didn't get vectorized, the way the pointer is iterated may not be clear enough for the vectorizer to recognize. In that case you may need to make the pattern more explicit, and how to do that depends on the code in question. If you could share the code (or a similar example) with the list, people could help you spot the pattern and make it vectorize.

cheers,

--renato
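As a rough illustration of the point above, here is a minimal C sketch. The name add_to_all and the delta parameter are made up for the example and do not come from the thread. The function takes a plain int * and never mentions a vector type; the loop uses a local integer counter and a compile-time trip count of 10, which is a shape the loop vectorizer can normally analyze. Whether the loop is actually vectorized (rather than, say, simply unrolled) depends on the target and on the cost model.

```c
#include <stddef.h>

enum { N = 10 };  /* array length stated in the original question */

/* Adds `delta` to each of the N ints that `a` points to.
 * Hypothetical example: no vector type appears anywhere; turning the
 * loop into SIMD code is left entirely to the optimizer. */
void add_to_all(int *a, int delta)
{
    for (size_t i = 0; i < N; ++i)
        a[i] += delta;
}
```

Calling add_to_all(values, 1) on an int values[10] increments every element, and the same source stays valid on targets without vector instructions.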

Matthieu Dubet

Oct 16, 2013, 11:14:06 AM

to Renato Golin, LLVM Dev

Hi,

Thank you for the information.

I'm now keeping the array as a pointer (i32*), but the vectorizer still doesn't vectorize it.

I've pasted the function code before and after optimization (and the list of optimizations that I have enabled) in this Gist: https://gist.github.com/maattd/7008683

Some "weird" facts about my LLVM code:

* all variables (even the one used for the loop condition) are pointers to memory allocated in the C world and passed to the LLVM function as arguments
* even with "opt->add(new llvm::DataLayout(*ee->getDataLayout()));" in the code, module->dump() prints neither a data layout nor a target triple

Could either of these points be confusing the vectorizer?

Tom Stellard

Oct 16, 2013, 8:28:01 PM

to Matthieu Dubet, LLVM Dev

Which part of the vectorizer is responsible for doing pointer->vector transformations?

-Tom

> _______________________________________________
> LLVM Developers mailing list
> LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


Nadav Rotem

Oct 16, 2013, 8:59:05 PM

to Tom Stellard, LLVM Dev

Both the SLP vectorizer and the Loop vectorizer support vectorizing pointers. The attached code looks like a candidate for the SLP vectorizer. Can you run the SLP vectorizer with the flag -mllvm -debug-only=SLP and attach the log? I think that we are missing the pattern for the roots of the tree.

Thanks,

Nadav

Matthieu Dubet

Oct 16, 2013, 10:39:57 PM

to Nadav Rotem, LLVM Dev

Hi,

So I've tried the Loop vectorizer and the SLP vectorizer (LLVM 3.3) on this code, which assigns 5 to each element of the array "%b":

; ModuleID = 'res.ll'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: nounwind
define void @loop_ptr_19121568([10 x i32*]* nocapture %params_vec) #0 {
entry:
  %0 = bitcast [10 x i32*]* %params_vec to double**
  %temp_1 = load double** %0, align 8
  %1 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 1
  %2 = load i32** %1, align 8
  %temp_2 = bitcast i32* %2 to double*
  %3 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 3
  %4 = load i32** %3, align 8
  %b = bitcast i32* %4 to double*
  %5 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 4
  %6 = load i32** %5, align 8
  %d = bitcast i32* %6 to double*
  %7 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 6
  %8 = load i32** %7, align 8
  %temp_0 = bitcast i32* %8 to double*
  %9 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 7
  %10 = load i32** %9, align 8
  %temp_4 = bitcast i32* %10 to i1*
  %11 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 8
  %12 = load i32** %11, align 8
  %temp_3 = bitcast i32* %12 to double*
  %13 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 9
  %14 = load i32** %13, align 8
  %i = bitcast i32* %14 to double*
  store double 1.000000e+00, double* %temp_0, align 8
  %15 = load double* %temp_1, align 8
  store double %15, double* %temp_2, align 8
  %16 = load double* %d, align 8
  %17 = fmul double %16, %16
  store double %17, double* %temp_3, align 8
  %.pre = load double* %temp_0, align 8
  %cmp_le1 = fcmp ole double %.pre, %17
  store i1 %cmp_le1, i1* %temp_4, align 1
  br i1 %cmp_le1, label %"i = temp_0", label %end_fun

"i = temp_0":                                     ; preds = %entry, %"i = temp_0"
  %18 = load double* %temp_0, align 8
  store double %18, double* %i, align 8
  %19 = fptoui double %18 to i32
  %20 = add i32 %19, -1
  %21 = sext i32 %20 to i64
  %22 = getelementptr double* %b, i64 %21
  store double 5.000000e+00, double* %22, align 8
  %23 = load double* %temp_0, align 8
  %24 = load double* %temp_2, align 8
  %25 = fadd double %23, %24
  store double %25, double* %temp_0, align 8
  %.pre1 = load double* %temp_3, align 8
  %cmp_le = fcmp ole double %25, %.pre1
  store i1 %cmp_le, i1* %temp_4, align 1
  br i1 %cmp_le, label %"i = temp_0", label %end_fun

end_fun:                                          ; preds = %"i = temp_0", %entry
  ret void
}

attributes #0 = { nounwind }

-------------------------------------------------------------------------------------------------------
The loop vectorizer finds the loop, but I don't quite understand the problem with the loop exit count:

LV: Checking a loop in "loop_ptr_19121568"
LV: Found a loop: i = temp_0
LV: SCEV could not compute the loop exit count.
LV: Not vectorizing.
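One likely reason for the "could not compute the loop exit count" message, offered here as an assumption rather than something confirmed in the thread: scalar evolution reasons about integer recurrences, whereas in the IR above the loop counter is a double that is reloaded from memory on every iteration, and the store through %b could alias it. The C sketch below mirrors that shape and contrasts it with a loop whose counter is a local integer. All function and parameter names are hypothetical.

```c
/* Mirrors the shape of the IR above: the counter, the step and the
 * limit all live in memory and are re-read on every iteration. As far
 * as the compiler can tell, the store into b[] may overwrite *counter,
 * so the trip count is not a simple function of the loop inputs. */
void fill_memory_counter(double *b, double *counter,
                         const double *step, const double *limit)
{
    *counter = 1.0;
    while (*counter <= *limit) {
        b[(int)*counter - 1] = 5.0;
        *counter += *step;
    }
}

/* Same stores for a step of 1.0 and a non-negative limit, but the
 * counter is a local integer, so the trip count is plainly `n`. */
void fill_local_counter(double *b, double limit)
{
    long n = (long)limit;
    for (long i = 0; i < n; ++i)
        b[i] = 5.0;
}
```

With a step of 1.0 and a limit of 10.0 both functions store 5.0 into b[0] through b[9]; only the second exposes an integer induction variable that the loop analysis can work with directly.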

The SLP vectorizer debug output:

SLP: Vectorizing a list of length = 2.
SLP: Cost of pair:1 Cost of extract:1.
SLP: Vectorizing a list of length = 2.
SLP: Cost of pair:1 Cost of extract:1.
SLP: Found 4 stores to vectorize.
SLP: Vectorizing a list of length = 2.
SLP: Cost of pair:1 Cost of extract:1.
SLP: Vectorizing a list of length = 2.
SLP: Cost of pair:1 Cost of extract:1.
SLP: Found 4 stores to vectorize.

Thanks,
Matthieu

James Courtier-Dutton

Oct 17, 2013, 4:53:49 AM

to Matthieu Dubet, llv...@cs.uiuc.edu

If what you are saying is that the i32 array will always have 10 entries, make the function use a constant limit of 10 for the loop.

I.e., make the loop limit a constant and not a variable.


Matthieu Dubet

Oct 17, 2013, 10:15:42 AM

to James Courtier-Dutton, llv...@cs.uiuc.edu

Even though I know the size of the array, I don't always iterate through all of it, so the loop count has to be a variable. The vectorizer works fine with a non-constant loop limit when compiling C code with Clang, for example, so I should be able to get the same result for this code (hopefully :) ).

Matthieu
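For reference, a minimal C sketch of the kind of loop described in the message above, with hypothetical names. The limit is a run-time value, but it is passed by value and the counter is a local integer, so the trip count can still be expressed symbolically even though it is not a constant. The clamp to 10 reflects the known array size from the original question. With so few iterations the cost model may still decide that vectorizing is not worthwhile.

```c
#include <stddef.h>

enum { CAPACITY = 10 };  /* known size of the C array */

/* Stores `value` into the first `count` elements of `a`, and never
 * into more than CAPACITY of them. The trip count is variable but is
 * fixed before the loop starts and is not re-read from memory. */
void fill_prefix(int *a, size_t count, int value)
{
    if (count > CAPACITY)
        count = CAPACITY;
    for (size_t i = 0; i < count; ++i)
        a[i] = value;
}
```

For example, fill_prefix(a, 4, 7) writes a[0] through a[3] and leaves the remaining six elements untouched.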
