[LLVMdev] Converting a i32 pointer to a vector of i32 ( C array to LLVM vector)

Matthieu Dubet

Oct 11, 2013, 1:27:11 PM
to llv...@cs.uiuc.edu
Hi,

I'm creating a small function in LLVM that takes an i32* as a parameter (the function is called from C code).

However, I know that this pointer actually points to a C array of size 10 (int[10]).

How can I tell LLVM to treat this i32* as a <10 x i32>, and thus get the performance benefits of SIMD?

Thanks,
Matthieu
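
To make the question concrete from the C side, here is a minimal sketch of what an explicit vector view of the array could look like. It assumes the `vector_size` extension supported by GCC and Clang and a 4-byte `int`; the names `v4i` and `add5_explicit` are hypothetical and do not come from the thread. Because `vector_size` needs a power-of-two width, the 10 elements are handled as two 4-wide vectors plus a scalar tail rather than as a single 10-wide vector.

```c
#include <string.h>

/* GCC/Clang extension (assumed available): a vector of four 32-bit ints. */
typedef int v4i __attribute__((vector_size(16)));

/* Hypothetical helper: adds 5 to each of the 10 ints behind `a`. */
void add5_explicit(int *a) {
    const v4i five = {5, 5, 5, 5};
    int i = 0;
    for (; i + 4 <= 10; i += 4) {
        v4i v;
        memcpy(&v, a + i, sizeof v);   /* load that does not require alignment */
        v = v + five;                  /* element-wise add */
        memcpy(a + i, &v, sizeof v);   /* store the four results back */
    }
    for (; i < 10; ++i)                /* scalar tail: elements 8 and 9 */
        a[i] += 5;
}
```

Code written this way depends on a compiler extension, so it is less portable than a plain loop.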

Renato Golin

Oct 11, 2013, 1:40:09 PM
to Matthieu Dubet, LLVM Dev
On 11 October 2013 18:27, Matthieu Dubet <maa...@gmail.com> wrote:
> How can I tell LLVM to consider this i32* as an <10 x i32> (and thus get the performance improvements thanks to SIMD ..etc..) ?

Hi Matthieu,

You shouldn't need to do anything: the vectorizer should spot that for you, provided the machine you're compiling for supports vector instructions. Any vector operations you hard-code will only work on the targets covered by the intrinsics or inline asm you're using, which is not a good idea.

If your code didn't get vectorized, the way the pointer is iterated over may not be clear enough for the vectorizer to recognize. You may need to make the access pattern more explicit, and how to do that depends on the code in question. If you could share the code (or a similar example) with the list, people could help you spot the pattern and make it vectorize.

cheers,
--renato
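
As an illustration of the kind of loop a vectorizer is usually able to recognize, here is a minimal C sketch. The function name `add5_loop` is hypothetical, and whether the loop is actually vectorized depends on the compiler version, the optimization level and the target, so treat this as an example of the shape and not as a guarantee.

```c
#include <stddef.h>

/* Hypothetical example: dst[i] = src[i] + 5 for i in [0, n).
   `restrict` promises the compiler that the two arrays do not overlap. */
void add5_loop(int *restrict dst, const int *restrict src, size_t n) {
    for (size_t i = 0; i < n; ++i)   /* integer counter, unit stride */
        dst[i] = src[i] + 5;
}
```

The loop uses a local integer counter, consecutive element accesses and a bound that does not change inside the loop, and it contains no target-specific code.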

Matthieu Dubet

Oct 16, 2013, 11:14:06 AM
to Renato Golin, LLVM Dev
Hi,

Thank you for the information.

I'm now keeping the array as a pointer (i32*), but the vectorizer doesn't vectorize it.

I've pasted the function code before and after optimization (along with the list of optimizations I have enabled) in this Gist: https://gist.github.com/maattd/7008683

Some "weird" facts about my LLVM code:

* all variables (even the one used for the loop condition) are pointers to memory allocated in the C world and passed to the LLVM function as arguments
* even with "opt->add(new llvm::DataLayout(*ee->getDataLayout()));" in the code, module->dump() outputs neither the data layout nor the target triple

Could either of these points confuse the vectorizer?
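
The first point can be pictured in C. The sketch below is a hypothetical rendering of the pattern, not the code from the Gist: in the first function the counter, step and limit all live in caller-provided memory, and since they have the same type as the array `b`, the compiler has to assume that a store to `b` may change them and reload them on every iteration. The second function copies the loop state into locals first. The two behave the same only under the assumption that `b` does not overlap the loop state.

```c
/* Pattern described above: counter, step and limit are all read from
   (and written to) memory on every iteration. */
void fill_through_memory(double *b, double *counter,
                         const double *step, const double *limit) {
    while (*counter <= *limit) {
        b[(unsigned)*counter - 1] = 5.0;  /* may alias *counter, *step, *limit */
        *counter += *step;
    }
}

/* Variant: keep the loop state in locals and write the counter back once.
   Assumes b does not overlap counter, step or limit. */
void fill_through_locals(double *b, double *counter,
                         const double *step, const double *limit) {
    double c = *counter;
    const double s = *step, lim = *limit;
    while (c <= lim) {
        b[(unsigned)c - 1] = 5.0;
        c += s;
    }
    *counter = c;
}
```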

Tom Stellard

Oct 16, 2013, 8:28:01 PM
to Matthieu Dubet, LLVM Dev
Which part of the vectorizer is responsible for doing pointer->vector transformations?

-Tom

> > If your code didn't get vectorized, it's possible that it is not clear
> > enough that that pointer is being iterated in a way that it's easy for the
> > vectorizer to spot, so maybe you need to make it clearer, and that depends
> > on the code in question. If you could share the code (or a similar example)
> > with the list, people could help you spot the pattern and make it vectorize.
> >
> > cheers,
> > --renato


> _______________________________________________
> LLVM Developers mailing list
> LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


Nadav Rotem

Oct 16, 2013, 8:59:05 PM
to Tom Stellard, LLVM Dev
Both the SLP vectorizer and the loop vectorizer support vectorizing pointers. The attached code looks like a candidate for the SLP vectorizer. Can you run the SLP vectorizer with the flag -mllvm -debug-only=SLP and attach the log? I think that we are missing the pattern for the roots of the tree.

Thanks,
Nadav

Matthieu Dubet

Oct 16, 2013, 10:39:57 PM
to Nadav Rotem, LLVM Dev
Hi,

So I've tried the loop vectorizer and the SLP vectorizer (LLVM 3.3) on the following code, which assigns 5 to each element of the array "%b":


; ModuleID = 'res.ll'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: nounwind
define void @loop_ptr_19121568([10 x i32*]* nocapture %params_vec) #0 {
entry:
  %0 = bitcast [10 x i32*]* %params_vec to double**
  %temp_1 = load double** %0, align 8
  %1 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 1
  %2 = load i32** %1, align 8
  %temp_2 = bitcast i32* %2 to double*
  %3 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 3
  %4 = load i32** %3, align 8
  %b = bitcast i32* %4 to double*
  %5 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 4
  %6 = load i32** %5, align 8
  %d = bitcast i32* %6 to double*
  %7 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 6
  %8 = load i32** %7, align 8
  %temp_0 = bitcast i32* %8 to double*
  %9 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 7
  %10 = load i32** %9, align 8
  %temp_4 = bitcast i32* %10 to i1*
  %11 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 8
  %12 = load i32** %11, align 8
  %temp_3 = bitcast i32* %12 to double*
  %13 = getelementptr [10 x i32*]* %params_vec, i64 0, i64 9
  %14 = load i32** %13, align 8
  %i = bitcast i32* %14 to double*
  store double 1.000000e+00, double* %temp_0, align 8
  %15 = load double* %temp_1, align 8
  store double %15, double* %temp_2, align 8
  %16 = load double* %d, align 8
  %17 = fmul double %16, %16
  store double %17, double* %temp_3, align 8
  %.pre = load double* %temp_0, align 8
  %cmp_le1 = fcmp ole double %.pre, %17
  store i1 %cmp_le1, i1* %temp_4, align 1
  br i1 %cmp_le1, label %"i = temp_0", label %end_fun

"i = temp_0":                                     ; preds = %entry, %"i = temp_0"
  %18 = load double* %temp_0, align 8
  store double %18, double* %i, align 8
  %19 = fptoui double %18 to i32
  %20 = add i32 %19, -1
  %21 = sext i32 %20 to i64
  %22 = getelementptr double* %b, i64 %21
  store double 5.000000e+00, double* %22, align 8
  %23 = load double* %temp_0, align 8
  %24 = load double* %temp_2, align 8
  %25 = fadd double %23, %24
  store double %25, double* %temp_0, align 8
  %.pre1 = load double* %temp_3, align 8
  %cmp_le = fcmp ole double %25, %.pre1
  store i1 %cmp_le, i1* %temp_4, align 1
  br i1 %cmp_le, label %"i = temp_0", label %end_fun

end_fun:                                          ; preds = %"i = temp_0", %entry
  ret void
}

attributes #0 = { nounwind }

-------------------------------------------------------------------------------------------------------
The loop vectorizer finds the loop, but I don't really understand the problem with the loop exit count:

LV: Checking a loop in "loop_ptr_19121568"
LV: Found a loop: i = temp_0
LV: SCEV could not compute the loop exit count.
LV: Not vectorizing.
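
In the IR above, the loop is controlled by double values that are loaded from memory on every iteration, which is one plausible reason the exit count cannot be computed. The C sketch below shows one way to restate such a loop with an integer counter whose bound is fixed before the loop starts. The names `trip_count` and `fill_counted` are hypothetical, and the sketch assumes a positive step and values that are exactly representable as doubles. It is not a confirmed fix for this function.

```c
/* Hypothetical helper: how many times `for (x = start; x <= limit; x += step)`
   runs. Assumes step > 0 and values that are exact in a double. */
long trip_count(double start, double step, double limit) {
    if (step <= 0.0 || start > limit)
        return 0;
    return (long)((limit - start) / step) + 1;
}

/* Performs the same stores as the loop in the IR (under the assumptions
   above), but is driven by an integer counter with a precomputed bound. */
void fill_counted(double *b, double start, double step, double limit) {
    const long n = trip_count(start, step, limit);
    for (long k = 0; k < n; ++k) {
        const double x = start + (double)k * step;
        b[(unsigned)x - 1] = 5.0;
    }
}
```

With a start and a step of 1.0, the index reduces to `k`, so the loop becomes a plain fill of the first `n` elements.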

The SLP vectorizer debug output:

SLP: Vectorizing a list of length = 2.
SLP: Cost of pair:1 Cost of extract:1.
SLP: Vectorizing a list of length = 2.
SLP: Cost of pair:1 Cost of extract:1.
SLP: Found 4 stores to vectorize.
SLP: Vectorizing a list of length = 2.
SLP: Cost of pair:1 Cost of extract:1.
SLP: Vectorizing a list of length = 2.
SLP: Cost of pair:1 Cost of extract:1.
SLP: Found 4 stores to vectorize.


Thanks,
Matthieu


James Courtier-Dutton

Oct 17, 2013, 4:53:49 AM
to Matthieu Dubet, llv...@cs.uiuc.edu
If you are saying that you know the i32 array will always have 10 entries, make the function use a constant limit of 10 for the loop.
I.e., make the loop limit a constant rather than a variable.
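
In C terms, the suggestion amounts to something like the following minimal sketch. The names `ARRAY_LEN` and `fill_all` are hypothetical; only the size 10 comes from the thread.

```c
enum { ARRAY_LEN = 10 };  /* size stated in the thread: int[10] */

/* Hypothetical example: assigns 5 to all ten elements.
   The loop bound is a compile-time constant. */
void fill_all(int *a) {
    for (int i = 0; i < ARRAY_LEN; ++i)
        a[i] = 5;
}
```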




Matthieu Dubet

Oct 17, 2013, 10:15:42 AM
to James Courtier-Dutton, llv...@cs.uiuc.edu
Even though I know the size of the array, I don't always iterate through all of it, so the loop count has to be a variable. The vectorizer works fine with a non-constant loop limit when compiling C code with Clang, for example, so I should be able to do the same for this code (hopefully :) ).

Matthieu
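
The kind of C loop referred to here, with a bound known only at run time, might look like the sketch below. The names `ARRAY_CAP` and `fill_prefix` are hypothetical, and the clamping to the array capacity is an added safety assumption, not something described in the thread.

```c
enum { ARRAY_CAP = 10 };  /* capacity of the underlying int[10] */

/* Hypothetical example: assigns 5 to the first n elements, with n clamped
   to [0, ARRAY_CAP]. Returns the number of elements written. */
int fill_prefix(int *a, int n) {
    if (n < 0) n = 0;
    if (n > ARRAY_CAP) n = ARRAY_CAP;
    for (int i = 0; i < n; ++i)   /* bound is a run-time value, fixed before the loop */
        a[i] = 5;
    return n;
}
```

The bound is variable, but it is a local integer that does not change while the loop runs, unlike the memory-resident double used as the loop limit in the IR earlier in the thread.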