void fun(int *A, int *B, int *C, int start, int end){
    // TODO:
    // 1. Identify the memory region accessed by A, B and C
    // 2. Move the accessed regions to scratchpad memory
    for (int i = start; i < end; ++i) {
        C[i] = A[i] + B[i];
    }
}
void test(){
    int start, end;
    int *A, *B, *C;
    // Initialize variables
    fun(A, B, C, start, end);
}
Implementation:
To identify the accessed memory references:
1. Identify the bounds (start and end) of the accessed array region (I'm using lexmin() and lexmax() to get these values).
2. Get the array addresses at those bounds (&A[start], &A[end]).
3. Insert a custom instruction before the computation:
- The custom instruction takes the start and end address of the array and moves the data in that range to scratchpad memory.
4. Update the uses of array references to scratchpad addresses.
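To make the four steps concrete, here is a rough C-level sketch of what I expect the transformed function to look like. This is not the output of my pass. The names dma_copy, fun_spm, spm_a, spm_b, spm_c and SPM_WORDS are hypothetical, the custom instruction is emulated with memcpy, and the scratchpad is modelled as ordinary static arrays. I'm also assuming the region is the half-open range [&A[start], &A[end]), and that C needs a write-back after the loop, which is not one of the steps listed above.

```c
#include <assert.h>
#include <string.h>

#define SPM_WORDS 256 /* hypothetical scratchpad capacity per array, in ints */

/* Hypothetical stand-ins for scratchpad memory, one region per array. */
static int spm_a[SPM_WORDS], spm_b[SPM_WORDS], spm_c[SPM_WORDS];

/* Stand-in for the custom instruction: copies the half-open address
 * range [first, last) into the scratchpad region dst. */
static void dma_copy(int *dst, const int *first, const int *last) {
    size_t n = (size_t)(last - first);
    assert(n <= SPM_WORDS); /* region must fit in the scratchpad */
    memcpy(dst, first, n * sizeof(int));
}

void fun_spm(int *A, int *B, int *C, int start, int end) {
    if (start >= end)
        return; /* empty iteration space: nothing is accessed */

    /* Steps 1-3: the accessed region of each input array is
     * [&X[start], &X[end]); move it to the scratchpad. */
    dma_copy(spm_a, &A[start], &A[end]);
    dma_copy(spm_b, &B[start], &B[end]);

    /* Step 4: the array references now use scratchpad addresses,
     * re-based so that index start maps to offset 0. */
    for (int i = start; i < end; ++i)
        spm_c[i - start] = spm_a[i - start] + spm_b[i - start];

    /* Assumed extra step: C is only written, so copy the result back. */
    memcpy(&C[start], spm_c, (size_t)(end - start) * sizeof(int));
}
```

Calling fun_spm(A, B, C, 2, 6) on 8-element arrays should only touch elements 2 to 5 of each array and leave the rest of C unchanged.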
Current status:
Based on this design, I've started writing a pass. Please find attached the pass and the LLVM IR file.
1. I can get the lower and upper bounds of the accessed memory.
2. I've inserted LLVM IR instructions that give me the bounds as llvm::Values.
3. After the pass completes, I get the errors below:
pass dump: opt --dma-operations parallel.ll
[p_0, p_1, p_2] -> { MemRef1[i0] : (i0 = 0 and p_0 >= -1 and p_1 >= 2 + p_0) or (i0 = 0 and p_0 <= -2) or (i0 = 0 and p_0 >= -1); MemRef0[i0] : (p_0 >= -1 and i0 > p_0 + 4p_2 and i0 < p_1 + 4p_2) or (i0 = p_0 + 4p_2 and p_0 <= -2) or (i0 = p_0 + 4p_2 and p_0 >= -1); MemRef2[i0] : (p_0 >= -1 and i0 > p_0 + 4p_2 and i0 < p_1 + 4p_2) or (i0 = p_0 + 4p_2 and p_0 <= -2) or (i0 = p_0 + 4p_2 and p_0 >= -1) }
[p_0, p_1, p_2] -> { [(0)] : p_0 <= -2 or p_0 >= -1 }
i64 0
[p_0, p_1, p_2] -> { [(0)] : p_0 <= -2 or p_0 >= -1 }
i64 0
[p_0, p_1, p_2] -> { [(p_0 + 4p_2)] : p_0 <= -2 or p_0 >= -1 }
%20 = add nsw i64 %19, %18
[p_0, p_1, p_2] -> { [(-1 + p_1 + 4p_2)] : p_0 >= -1 and p_1 >= 2 + p_0; [(p_0 + 4p_2)] : p_0 <= -2 or (p_0 >= -1 and p_1 <= 1 + p_0) }
%37 = select i1 %27, i64 %32, i64 %36
[p_0, p_1, p_2] -> { [(p_0 + 4p_2)] : p_0 <= -2 or p_0 >= -1 }
%41 = add nsw i64 %40, %39
[p_0, p_1, p_2] -> { [(-1 + p_1 + 4p_2)] : p_0 >= -1 and p_1 >= 2 + p_0; [(p_0 + 4p_2)] : p_0 <= -2 or (p_0 >= -1 and p_1 <= 1 + p_0) }
%58 = select i1 %48, i64 %53, i64 %57
Stack dump:
0. Program arguments: opt --dma-operations parallel.ll
1. Running pass 'Function Pass Manager' on module 'parallel.ll'.
2. Running pass 'Region Pass Manager' on function '@_pocl_kernel_matvec_mult_ceWork'
3. Releasing pass 'Polly - Create polyhedral description of Scops'
malloc_consolidate(): invalid chunk size
Aborted (core dumped)
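For anyone reading the dump above, this is how I read the bound expressions that are printed just before the add and select instructions, written out in C. It is only my interpretation of the isl output, not code from the pass. The function names memref_lower and memref_upper are hypothetical, and I'm treating the parameters p_0, p_1 and p_2 as plain 64-bit integers without saying what they stand for in the kernel.

```c
#include <stdint.h>

/* Lower bound as printed: [(p_0 + 4p_2)], valid for every p_0.
 * p1 is unused but kept so both bounds share one signature. */
int64_t memref_lower(int64_t p0, int64_t p1, int64_t p2) {
    (void)p1;
    return p0 + 4 * p2;
}

/* Upper bound as printed, with two pieces:
 *   [(-1 + p_1 + 4p_2)] : p_0 >= -1 and p_1 >= 2 + p_0
 *   [(p_0 + 4p_2)]      : otherwise
 * This mirrors the shape of the select instruction in the dump. */
int64_t memref_upper(int64_t p0, int64_t p1, int64_t p2) {
    int first_piece = (p0 >= -1) && (p1 >= p0 + 2);
    return first_piece ? (p1 - 1 + 4 * p2) : (p0 + 4 * p2);
}
```

When the first piece does not apply, the upper bound equals the lower bound, so the accessed range is a single element.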
Questions: