Re: Data management in scratchpad memories

Harish

Jan 6, 2022, 3:39:05 PM
to Polly Development, michae...@meinersbur.de


On Thu, 6 Jan 2022 at 3:55 PM, Harish <haris...@gmail.com> wrote:
Dear Michael,

You have recently answered several of my questions, which helped me understand Polly better. However, I am still a little unsure about how to implement my requirement. In this email I will describe the requirement and the implementation I plan, and I would appreciate it if you could check whether the approach is sound.

Requirement:
I am working on an architecture that uses scratchpad memory, so I am planning to implement a pass that inserts custom data transfer instructions before the computation. In the example below, fun() needs the arrays A, B, and C. Before the computation in fun() starts, I have to move the accessed regions of these arrays (e.g., &A[start] to &A[end]) to the scratchpad.
void fun(int *A, int *B, int *C, int start, int end) {
  // TODO:
  // 1. Identify the memory regions accessed through A, B, and C
  // 2. Move the accessed regions to scratchpad memory
  for (int i = start; i < end; ++i) {
    C[i] = A[i] + B[i];
  }
}

void test() {
  int start, end;
  int *A, *B, *C;
  // Initialize variables
  fun(A, B, C, start, end);
}
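A minimal sketch of what fun() could look like after the transformation, in plain C. It assumes that A, B, and C do not overlap, and it stands in for the custom DMA instructions with the hypothetical helpers spm_dma_in/spm_dma_out, simulated here with memcpy into static arrays; SPM_CAPACITY is a made-up scratchpad size. Only A and B are read, so only their regions need to be copied in; C is written, so its region is copied back out afterwards.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical scratchpad, simulated with static arrays. On the real target
 * these buffers would live in the scratchpad address space. */
#define SPM_CAPACITY 1024
static int spm_A[SPM_CAPACITY], spm_B[SPM_CAPACITY], spm_C[SPM_CAPACITY];

/* Hypothetical stand-ins for the custom DMA instructions:
 * copy n ints between main memory and the scratchpad. */
static void spm_dma_in(int *spm, const int *mem, int n) {
  memcpy(spm, mem, (size_t)n * sizeof *mem);
}
static void spm_dma_out(int *mem, const int *spm, int n) {
  memcpy(mem, spm, (size_t)n * sizeof *spm);
}

void fun(int *A, int *B, int *C, int start, int end) {
  if (start >= end)       /* empty loop: nothing is accessed */
    return;
  int n = end - start;    /* accessed region: A[start] .. A[end - 1] */
  assert(n <= SPM_CAPACITY);

  /* Move the read regions of A and B into the scratchpad. */
  spm_dma_in(spm_A, &A[start], n);
  spm_dma_in(spm_B, &B[start], n);

  /* Same computation, with every access rewritten to offset i - start. */
  for (int i = start; i < end; ++i)
    spm_C[i - start] = spm_A[i - start] + spm_B[i - start];

  /* C is written, so its region is copied back to main memory. */
  spm_dma_out(&C[start], spm_C, n);
}
```

Elements of C outside [start, end) are left untouched, which matches the original loop.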

Implementation:
To identify the accessed memory references:
1. Identify the bounds (start and end) of the accessed region of each array (I'm using lexmin() and lexmax() to compute them).
2. Compute the array addresses at those bounds (&A[start], &A[end]).
3. Insert a custom instruction before the computation:
  • The custom instruction takes the start and end addresses of the array and moves the data in that range to scratchpad memory.
4. Rewrite the array accesses to use the scratchpad addresses.
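For a one-dimensional affine access such as A[a*i + b] in a loop over i in [start, end), steps 1 and 2 reduce to evaluating the access at the first and the last iteration. The plain C sketch below illustrates this; access_range and index_range are hypothetical names, and multi-dimensional or non-affine accesses are out of scope. For the loop in fun() the last element accessed is A[end - 1], so &A[end] is one past the region. The loop may also not execute at all (start >= end); case distinctions on the parameters like this one are what make lexmin/lexmax results piecewise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Inclusive index range: the access touches A[lo] .. A[hi]. */
typedef struct { long lo, hi; } index_range;

/* Range of indices touched by A[a*i + b] for i in [start, end).
 * Returns false when the loop does not execute (nothing to transfer). */
static bool access_range(long a, long b, long start, long end,
                         index_range *r) {
  assert(r != NULL);
  if (start >= end)
    return false;
  long first = a * start + b;     /* index at the first iteration */
  long last = a * (end - 1) + b;  /* index at the last iteration */
  /* With a negative stride the first iteration touches the largest index. */
  r->lo = a >= 0 ? first : last;
  r->hi = a >= 0 ? last : first;
  return true;
}
```

Step 2 then uses &A[r.lo] and &A[r.hi], and step 4 amounts to rewriting A[a*i + b] as spm[a*i + b - r.lo] for a scratchpad buffer spm.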

Current status:
With this design, I've started writing a pass. Please find the pass and the LLVM IR file attached.
1. I can get the lower and upper bounds of the accessed memory.
2. I have inserted LLVM IR instructions that compute the bounds as llvm::Values.
3. When the pass finishes, I get the following error:

pass dump: opt --dma-operations parallel.ll

 [p_0, p_1, p_2] -> { MemRef1[i0] : (i0 = 0 and p_0 >= -1 and p_1 >= 2 + p_0) or (i0 = 0 and p_0 <= -2) or (i0 = 0 and p_0 >= -1); MemRef0[i0] : (p_0 >= -1 and i0 > p_0 + 4p_2 and i0 < p_1 + 4p_2) or (i0 = p_0 + 4p_2 and p_0 <= -2) or (i0 = p_0 + 4p_2 and p_0 >= -1); MemRef2[i0] : (p_0 >= -1 and i0 > p_0 + 4p_2 and i0 < p_1 + 4p_2) or (i0 = p_0 + 4p_2 and p_0 <= -2) or (i0 = p_0 + 4p_2 and p_0 >= -1) }
 
[p_0, p_1, p_2] -> { [(0)] : p_0 <= -2 or p_0 >= -1 }
i64 0
[p_0, p_1, p_2] -> { [(0)] : p_0 <= -2 or p_0 >= -1 }
i64 0
 
[p_0, p_1, p_2] -> { [(p_0 + 4p_2)] : p_0 <= -2 or p_0 >= -1 }
  %20 = add nsw i64 %19, %18
[p_0, p_1, p_2] -> { [(-1 + p_1 + 4p_2)] : p_0 >= -1 and p_1 >= 2 + p_0; [(p_0 + 4p_2)] : p_0 <= -2 or (p_0 >= -1 and p_1 <= 1 + p_0) }
  %37 = select i1 %27, i64 %32, i64 %36
 
[p_0, p_1, p_2] -> { [(p_0 + 4p_2)] : p_0 <= -2 or p_0 >= -1 }
  %41 = add nsw i64 %40, %39
[p_0, p_1, p_2] -> { [(-1 + p_1 + 4p_2)] : p_0 >= -1 and p_1 >= 2 + p_0; [(p_0 + 4p_2)] : p_0 <= -2 or (p_0 >= -1 and p_1 <= 1 + p_0) }
  %58 = select i1 %48, i64 %53, i64 %57
 
Stack dump:
0. Program arguments: opt --dma-operations parallel.ll
1. Running pass 'Function Pass Manager' on module 'parallel.ll'.
2. Running pass 'Region Pass Manager' on function '@_pocl_kernel_matvec_mult_ceWork'
3. Releasing pass 'Polly - Create polyhedral description of Scops'
malloc_consolidate(): invalid chunk size
Aborted (core dumped)
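The non-constant bounds in the dump above (the pairs ending in %37 and %58, which match the identical access sets of MemRef0 and MemRef2) can be read as plain C with one branch per piece of the isl expression. This is only a transcription of the printed output: the meaning of p_0, p_1, and p_2 is not stated in the post, and lower_bound, upper_bound, and region_size are hypothetical names. The generated select presumably chooses between the same two values.

```c
#include <assert.h>

/* [p_0, p_1, p_2] -> { [(p_0 + 4p_2)] } */
static long lower_bound(long p0, long p1, long p2) {
  (void)p1;  /* p_1 does not appear in the lower bound */
  return p0 + 4 * p2;
}

/* [p_0, p_1, p_2] -> { [(-1 + p_1 + 4p_2)] : p_0 >= -1 and p_1 >= 2 + p_0;
 *                      [(p_0 + 4p_2)] : p_0 <= -2 or
 *                                       (p_0 >= -1 and p_1 <= 1 + p_0) } */
static long upper_bound(long p0, long p1, long p2) {
  if (p0 >= -1 && p1 >= 2 + p0)
    return -1 + p1 + 4 * p2;
  return p0 + 4 * p2;
}

/* Number of elements between the bounds (inclusive); always at least 1. */
static long region_size(long p0, long p1, long p2) {
  long n = upper_bound(p0, p1, p2) - lower_bound(p0, p1, p2) + 1;
  assert(n >= 1);
  return n;
}
```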
 
Questions:
1. Is Polly the right candidate for this kind of work?
2. Are there any existing open-source frameworks that do this kind of work?
3. Is this implementation feasible/correct?
4. What am I doing wrong in the pass that causes this error?


Thanks,
Harish C

Sven Verdoolaege

Jan 15, 2022, 10:20:38 AM
to Harish, Polly Development, michae...@meinersbur.de
On Fri, Jan 07, 2022 at 02:08:52AM +0530, Harish wrote:
> > To identify the accessed memory references:
> > 1. Identify the bounds (start and end) of the array (I'm using lexmax()
> > and lexmin() to get these values)

These may produce complicated expressions.
In my experience, you're usually better off computing simple approximations.
You may transfer a few more elements, but you won't waste time
evaluating the complicated expressions.
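The trade-off described here can be illustrated on an index set that is already a union of disjoint half-open intervals: replacing it by its bounding interval gives a single transfer with trivially computed bounds, at the cost of also moving the elements in the gaps. This is a plain C sketch with hypothetical names (interval, bounding_hull, exact_size); on isl sets, hull operations such as isl_set_simple_hull compute over-approximations in the same spirit.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index interval [lo, hi). */
typedef struct { long lo, hi; } interval;

/* Over-approximate a union of intervals by one bounding interval:
 * a single contiguous transfer instead of one per part. */
static interval bounding_hull(const interval *parts, size_t n) {
  assert(n > 0);
  interval h = parts[0];
  for (size_t k = 1; k < n; ++k) {
    if (parts[k].lo < h.lo) h.lo = parts[k].lo;
    if (parts[k].hi > h.hi) h.hi = parts[k].hi;
  }
  return h;
}

/* Elements in the exact union, assuming the parts are disjoint. */
static long exact_size(const interval *parts, size_t n) {
  long s = 0;
  for (size_t k = 0; k < n; ++k)
    s += parts[k].hi - parts[k].lo;
  return s;
}
```

For the parts [0, 4) and [10, 12), the hull [0, 12) transfers 12 elements instead of 6, but its bounds are just a minimum and a maximum.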

> >> Stack dump:
> >> 0. Program arguments: opt --dma-operations parallel.ll
> >> 1. Running pass 'Function Pass Manager' on module 'parallel.ll'.
> >> 2. Running pass 'Region Pass Manager' on function
> >> '@_pocl_kernel_matvec_mult_ceWork'
> >> 3. Releasing pass 'Polly - Create polyhedral description of Scops'
> >> malloc_consolidate(): invalid chunk size

This looks like memory corruption.
You may have a double free or something similar in your code.
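Whether this is the actual cause cannot be told from the post, but if the pass calls the isl C API directly, one common way to get a double free is an ownership mistake: an argument marked __isl_take is consumed by the callee, so the caller must not free it afterwards and must pass a copy (for example isl_set_copy(set)) if it still needs the object. The toy reference-counted object below mimics that convention in plain C so it runs on its own; obj, obj_copy, obj_free, obj_consume, and use_twice are hypothetical names, not isl API.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted object mimicking take/keep ownership rules. */
typedef struct { int ref; int value; } obj;

static int frees = 0;  /* counts real deallocations, for checking */

static obj *obj_alloc(int value) {
  obj *o = malloc(sizeof *o);
  assert(o != NULL);
  o->ref = 1;
  o->value = value;
  return o;
}

/* Returns a new reference to the same object. */
static obj *obj_copy(obj *o) { o->ref++; return o; }

/* Drops one reference and frees the object when the last one is gone. */
static obj *obj_free(obj *o) {
  if (o && --o->ref == 0) { free(o); frees++; }
  return NULL;
}

/* "Take"-style consumer: uses the argument and drops that reference. */
static int obj_consume(obj *o) {
  int v = o->value;
  obj_free(o);
  return v;
}

/* The caller keeps ownership of o, so it hands copies to the consumer.
 * Passing o itself twice would free it and then use freed memory. */
static int use_twice(obj *o) {
  int a = obj_consume(obj_copy(o));
  int b = obj_consume(obj_copy(o));
  return a + b;
}
```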

> > *Questions*:
> > 1. Is Polly the right candidate for this kind of work?

Sounds reasonable.

> > 2. Are there any existing open-source frameworks that do this kind of work?

Probably, but if you only want this bit, then it's probably
going to be easier to implement it yourself than to extract
it from somewhere else.

> > 3. Is this implementation feasible/correct?

You haven't shown your implementation.
(Not that I would be able to tell.)

skimo