On Mon, Dec 02, 2024 at 06:13:27PM -0800, Zheng Qihang wrote:
> Actually, I want to separate a contiguous domain by `parallel` parameters
> to fit the distributed memory architecture.
> For example, given a domain `{ [d3]: 0 <= d3 < 1024 }` that will be
> separated among 8 devices, then each device owns:
>
> | 0..128 | 128..256 | 256..384 | 384..512 | 512..640 | 640..768 | 768..896 | 896..1024 |
> | tid 0  | tid 1    | tid 2    | tid 3    | tid 4    | tid 5    | tid 6    | tid 7     |
>
> so the relation between tid and the domain is
> `{ [tid] -> [td3] : 0 <= tid <= 7 and 128tid <= td3 <= 127 + 128tid }`
>
> I tried this transformation:
> ```
> import isl
>
> parallel = 8
> bounds = 1024
> tile = bounds // parallel
> domainA = isl.set(f"{{ [d3]: 0 <= d3 < {bounds} }}")
> splitA = isl.map(
>     f"{{ [tid] -> [td3]: 0 <= tid < {parallel} "
>     f"and {tile}*tid <= td3 < {tile}*(tid+1) }}"
> )
>
> # splitA result:
> # isl.map("{ [tid] -> [td3] : 0 <= tid <= 7 and 128tid <= td3 <= 127 + 128tid }")
> ```
> But this one requires that the `tile` factor be computed on the Python
> side.
Why is computing the tile size in Python a problem?
Note that you could use

	f"{{ [tid] -> [td3=0:{bounds}-1]: tid = td3//{tile} }}"

so that you don't need to specify both parallel and tile.
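
As a quick sanity check that does not involve isl at all, the following
plain-Python sketch enumerates both formulations for the sizes in the
example and compares them. The helper names `pairs_from_bounds` and
`pairs_from_floordiv` are made up for illustration; they only mimic the
two map descriptions by brute force.

```python
parallel = 8
bounds = 1024
tile = bounds // parallel


def pairs_from_bounds(parallel, tile):
    """Pairs (tid, td3) with 0 <= tid < parallel
    and tile*tid <= td3 < tile*(tid+1)."""
    return {(tid, td3)
            for tid in range(parallel)
            for td3 in range(tile * tid, tile * (tid + 1))}


def pairs_from_floordiv(bounds, tile):
    """Pairs (tid, td3) with 0 <= td3 <= bounds-1 and tid = td3 // tile."""
    return {(td3 // tile, td3) for td3 in range(bounds)}


# Both descriptions should contain exactly the same (tid, td3) pairs.
same = pairs_from_bounds(parallel, tile) == pairs_from_floordiv(bounds, tile)
print(same)
```

The second form never mentions `parallel`: the range of tid follows from
the bounds on td3 and the tile size.
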
> It might not be a common solution.
If you are worried about cases where the size is not divisible
by the number of devices, you could just use

	tile = (bounds+parallel-1)//parallel
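
To see what this rounded-up tile size does, here is a small plain-Python
sketch (no isl involved) that assigns every td3 to tid = td3 // tile and
records the block each tid ends up with. The function name `split` and
the sizes 1030 and 10 are assumptions chosen only for illustration.

```python
def split(bounds, parallel):
    """Return (tile, blocks); blocks maps tid -> (first, last), inclusive."""
    tile = (bounds + parallel - 1) // parallel  # ceiling of bounds / parallel
    blocks = {}
    for td3 in range(bounds):
        tid = td3 // tile
        # Keep the first td3 seen for this tid, update the last one.
        first, _ = blocks.get(tid, (td3, td3))
        blocks[tid] = (first, td3)
    return tile, blocks


tile, blocks = split(1030, 8)
print(tile)        # 129
print(blocks[0])   # (0, 128)
print(blocks[7])   # (903, 1029), a shorter final block of 127 elements

# With very small sizes, some devices may receive nothing at all.
small_tile, small_blocks = split(10, 8)
print(small_tile, len(small_blocks))  # 2 5
```

Only the last non-empty block can be shorter than the others, and every
td3 still maps to a tid below parallel.
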
skimo