Interleaved Pipeline Parallel


Patrick Toulme

May 30, 2024, 5:13:42 PM
to OpenXLA Discuss
Hello, 
At the 2024 OpenXLA Dev Day, I watched a presentation by NVIDIA in which they demonstrated SPMD pipeline parallelism. They specifically mentioned interleaved pipelining (looping pipelines). I recall the presenter saying they obtained an interleaved schedule via an HLO pass.

Questions:
1. Is this pass open source, and if so, where is it located? I looked in OpenXLA and could not find it.
2. Is there a recording of this presentation anywhere?

Thank you, 
Patrick Toulme

Amazon Web Services

Peter Hawkins

May 30, 2024, 5:52:23 PM
to Patrick Toulme, Abhinav Goel, OpenXLA Discuss


Frederik Gossen

May 30, 2024, 6:06:08 PM
to Peter Hawkins, Aditi Joshi, Patrick Toulme, Abhinav Goel, OpenXLA Discuss

+Aditi Joshi, can you provide the link to the recording?

Hi Patrick, 
For the passes involved, you can look for these flags and find the relevant passes, e.g. `PipelinedP2PRewriter` and `CollectivePipeliner`:
 --xla_gpu_enable_latency_hiding_scheduler=true --xla_gpu_collective_permute_decomposer_threshold=1024 --xla_gpu_enable_pipelined_p2p=true
You will also need a few things that have not landed yet, some of which you can find in this draft PR 10909.
We are running this based on PAXML configs with CIRCULAR_REPEAT >= 2.
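
For readers who want to try the flags quoted above from a Python entry point, here is a minimal sketch. It assumes the flags are passed through the `XLA_FLAGS` environment variable and that the variable is set before JAX is imported. The helper name `append_xla_flags` is hypothetical, and the flag names are copied from this message as written in May 2024, so they may have been renamed or removed since.

```python
import os

# Flags copied verbatim from the message above; they may have changed in
# newer XLA releases, so treat this list as an assumption.
PIPELINING_FLAGS = [
    "--xla_gpu_enable_latency_hiding_scheduler=true",
    "--xla_gpu_collective_permute_decomposer_threshold=1024",
    "--xla_gpu_enable_pipelined_p2p=true",
]


def append_xla_flags(flags, env=os.environ):
    """Append flags to XLA_FLAGS in `env`, keeping any flags already set.

    Hypothetical helper for illustration; not part of JAX or XLA.
    """
    existing = env.get("XLA_FLAGS", "")
    merged = " ".join(part for part in [existing, *flags] if part)
    env["XLA_FLAGS"] = merged
    return merged


append_xla_flags(PIPELINING_FLAGS)

# Import JAX only after the environment variable is set, for example:
# import jax
```

Appending rather than overwriting keeps any flags that a launcher script has already placed in `XLA_FLAGS`.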

I will probably land one of these configs soon so that it is easier to reproduce this.

Frederik Gossen
Software Engineer
frgo...@google.com

Patrick Toulme

May 30, 2024, 6:16:15 PM
to OpenXLA Discuss, Frederik Gossen, Patrick Toulme, Abhinav Goel, OpenXLA Discuss, Peter Hawkins, Aditi Joshi
Thank you for sending this. Question: are you transforming the normal GPipe schedule with these passes, or is the interleaved schedule defined at the JAX level? I.e., is the GPipe schedule defined in JAX, and do these passes then transform it into an interleaved schedule?

Patrick

Frederik Gossen

May 30, 2024, 6:18:24 PM
to Patrick Toulme, OpenXLA Discuss, Abhinav Goel, Peter Hawkins, Aditi Joshi
It is currently transformed in XLA with these passes but this might change in the future. Today, JAX does not expose the send/recv primitives that would be needed to express this on the application level. 

Patrick Toulme

May 30, 2024, 6:20:10 PM
to OpenXLA Discuss, Frederik Gossen, OpenXLA Discuss, Abhinav Goel, Peter Hawkins, Aditi Joshi, Patrick Toulme
Awesome. I will integrate this in our Neuron PJRT Plugin. Thank you.
Patrick Toulme

Patrick Toulme

May 30, 2024, 8:25:56 PM
to OpenXLA Discuss, Patrick Toulme, Frederik Gossen, OpenXLA Discuss, Abhinav Goel, Peter Hawkins, Aditi Joshi
@Frederik, I have gone over the passes you mentioned and searched for those flags, but I do not see the pass in which you change the pipeline schedule. I would assume the schedule-changing pass would insert new collective permutes or send/receives for the circular repeat. I also do not see a pass or flag in which you define the circular repeat degree.

I understand that you run the collective permute decomposer pass to obtain send/receives. I also understand the P2P rewriter and CollectivePipeliner passes for overlapping collectives with computation. What I am specifically looking for is where the pipeline parallel schedule is changed to an interleaved schedule.

Thanks for helping and looking forward to your response. 

Best, 
Patrick Toulme

Amazon Web Services

Peter Hawkins

May 31, 2024, 8:50:37 AM
to Patrick Toulme, OpenXLA Discuss, Frederik Gossen, Abhinav Goel, Aditi Joshi
Hi,

The compiler doesn't change the pipeline schedule into a circular one; that is done by user code. For example, here's how PAX does it:

XLA is responsible for getting good communication/compute overlap out of that code.
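
To make the idea of a circular (interleaved) schedule defined in user code concrete, here is a minimal pure-Python sketch. It is not the PAX implementation. It assumes each of `num_stages` devices holds `circular_repeat` layer chunks (device `d` holds chunks `d`, `d + num_stages`, and so on), that microbatches enter stage 0 one per tick, and that there are at least as many microbatches as stages. The function name `circular_schedule` is hypothetical.

```python
def circular_schedule(num_stages, num_microbatches, circular_repeat):
    """Return, per tick, what each stage works on in a circular pipeline.

    Each entry is (microbatch, layer_chunk) or None when the stage is idle.
    Illustrative model only; assumes num_microbatches >= num_stages so a
    microbatch leaves the last stage before it re-enters stage 0.
    """
    if num_microbatches < num_stages:
        raise ValueError("this sketch assumes num_microbatches >= num_stages")

    work_items = num_microbatches * circular_repeat
    total_ticks = work_items + num_stages - 1
    schedule = []
    for tick in range(total_ticks):
        row = []
        for stage in range(num_stages):
            # Stage `stage` lags stage 0 by `stage` ticks.
            item = tick - stage
            if 0 <= item < work_items:
                repeat, microbatch = divmod(item, num_microbatches)
                # On pass `repeat`, the stage applies its repeat-th chunk.
                row.append((microbatch, repeat * num_stages + stage))
            else:
                row.append(None)  # pipeline bubble
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for tick, row in enumerate(circular_schedule(2, 2, 2)):
        print(tick, row)
```

In this model the loop runs for `num_microbatches * circular_repeat + num_stages - 1` ticks, so 2 stages, 2 microbatches, and 2 repeats give 5 ticks. Every tick ends with each stage handing its output to the next stage, and that hand-off is the communication XLA is asked to overlap with compute.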

Peter

Patrick Toulme

May 31, 2024, 1:53:57 PM
to OpenXLA Discuss, Peter Hawkins, OpenXLA Discuss, Frederik Gossen, Abhinav Goel, Aditi Joshi, Patrick Toulme
I see. I was looking for a pass to obtain an interleaved schedule without altering user code. I suppose I will have to use the JAX code then. One other thing: I noticed a PR that creates a valid-iterations attribute for collective-permute. So is the GPU runtime going to delete the send/receives for the invalid iterations of the padded data?

https://github.com/openxla/xla/pull/12892

Frederik Gossen

Jun 3, 2024, 1:42:42 PM
to Patrick Toulme, OpenXLA Discuss, Peter Hawkins, Abhinav Goel, Aditi Joshi
Yes, that is the idea. It will disable them through the runtime at this point. This transform is unsafe and we're looking into better solutions. 
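
As a rough illustration of what "valid iterations" can mean for a padded pipeline loop, here is a small sketch. It does not reproduce the attribute format from the PR. It assumes a simple non-circular loop of `num_microbatches + num_stages - 1` iterations in which stage `d` holds real (non-padding) data during iterations `d` through `d + num_microbatches - 1`. Both function names are hypothetical.

```python
def valid_send_ranges(num_stages, num_microbatches):
    """Map each edge (d, d + 1) to the inclusive iteration range in which
    the send from stage d to stage d + 1 carries real data.

    Illustrative model only; not the XLA attribute format.
    """
    return {
        (stage, stage + 1): (stage, stage + num_microbatches - 1)
        for stage in range(num_stages - 1)
    }


def send_is_valid(ranges, edge, iteration):
    """Return True if the send on `edge` carries real data at `iteration`."""
    first, last = ranges[edge]
    return first <= iteration <= last


if __name__ == "__main__":
    ranges = valid_send_ranges(num_stages=3, num_microbatches=4)
    for iteration in range(3 + 4 - 1):
        active = [edge for edge in ranges if send_is_valid(ranges, edge, iteration)]
        print(iteration, active)
```

Under these assumptions, a runtime that knows the ranges could skip the sends outside them, since those would only move padding.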