[KEP] Gang Scheduling Support for LWS

Chen Zicong

Apr 10, 2025, 8:12:28 AM
to wg-serving
Hi all,

I've submitted a new KEP PR (#496) to add gang scheduling support for LWS. This feature has been frequently requested by users to address scheduling deadlocks in resource-constrained environments.

Currently, when cluster resources are limited, the scheduler may place leader pods while leaving their worker pods pending indefinitely. The partially scheduled replica then holds cluster resources without ever becoming ready, so the inference service stays unavailable.

Gang scheduling ensures that all pods in a replica are scheduled together or not at all, which prevents wasted resources and keeps the service available. More importantly, this enhancement lets LWS integrate with popular custom schedulers such as Volcano, the coscheduling plugin from scheduler-plugins, and YuniKorn, expanding the LWS ecosystem to meet diverse scheduling requirements.
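
To make the failure mode concrete, here is a toy sketch of the difference. It is not the KEP's design and does not use any real scheduler API: the `Group` type, the two functions, and the "one slot per pod" capacity model are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """One LWS replica: a leader plus its workers (hypothetical model)."""
    name: str
    size: int  # total pods in the group; each pod needs one slot


def schedule_pod_by_pod(groups, free_slots):
    """Without gang scheduling: leaders are placed first, then workers."""
    placed = {g.name: 0 for g in groups}
    for g in groups:  # place every leader that fits
        if free_slots > 0:
            placed[g.name] = 1
            free_slots -= 1
    for g in groups:  # workers take whatever capacity is left
        if placed[g.name] == 0:
            continue  # no leader, so no workers
        take = min(g.size - placed[g.name], free_slots)
        placed[g.name] += take
        free_slots -= take
    return placed


def schedule_gang(groups, free_slots):
    """With gang scheduling: a group is placed only if all its pods fit."""
    placed = {g.name: 0 for g in groups}
    for g in groups:
        if g.size <= free_slots:
            placed[g.name] = g.size
            free_slots -= g.size
    return placed


def ready(groups, placed):
    """A group can serve traffic only when every pod is placed."""
    return [g.name for g in groups if placed[g.name] == g.size]


groups = [Group("a", 4), Group("b", 4)]
print(schedule_pod_by_pod(groups, 4), ready(groups, schedule_pod_by_pod(groups, 4)))
print(schedule_gang(groups, 4), ready(groups, schedule_gang(groups, 4)))
```

With four free slots and two groups of four pods, the pod-by-pod version consumes all four slots but leaves neither group ready, while the all-or-nothing version places group "a" completely and leaves "b" pending without holding any resources.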

I hope we can get this KEP reviewed soon to help LWS quickly integrate gang scheduling capabilities and grow its ecosystem, as many users have expressed interest in this feature.

Looking forward to your feedback!

PR: https://github.com/kubernetes-sigs/lws/pull/496

Thanks,
Zicong