Fairness in Kueue


Anirudh Jayakumar

Aug 17, 2022, 11:59:48 PM
to wg-batch
Hi,

Hope this is the right place to ask this question. We are exploring building a batch processing system on top of the Jobs API and have hit a few issues around fairness. I'm trying to understand if Kueue can help with some of these issues.

1. The first issue is that a long-running job can keep all other jobs waiting to be scheduled if it also takes up the majority of the resources. We currently spawn these jobs for each tenant in a separate namespace with resource quotas. Typically, each tenant has a mix of jobs in terms of size and running time.

2. The second issue is similar: long-running low-priority jobs can make high-priority jobs wait for resources until the low-priority jobs complete.

Does Kueue help with these scenarios?

Also, 
3. If I want to define resource quotas at a queue level, should I have a 1:1 mapping between cluster-queue and queue?

Thanks,
Anirudh

Aldo Culquicondor

Aug 18, 2022, 11:10:21 AM
to Anirudh Jayakumar, kubernetes-sig-scheduling, wg-batch
Hello Anirudh,
See my answers below.

> Hope this is the right place to ask this question.

It sure is :) Also +kubernetes-sig-scheduling.
 
> We are exploring building a batch processing system on top of the Jobs API and have hit a few issues around fairness. I'm trying to understand if Kueue can help with some of these issues.

> 1. The first issue is that a long-running job can keep all other jobs waiting to be scheduled if it also takes up the majority of the resources. We currently spawn these jobs for each tenant in a separate namespace with resource quotas. Typically, each tenant has a mix of jobs in terms of size and running time.

You can use the queueing strategy that fits you best: https://github.com/kubernetes-sigs/kueue/blob/main/docs/concepts/cluster_queue.md#queueing-strategy (probably BestEffortFIFO, based on your description). You can also set priorities.
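
To make the difference between the two strategies concrete, here is a toy sketch in Python. It is not Kueue's code: the `Workload` class, the `admit` function, and the single CPU resource are assumptions made for illustration. It only models how pending workloads are ordered and whether one that does not fit blocks the ones behind it.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    priority: int   # higher value is considered first
    created: int    # creation order; lower is older
    cpu: int        # CPUs requested


def admit(pending, free_cpu, strategy):
    """Return names of the workloads admitted in one pass over the queue.

    Toy model only: one resource, no borrowing, no preemption.
    """
    # Both strategies order by priority first, then by creation time.
    ordered = sorted(pending, key=lambda w: (-w.priority, w.created))
    admitted = []
    for w in ordered:
        if w.cpu <= free_cpu:
            free_cpu -= w.cpu
            admitted.append(w.name)
        elif strategy == "StrictFIFO":
            # The head of the queue does not fit, so everything behind it waits.
            break
        # BestEffortFIFO: skip the workload that does not fit and keep going.
    return admitted


pending = [
    Workload("big", priority=0, created=1, cpu=8),
    Workload("small-a", priority=0, created=2, cpu=2),
    Workload("small-b", priority=0, created=3, cpu=2),
]

print(admit(pending, free_cpu=4, strategy="StrictFIFO"))      # []
print(admit(pending, free_cpu=4, strategy="BestEffortFIFO"))  # ['small-a', 'small-b']
```

With 4 free CPUs, the 8-CPU workload at the head blocks everything under StrictFIFO, while BestEffortFIFO lets the two smaller workloads through. Raising the `priority` of a workload moves it ahead in the ordering under either strategy.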
 
> 2. The second issue is similar: long-running low-priority jobs can make high-priority jobs wait for resources until the low-priority jobs complete.

We haven't implemented preemption yet, but it's next on our roadmap for 2022-H2: https://github.com/kubernetes-sigs/kueue/issues/83. https://github.com/kubernetes-sigs/kueue/issues/78 may also be relevant for you.

> Does Kueue help with these scenarios?
>
> Also,
> 3. If I want to define resource quotas at a queue level, should I have a 1:1 mapping between cluster-queue and queue?

Most likely, yes.
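
The reasoning, as a small Python sketch: quota is defined on the cluster queue, and each namespaced queue points at one cluster queue, so queues that point at the same cluster queue draw from the same quota. All names and numbers below are hypothetical, and this only models the relationship; it is not Kueue configuration.

```python
from collections import defaultdict

# Hypothetical CPU quotas, defined per cluster queue.
cluster_queue_cpu_quota = {"cq-team-a": 10, "cq-shared": 10}

# Each namespaced queue points at exactly one cluster queue.
queue_to_cluster_queue = {
    "team-a/main": "cq-team-a",  # 1:1, so the quota is effectively per queue
    "team-b/main": "cq-shared",  # these two queues share a single quota
    "team-c/main": "cq-shared",
}


def queues_sharing_quota(mapping):
    """Group queues by the cluster queue whose quota they draw from."""
    groups = defaultdict(list)
    for queue, cluster_queue in mapping.items():
        groups[cluster_queue].append(queue)
    return dict(groups)


for cq, queues in queues_sharing_quota(queue_to_cluster_queue).items():
    print(cq, cluster_queue_cpu_quota[cq], queues)
```

Here `team-a/main` has the 10 CPUs of `cq-team-a` to itself, while `team-b/main` and `team-c/main` compete for the 10 CPUs of `cq-shared`.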

You might also want to watch the recording for today's WG-Batch meeting. I don't think the video is up yet, but it will be here: https://www.youtube.com/playlist?list=PL69nYSiGNLP05eEikq0j8PcSehEdM4mj7&jct=s4irZjLuvQ2WrUYAGxyRYjs3a2aysg

Feel free to engage in the github issues so that we can use your feedback in our designs.