Hello,
Kubernetes has historically focused on service-type workloads: load balancing, traffic splitting, rolling updates, spreading, autoscaling and topology-aware routing are a few examples of features the community built for service workloads. Stateful workloads are also getting more support with the introduction of CSI, topology-aware volume provisioning and storage capacity tracking, to mention a few.
However, support for Batch (workloads that run to completion) has lagged in Kubernetes core, making the migration of batch workloads to Kubernetes challenging. Multiple past efforts tried to improve this, but they lacked continuity, in some cases leading to forked projects outside k8s (including forked schedulers).
Recently, there has been momentum to improve core k8s support for Batch workloads. Examples:
Workload API enhancements (indexed jobs, suspended jobs, pod deletion cost, accurate job tracking, ready-pod tracking in Jobs, and graduation of TTL-after-finished and CronJob to GA)
NUMA-aware scheduling: Topology awareness in kube-scheduler (enhancements#2787)
Proposal for job-level management: bit.ly/k8s-job-management and kube-queue
Incubating scheduler plugins for co-scheduling and capacity-scheduling.
To keep the momentum and coordinate efforts, we would like to form WG Batch.
Here are the answers to the formation questionnaire:
What is the exact problem this group is trying to solve?
Improve support for batch workloads in Kubernetes core. Some of the limitations of the current architecture are:
The Job API lacks advanced primitives, such as completion and retry policies or pod roles (driver and workers), which has led to the creation of heterogeneous third-party APIs.
Quotas are enforced at resource creation time, with no mechanism for queuing, leaving it up to the user to retry.
There is no concept of grouping in kube-scheduler, which can lead to partially started jobs holding resources while waiting on other partially started jobs to release theirs.
So far, the stance has been that these needs can be satisfied with CRDs and third-party controllers. This has led to separate projects, with varying levels of production readiness, that ended up replacing kube-scheduler and/or cluster-autoscaler, making it harder for k8s providers to offer full support to batch users.
The Batch Working Group will help coordinate efforts across SIGs and align batch-related enhancements within k/k. It will include people with expertise and ownership from multiple SIGs and WGs with a special investment in Batch. It will work with the broader cloud native community to establish and drive the development of common batch workload support within Kubernetes core.
For example, if someone proposes an enhancement to improve the k8s-Slurm integration, this WG will be the forum to bounce those ideas off first, make sure they are aligned with other Batch enhancements and efforts, and help shape the proposal so it has the highest chance of being accepted by the SIGs the enhancement touches.
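To make the partial-start problem concrete, here is a toy model, not a description of how kube-scheduler actually works. It assumes a cluster with 4 identical "slots" and two jobs that each need all 4 of their pods running before they can make progress (as tightly coupled, MPI-style jobs typically do). The function names and the round-robin placement order are hypothetical, chosen only for illustration.

```python
from typing import Dict, List

# Toy model only: "slots" stand in for cluster capacity, and a job is
# assumed to be runnable only when ALL of its pods have been placed.


def schedule_pod_by_pod(capacity: int, jobs: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical per-pod placement with no notion of groups.

    Pods are placed one at a time, round-robin across jobs, until the
    cluster is full or every pod has been placed.
    """
    placed = {name: 0 for name in jobs}
    free = capacity
    progress = True
    while free > 0 and progress:
        progress = False
        for name, size in jobs.items():
            if free > 0 and placed[name] < size:
                placed[name] += 1
                free -= 1
                progress = True
    return placed


def schedule_all_or_nothing(capacity: int, jobs: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical gang admission: a job's pods are placed only if all fit."""
    placed = {name: 0 for name in jobs}
    free = capacity
    for name, size in jobs.items():
        if size <= free:
            placed[name] = size
            free -= size
    return placed


def runnable(jobs: Dict[str, int], placed: Dict[str, int]) -> List[str]:
    """Jobs whose pods are all placed and can therefore make progress."""
    return [name for name, size in jobs.items() if placed[name] == size]


if __name__ == "__main__":
    jobs = {"job-a": 4, "job-b": 4}
    greedy = schedule_pod_by_pod(4, jobs)
    gang = schedule_all_or_nothing(4, jobs)
    print(greedy, runnable(jobs, greedy))  # {'job-a': 2, 'job-b': 2} []
    print(gang, runnable(jobs, gang))      # {'job-a': 4, 'job-b': 0} ['job-a']
```

With per-pod placement each job holds half the cluster and neither can finish, so neither frees its slots. With all-or-nothing admission, job-a runs to completion while job-b stays pending until capacity frees up.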
What is the artifact that this group will deliver, and to whom?
To SIG Apps:
An updated Job API that can fulfill the needs of a wider range of batch applications.
A performant job controller that can scale to thousands of pods per minute.
To SIG Scheduling and SIG Autoscaling:
A Queue API, a framework to support different queuing policies and a ready-to-use implementation in a subproject.
Scheduling plugin(s) that support group scheduling and are compatible with cluster-autoscaler.
How does the group know when the problem solving process is completed, and it is time for the Working Group to dissolve?
The group will start working on the deliverables mentioned above. Once the group is satisfied with how well Kubernetes supports batch workloads, we will retire the Working Group. Alternatively, Batch may become a long-term horizontal area, in which case we will propose graduating the Working Group to a SIG that takes ownership of the APIs and scheduling plugins.
Who are all of the stakeholder SIGs involved in this problem this group is trying to solve?
SIG Apps
SIG Scheduling
SIG Autoscaling
What are the meeting mechanics (frequency, duration, roles)?
The group will meet for 1 hour every two weeks.
The Chair will lead the meeting and go through the agenda items.
The meetings will initially be focused on prioritization and planning and later on technical debate.
Does the goal of the Working Group represent the needs of the project as a whole, or is it focused on the interests of a narrow set of contributors or companies?
The needs are applicable to a wide range of companies.
Who will chair the group, and ensure it continues to meet these requirements?
I nominate Abdullah Gharaibeh as a Chair.
Is diversity well-represented in the Working Group?
--
You received this message because you are subscribed to the Google Groups "Kubernetes developer/contributor discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-dev/03381126-dc43-c9a5-7751-4a0373654ab9%40redhat.com.
The working group by definition doesn't own code, but certainly it is in scope for discussing enhancements related to it which we can then bring to sig-apps.

On Mon, Dec 13, 2021 at 6:49 PM Josh Berkus <jbe...@redhat.com> wrote:
On 12/13/21 13:36, 'Aldo Culquicondor' via Kubernetes developer/contributor discussion wrote:
> Looking forward to hearing your thoughts about this proposal.
>
Will WG-Batch become responsible for the current CronJob object?
--
-- Josh Berkus
Kubernetes Community Architect
OSPO, OCTO
+100! It's essential to have a central place to discuss batch requirements and come up with generic batch APIs with clear semantics, to make batch behavior consistent and conformant. With the APIs in k/k, implementations can be extensible/pluggable and thus benefit end users. I'm looking forward to seeing batch primitives become first-class citizens in the k8s ecosystem.

Wei Huang
On Monday, December 13, 2021 at 8:41:26 PM UTC-8 a...@google.com wrote: