[ANNOUNCE] Introducing Node Readiness Controller: Declarative Node Readiness for Kubernetes


Ajay Sundar

Feb 4, 2026, 4:55:02 PM
to d...@kubernetes.io, wg-node-...@kubernetes.io, sig-...@kubernetes.io, sig-sch...@kubernetes.io
We are pleased to introduce a new kubernetes-sigs project: Node Readiness Controller.

The Node Readiness Controller provides fine-grained, declarative scheduling control for nodes. While standard Kubernetes relies on a single node "Ready" condition, modern workloads often depend on specific infrastructure components, such as CNI agents, storage drivers, or device drivers, that must be fully initialized before pods can reliably run. This controller lets operators define NodeReadinessRules that automatically manage node taints based on the status of specified node conditions (a minimal sketch follows the feature list below).

Key aspects:
  • A NodeReadinessRule CRD that lets operators orchestrate multi-step node initialization workflows.
  • A choice between bootstrap-only (one-time initialization) and continuous enforcement of rules.
  • A dryRun mode to audit how new readiness requirements will affect your fleet before enforcement.
  • Out-of-the-box compatibility with existing ecosystem components such as Node Problem Detector (NPD) or any daemon reporting custom node conditions.
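To give a feel for the API, here is a minimal rule sketch. The field names below (conditions, taint, enforcementMode, dryRun) and the API group/version are illustrative only; please refer to the user guide for the authoritative CRD schema.

apiVersion: readiness.node.x-k8s.io/v1alpha1  # illustrative group/version
kind: NodeReadinessRule
metadata:
  name: network-readiness
spec:
  # All listed conditions must report True before the taint is lifted.
  conditions:
    - type: CNIReady          # e.g. reported by the CNI daemon or an NPD plugin
      status: "True"
  # Taint kept on matching nodes while any condition is unmet.
  taint:
    key: example.com/network-not-ready
    effect: NoSchedule
  enforcementMode: continuous  # or bootstrap-only (one-time initialization)
  dryRun: false                # set true to audit impact before enforcing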
Find out more:
Documentation: User Guide & Concepts

We are currently in Alpha and looking for early adopters to provide feedback and help us shape the roadmap for future enhancements.

A huge thank you to everyone who contributed and provided feedback to help get this release out!


Best regards,
ajaysundark

Antonio Ojea

Feb 4, 2026, 5:13:54 PM
to ajaysu...@gmail.com, d...@kubernetes.io, wg-node-...@kubernetes.io, sig-...@kubernetes.io, sig-sch...@kubernetes.io
Great job! Node readiness, and network readiness in particular, has always been a source of friction for cluster admins and platform operators. Nice to see this project tackle that problem.

Congratulations


v

Feb 4, 2026, 9:26:03 PM
to Antonio Ojea, ajaysu...@gmail.com, d...@kubernetes.io, wg-node-...@kubernetes.io, sig-...@kubernetes.io, sig-sch...@kubernetes.io
Congrats!

Would this provide an API for nodes similar to NFD that could integrate with the scheduler? For example, given a set of features (CNI, drivers, or other custom hardware) I may want to find nodes that match, deemed ready for a specific workload use case. A node that is missing a particular feature would not be labeled NotReady in the cluster, but rather just filtered out for the request at hand.

From the post, it sounds like the groups of features are all-or-nothing, and that to achieve the above I would need to partition my cluster into logical feature groups to schedule against.

-Vanessa


Ajay Sundar

Feb 5, 2026, 3:27:01 AM
to v, Antonio Ojea, d...@kubernetes.io, wg-node-...@kubernetes.io, sig-...@kubernetes.io, sig-sch...@kubernetes.io
Hi,

Thanks for the great question. This touches on one of the core design goals of the project. 

To clarify, the Node Readiness Controller does not flip the node's "Ready" status to False; instead, it manages node taints based on the health of specific components.
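Concretely, a gated node keeps Ready=True and simply carries a targeted taint (the taint key shown here is hypothetical):

apiVersion: v1
kind: Node
metadata:
  name: worker-1
spec:
  taints:
    # Applied by the controller while the rule's conditions are unmet.
    - key: example.com/network-not-ready
      effect: NoSchedule
status:
  conditions:
    - type: Ready
      status: "True"   # the standard Ready condition is untouched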

Regarding your points:
  • Granularity vs. all-or-nothing: While a single NodeReadinessRule evaluates all of its conditions as a logical AND, you are not limited to one rule. You can create multiple rule objects, each with its own taint. For example:
- Rule A requires 'CNIReady' and applies a 'network-not-ready' taint.
- Rule B requires 'GPUReady' and applies a 'gpu-not-ready' taint.

Because rules use nodeSelectors, you can apply these readiness gates to specific subsets of nodes (e.g., only nodes with GPUs) without logically partitioning the cluster into separate pools.
  • This effectively provides the selective, per-workload scheduling you mentioned. A general workload that only needs the network would wait for the 'network-not-ready' taint to be removed (or would tolerate the GPU taint), while a GPU-specific workload would need both taints to clear. This allows the scheduler to "filter" nodes on a per-pod basis using standard taints and tolerations (see the toleration sketch after this list).
  • Comparing to NFD: While NFD is excellent for feature discovery (labeling what a node has), the Node Readiness Controller solves a different problem, readiness/health, ensuring those features are actually functional before pods land.
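To make the per-workload filtering concrete, here is a sketch (taint keys hypothetical, matching the Rule A/Rule B example above) of a network-only workload that tolerates the GPU readiness taint, so it can land on GPU nodes whose drivers are still initializing. A GPU workload would simply omit the toleration and wait for both taints to clear.

apiVersion: v1
kind: Pod
metadata:
  name: web-frontend
spec:
  tolerations:
    # Tolerate the (hypothetical) taint from Rule B; the pod still
    # waits for Rule A's network taint to be removed.
    - key: example.com/gpu-not-ready
      operator: Exists
      effect: NoSchedule
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9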
We would love to hear more about your specific use cases. If you have ideas or suggestions, please feel free to open an issue or join us on Slack!

Best
ajaysundark