Hi SIG-Node Folks,
We are writing to explore the possibility of donating the NVIDIA DRA Driver for GPUs repository to the Kubernetes project under SIG-Node stewardship.
What it is: This is a Dynamic Resource Allocation (DRA) driver that enables flexible GPU allocation and orchestration in Kubernetes.
It provides two primary capabilities:
- GPU device allocation — dynamic GPU management including support for static MIG partitioning (dynamic MIG partitioning is an alpha feature)
- ComputeDomains — an abstraction to enable secure isolation for high-bandwidth GPU-GPU memory sharing across multi-node workloads over Multi-Node NVLink (MNNVL)
The project is Apache 2.0 licensed, targets Kubernetes 1.32+, and is actively maintained.
Why SIG-Node: As DRA matures as a core Kubernetes feature, having a well-tested, full-featured driver within the Kubernetes org would benefit the broader community by:
- Providing a canonical example of a non-trivial DRA driver implementation
- Enabling tighter collaboration between the team maintaining this component and upstream Kubernetes maintainers on the DRA API surface and future DRA directions
- Broadening the maintainer base and ensuring long-term sustainability
- Helping to ensure Kubernetes scheduling and node management fully support GPUs as first-class resources going forward
- Serving as a reference implementation for GPU acceleration in the Kubernetes AI conformance program
Current state: The driver has 5 components (2 kubelet plugins, a controller, a dynamically provisioned set of compute-domain daemons, and a webhook), an active CI pipeline, and regular releases. We believe it is mature enough for community-driven development.
We are actively working on options to expand our community-owned, community-driven Prow infrastructure so that it can also test the features in this repo.
Looking forward to your thoughts.
Thanks,
Kevin Klues
Davanum Srinivas
(with our NVIDIA hats on)