Donating the NVIDIA DRA driver for GPUs to Kubernetes

Kevin Klues

Feb 23, 2026, 3:35:18 PM
to dev, sig-...@kubernetes.io
Hi SIG-Node Folks,

We are writing to explore the possibility of donating the NVIDIA DRA Driver for GPUs repository to the Kubernetes project under SIG-Node stewardship.

What it is: This is a Dynamic Resource Allocation (DRA) driver that enables flexible GPU allocation and orchestration in Kubernetes.

It provides two primary capabilities:
  • GPU device allocation — dynamic GPU management including support for static MIG partitioning (dynamic MIG partitioning is an alpha feature)
  • ComputeDomains — an abstraction to enable secure isolation for high-bandwidth GPU-GPU memory sharing across multi-node workloads over Multi-Node NVLink (MNNVL)
The project is Apache 2.0 licensed, targets Kubernetes 1.32+, and is actively maintained.

Why SIG-Node: As DRA matures as a core Kubernetes feature, having a well-tested, full-featured driver within the Kubernetes org would benefit the broader community by:

  1. Providing a canonical example of a non-trivial DRA driver implementation
  2. Enabling tighter collaboration between the team maintaining this component and upstream Kubernetes maintainers on the DRA API surface and future DRA directions
  3. Broadening the maintainer base and ensuring long-term sustainability
  4. Helping to ensure Kubernetes scheduling and node management fully support GPUs as first-class resources going forward
  5. Serving as a reference implementation for GPU acceleration in the Kubernetes AI conformance program
Current state: The driver has 5 components (2 kubelet plugins, a controller, a dynamically provisioned set of compute-domain daemons, and a webhook), an active CI pipeline, and regular releases. We believe it is mature enough for community-driven development.

We are also actively working on options to expand our community-owned, community-driven Prow infrastructure so that the features in this repo can be tested there.

Looking forward to your thoughts.

Thanks,

Kevin Klues
Davanum Srinivas
(with our Nvidia Hats on)

Swati Sehgal

Feb 25, 2026, 8:11:52 AM
to Kevin Klues, dev, sig-...@kubernetes.io

+1 from me! With DRA now a core Kubernetes feature, this donation would set a strong precedent for how we handle production-grade DRA drivers in the project.

To my knowledge, there hasn’t been an explicit policy against donating vendor device plugins in the past; vendors have generally chosen to maintain them independently. As DRA matures, this seems like a good time to revisit that model.

As discussed in yesterday’s SIG-Node meeting, it will be important to establish clear criteria and expectations for future plugin donations (governance, maintenance commitments, CI standards, conformance, long-term sustainability). 

Overall, I’m supportive of exploring this path and would be happy to help contribute to defining those guidelines.

Thanks,

Swati Sehgal


--
You received this message because you are subscribed to the Google Groups "sig-node" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sig-node+u...@kubernetes.io.
To view this discussion visit https://groups.google.com/a/kubernetes.io/d/msgid/sig-node/CAJR1fVoMqLKLHfp7i4XuKwp2bMpG1YrPoZayavRT1O-NxrqjPw%40mail.gmail.com.

Jack Francis

Feb 25, 2026, 11:50:02 AM
to sig-node, Swati Sehgal, dev, sig-...@kubernetes.io, Kevin Klues
+1. In particular, this enables a common governance/ownership habitat for drivers across vendors. Love the enhanced ergonomics between concrete vendor drivers and the Kubernetes-owned dra-example-driver effort: this would allow the reference driver effort to provide more standard reusability across a broader set of critical areas (as Swati mentioned, CI and other maintenance/sustainability inputs), which should lower maintenance costs across the DRA driver ecosystem.

Kevin Hannon

Feb 25, 2026, 12:13:41 PM
to sig-node, dev
+1 also.

Kevin also mentioned that the next steps for them would be to figure
out how to test all the hardware that NVIDIA verifies on.
I think this is a good gate on donation generally. There needs to be a
plan on testing and making sure we have the ability to verify hardware
using our CI.

Davanum Srinivas

Feb 25, 2026, 12:37:24 PM
to Kevin Hannon, sig-node, dev

Sergey Kanzhelev

Feb 25, 2026, 1:02:04 PM
to Kevin Klues, dev, sig-...@kubernetes.io
Hi,

+1 to the plan. I am very excited to see this happening. I believe SIG Node is a good place to host the GPU DRA driver, and we can evolve it alongside the improvements we are making in DRA infrastructure.

As discussed at SIG Node, many questions need answering before the donation actually happens. However, I have not heard any major objections and do not see roadblocks that cannot be solved through discussion. I also strongly encourage vendors using or planning to use the GPU DRA driver to form a working group or participate in the SIG Node CI group to ensure the long term stability and reliability of this new component.

/Sergey


Benjamin Elder

Feb 25, 2026, 2:13:40 PM
to Sergey Kanzhelev, Kevin Klues, dev, sig-...@kubernetes.io
+1, This is great to see :-)
Standard governance and contribution will benefit the ecosystem.


> To my knowledge, there hasn’t been an explicit policy against donating vendor device plugins in the past; vendors have generally chosen to maintain them independently. As DRA matures, this seems like a good time to revisit that model.

Yes, Storage => CSI Drivers (SIG Storage), cloud-provider integrations (SIG Cloud-Provider), and cluster lifecycle tooling (e.g. CAPZ, SIG Cluster Lifecycle) are all existing patterns for this. I think they've worked pretty well.


> There needs to be a plan on testing and making sure we have the ability to verify hardware using our CI.

Yes.
 
In this case we do have limited resources available, but we should be careful about how expensive this could become for our existing shared resource pools.
Currently, the device-plugin CI (maintained primarily by Dims...) is using the ~cheapest devices in a small project pool and we have incoming requests from AI Conformance as well.


> I also strongly encourage vendors using or planning to use the GPU DRA driver to form a working group or participate in the SIG Node CI group to ensure the long term stability and reliability of this new component.

Please! I suggest starting with the SIG Node CI group or informally coordinating in Slack.

Antonio Ojea

Feb 25, 2026, 6:13:55 PM
to Garfield Heron, dav...@gmail.com, Kevin Hannon, sig-node, dev
Hi everyone,

Just a quick reminder that the community already has clearly defined
rules for repository donations:
https://github.com/kubernetes/community/blob/master/github-management/kubernetes-repositories.md#rules-for-donated-repositories

Kubernetes is a neutral home with a well-defined governance model, and
we should evaluate this donation against the standard requirements we
already have in place for everyone. We should avoid treating this as a
special case or creating ad hoc criteria unless there are very
special reasons that I fail to see here.

On Thu, 26 Feb 2026 at 00:01, Garfield Heron <garfiel...@gmail.com> wrote:
>
> Donation should be gated on clearly defined criteria around governance, maintainer commitments, hardware-backed CI, and conformance expectations so we set a durable precedent for future driver contributions.
> Support moving forward provided we formalize those expectations.
>
> On Wed, Feb 25, 2026, 5:48 PM Garfield Heron <garfiel...@gmail.com> wrote:
>>
>> +1 from me as well.
>>
>> This is a natural evolution as DRA matures into a core feature. Having a production-grade, non-trivial driver in the Kubernetes org strengthens the entire ecosystem.
>>
>> I agree with the points raised around CI and hardware verification; a clear testing plan should be a prerequisite for the donation to proceed. Looking forward to seeing those details from the NVIDIA team.

Dawn Chen

Feb 25, 2026, 8:47:22 PM
to Antonio Ojea, Garfield Heron, dav...@gmail.com, Kevin Hannon, sig-node, dev
Sorry for the late chime in. Didn't realize there was one more thread discussing the same donation earlier.

As SIG Node lead, I am super excited about this donation and have no concerns. Looking forward to collaborating more on this project with the Nvidia team. 

The community raised some logistical questions during yesterday's SIG Node weekly, but I don't think they block the donation. Thanks, Antonio, for providing the existing policy to guide the donation. The SIG Node and K8s community will work with the Nvidia team to figure out the logistics after the donation.

Benjamin Elder

Feb 27, 2026, 1:46:50 PM
to Mo Khan, sig-node, Dawn Chen, Garfield Heron, dav...@gmail.com, Kevin Hannon, dev, Antonio Ojea, stee...@kubernetes.io, sig-arch...@kubernetes.io
> With my SRC hat on, I believe we should have a security audit done of this driver (and any future donations) before adding it to k-sigs (as doing so implicitly adds it to our bug bounty program).  I plan to discuss this with steering and arch folks at their next meeting.

I think if we make a policy change like that, it must be decided independently and then applied uniformly.
I don't think we can reasonably block a repo donation on a possible project-wide policy change that has not been discussed, let alone decided.

That document falls under the GitHub Management Subproject of Contributor Experience & the Steering Committee.

----------

We regularly create brand-new repositories in the project:
https://github.com/kubernetes/org/issues?q=is%3Aissue%20label%3Aarea%2Fgithub-repo

If this repository were rewritten from scratch as a new subproject, would it immediately be in scope and bypass a proposed review-before-donation requirement?

I think we should create a policy for opting repositories into the bug bounty program. 
This policy could include requirements such as an audit, and it should cover existing kubernetes-sigs repositories, new repositories created by the project, and donated repositories.

I think that's a separate discussion that should be resolved in parallel.

- Ben, speaking on my own behalf.

On Thu, Feb 26, 2026 at 2:14 PM 'Mo Khan' via sig-node <sig-...@kubernetes.io> wrote:
Hi all,

I have no specific opinion on this driver, but it seems like the community is well in favor of this donation.

With my SRC hat on, I believe we should have a security audit done of this driver (and any future donations) before adding it to k-sigs (as doing so implicitly adds it to our bug bounty program).  I plan to discuss this with steering and arch folks at their next meeting.

Thanks.