Current status of GPU support in Kubernetes

phv...@gmail.com

Nov 30, 2016, 4:06:14 AM
to Kubernetes user discussion and Q&A
Hi everyone,

We are considering using Kubernetes to manage batch jobs on an on-prem GPU cluster.
However, I have had a hard time finding up-to-date information about GPU support in Kubernetes.

This PR says that only 1 GPU per machine is supported: https://github.com/kubernetes/kubernetes/pull/24836
Is that still the case?

We have machines with several GPUs, and if the above information is correct then it would be a blocker for us.

Thanks,

Vishnu Kannan

Nov 30, 2016, 5:28:51 AM
to kubernet...@googlegroups.com, Hui-Zhi Zhao
Kubernetes supports only 1 GPU per machine as of now. Take a look at this doc.
This PR is attempting to add support for multiple GPUs. 
We hope to introduce alpha support for multiple GPUs in the next release (v1.6).
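
For readers with multi-GPU machines, here is a rough sketch of what a request for two GPUs on one node could look like once the alpha multi-GPU support lands. It assumes the resource keeps the name `alpha.kubernetes.io/nvidia-gpu` and that the kubelet runs with the alpha `Accelerators` feature gate enabled. The pod name, container name, and image are placeholders, and the container may also need the NVIDIA driver libraries mounted from the host.

```yaml
# Sketch only: assumes alpha multi-GPU support under the
# alpha.kubernetes.io/nvidia-gpu resource name. Names are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: multi-gpu-batch-job        # hypothetical name
spec:
  restartPolicy: Never             # run once, like a batch job
  containers:
  - name: trainer                  # hypothetical name
    image: tensorflow/tensorflow:latest-gpu   # placeholder GPU-enabled image
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 2     # ask for two GPUs on a single node
```

If the node reports fewer than two GPUs under Capacity in `kubectl describe nodes`, a pod like this would be expected to stay pending.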


--
You received this message because you are subscribed to the Google Groups "Kubernetes user discussion and Q&A" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-users+unsubscribe@googlegroups.com.
To post to this group, send email to kubernetes-users@googlegroups.com.
Visit this group at https://groups.google.com/group/kubernetes-users.
For more options, visit https://groups.google.com/d/optout.

Hui-Zhi Zhao

Nov 30, 2016, 6:13:00 AM
to Vishnu Kannan, kubernet...@googlegroups.com
Great! I am working on rebasing my PR (https://github.com/kubernetes/kubernetes/pull/28216).

Regards,
Hui-Zhi Zhao

Hui-Zhi Zhao

Dec 4, 2016, 4:38:51 AM
to Vishnu Kannan, kubernet...@googlegroups.com
Hi all,

I have rebased #28216; any comments are welcome. Meanwhile, I will add more logs and annotations.


Regards,
Hui-Zhi Zhao

phv...@gmail.com

Dec 19, 2016, 9:55:38 AM
to Kubernetes user discussion and Q&A, vis...@google.com
Awesome!

I checked out the branch, rebased it onto master, compiled it, and deployed it on a single-node Ubuntu 14.04 machine.
However, when I run "kubectl describe nodes", it shows:

Capacity:
alpha.kubernetes.io/nvidia-gpu: 0
<removed>

Allocatable:
alpha.kubernetes.io/nvidia-gpu: 0
<removed>

I am pretty sure I have 2 GPUs on the machine, with the CUDA driver properly installed (I can see nvidia0, nvidia1, nvidiactl, and nvidia-uvm under /dev).

I am in the middle of debugging this, but I just want to check whether anyone has experienced something similar.

Regards,
Vu

Vu Pham

Dec 19, 2016, 11:00:41 AM
to kubernet...@googlegroups.com
Never mind, my bad. I was using the wrong binaries :)
It's working now. I am going to bring it to the cluster.

Looking forward to seeing new GPU features in the codebase. Things like memory management would be very helpful.

Thanks and cheers,

--
PHAM Hoai Vu

burtc...@gmail.com

Oct 16, 2017, 3:54:46 AM
to Kubernetes user discussion and Q&A

Hi Vu Pham,
My allocatable GPU number is 0.
Can you advise how you overcame the issue?
Thanks, Burt

Vishnu Kannan

Oct 16, 2017, 5:36:20 PM
to Kubernetes user discussion and Q&A
GPU support in Kubernetes is moving out of tree using a new extension mechanism called device plugins. NVIDIA recently published a GPU device plugin that is expected to work with Kubernetes v1.8.

If you are on GCP, please reach out to me and I can share an official alpha user guide for GPUs with k8s. 
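
As a rough illustration of what changes for workloads under the device plugin model: with NVIDIA's device plugin running on a node, GPUs are advertised under the resource name `nvidia.com/gpu` rather than `alpha.kubernetes.io/nvidia-gpu`. The sketch below assumes the plugin is already deployed on the node and that the kubelet has the alpha `DevicePlugins` feature gate enabled, as v1.8 requires. The pod name, container name, image tag, and command are placeholders.

```yaml
# Sketch only: assumes the NVIDIA device plugin is running on the node
# and advertising GPUs as nvidia.com/gpu. Names are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-plugin-test            # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: cuda                     # hypothetical name
    image: nvidia/cuda:9.0-base    # placeholder CUDA image
    command: ["nvidia-smi"]        # assumes the NVIDIA container runtime exposes this tool
    resources:
      limits:
        nvidia.com/gpu: 1          # resource name registered by the device plugin
```

If `kubectl describe nodes` shows no `nvidia.com/gpu` entry under Capacity, the plugin has most likely not registered with the kubelet, and a pod like this would be expected to stay pending.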

burtc...@gmail.com

Oct 17, 2017, 11:27:11 PM
to Kubernetes user discussion and Q&A
Thanks, Vishnu! Following your instructions, I got error messages saying that CUDA 9.0 is required. Mine is CUDA 7.5. Still in progress.
Thank you

kt...@cogent.co.jp

Oct 23, 2017, 3:48:34 AM
to Kubernetes user discussion and Q&A
Hi Vishnu,

I am also trying to run a GPU cluster on GKE, and I am also hitting the allocatable GPU = 0 problem. Could you please guide me?

Vishnu Kannan

Oct 23, 2017, 4:00:29 PM
to Kubernetes user discussion and Q&A
Hey there, if you are trying out the alpha experience for GPUs on GKE, please sign up via this form and request access to the doc listed in the form. If you have issues with GKE after going through the user guide, please PM me and I'll help you out.

Itamar O

Oct 25, 2017, 10:23:49 AM
to kubernet...@googlegroups.com
Vishnu,
I have an alpha cluster on GKE with GPU (project is whitelisted, running 1.7.8), but I am unable to schedule workloads that require GPUs:
[Attached image: unnamed.png]

Example YAML to reproduce this:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: my-tf
  labels:
    app: tf
spec:
  selector:
    matchLabels:
      app: tf
  replicas: 1
  template:
    metadata:
      labels:
        app: tf
    spec:
      containers:
      - image: tensorflow/tensorflow:latest-gpu
        name: my-tf
        ports:
        - containerPort: 8888
        resources:
          limits:
            alpha.kubernetes.io/nvidia-gpu: 1

Am I missing something?
Thanks.

Vishnu Kannan

Oct 25, 2017, 10:25:50 AM
to Kubernetes user discussion and Q&A
Can you post the output of `kubectl describe no`? Did you follow the user guide to install the drivers?

Thanks.

Itamar O

Oct 25, 2017, 11:11:20 AM
to kubernet...@googlegroups.com
I already took that cluster down, so I'll reproduce it later and run `kubectl describe no`.
What user guide are you referring to? I was under the impression that using GKE with the Google-optimized images gives me ready-to-go machines.
The user guide I'm familiar with talks about installing GPU drivers on GCE instances that I spin up on my own, not GKE-managed instances.

Thanks.

Vishnu Kannan

Oct 25, 2017, 11:38:01 AM
to Kubernetes user discussion and Q&A
Clusters are not ready to go just yet. Please sign up via this link: https://goo.gl/forms/HR0Upm9w30DW8aIU2

Itamar O

Nov 5, 2017, 3:31:11 AM
to kubernet...@googlegroups.com
Took a while, but I'm happy to say that it worked :-)
[Attached image: image.png]
Thanks!
