Re: [jupyter] Kubernetes NVIDIA GPU/extraVolumeMount issues


Chia-liang Kao

Sep 13, 2018, 1:05:39 PM
to jup...@googlegroups.com
Hi,

1. For the user home PVC, make sure you have the correct fsGid configured. If you use a docker-stacks (jupyter/*) based notebook image, its start script should also try to chown the user home directory properly before switching to the jovyan user.

2. Is your single-user image built with the tensorflow-gpu package or plain tensorflow? Beware that conda can pull in the non-GPU version from mixed channels even if you specifically install tensorflow-gpu.

3. limit: 0 does not take the GPUs away; you need to set NVIDIA_VISIBLE_DEVICES=none as an extra environment variable in this case (a sketch covering this and point 1 follows below).
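
For illustration, points 1 and 3 together might look like this in values.yaml. This is only a sketch; check your chart version for the exact singleuser.fsGid and singleuser.extraEnv keys:

    singleuser:
      fsGid: 100                          # gid that should own the home volume (100 = users in docker-stacks)
      extraEnv:
        NVIDIA_VISIBLE_DEVICES: "none"    # hide all GPUs from the container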

Best,
clkao


On Thursday, September 13, 2018 at 6:53 PM, Benedikt Bäumle <benedikt...@gmail.com> wrote:
Hey guys,

I am currently setting up a bare-metal single-node Kubernetes cluster plus JupyterHub to control resources for our users. I use Helm to deploy JupyterHub with a custom single-user notebook image for deep learning.

The idea is to set up the hub to have better control over NVIDIA GPUs on the server.

I am struggling with a few things that I can't figure out how to do, or whether they are even possible:

1. I mount the user's home directory into the notebook container (in our case /home/dbvis/) in the Helm chart's values.yaml:

    extraVolumes:
        - name: home
          hostPath:
            path: /home/{username}
    extraVolumeMounts:
        - name: home
          mountPath: /home/dbvis/data

It is indeed mounted like this, but with root:root ownership, so I can't add, remove, or change anything inside the container at /home/dbvis/data. What I have tried:

- I tried to change the ownership in the Dockerfile by running 'chown -R dbvis:dbvis /home/dbvis/' at the end as the root user
- I tried the following postStart hook in values.yaml:

    lifecycleHooks:
      postStart:
        exec:
          command: ["chown", "-R", "dbvis:dbvis", "/home/dbvis/data"]

Neither worked. As the storage class, I set up Rook with rook-ceph-block storage.
Any ideas?
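
One route I have not tried yet is an init container that runs as root and fixes the ownership before the notebook starts. A rough sketch only, assuming init containers can be passed through to the spawner (KubeSpawner has an init_containers trait); the container name and the uid/gid here are my guesses:

    singleuser:
      initContainers:
        - name: fix-home-perms            # hypothetical name
          image: busybox
          securityContext:
            runAsUser: 0                  # run as root so chown is permitted
          command: ["sh", "-c", "chown -R 1000:100 /home/dbvis/data"]
          volumeMounts:
            - name: home                  # the hostPath volume from above
              mountPath: /home/dbvis/data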


2. We have several NVIDIA GPUs, and I would like to control them and set limits for the Jupyter single-user notebooks. I set up the NVIDIA device plugin ( https://github.com/NVIDIA/k8s-device-plugin ).
When I use 'kubectl describe node' I find the GPU listed as a resource:

Allocatable:
 cpu:                16
 ephemeral-storage:  189274027310
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             98770548Ki
 nvidia.com/gpu:     1
 pods:               110
...
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource        Requests     Limits
  --------        --------     ------
  cpu             2250m (14%)  4100m (25%)
  memory          2238Mi (2%)  11146362880 (11%)
  nvidia.com/gpu  0            0
Events:           <none>

Inside the Jupyter single-user notebooks I can see the GPU when executing 'nvidia-smi'.
But if I ask TensorFlow to list the devices with the following code:

from tensorflow.python.client import device_lib

device_lib.list_local_devices()

I just get the CPU device:

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 232115754901553261]

Any idea what I am doing wrong? 
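
In case it narrows things down, here is a small diagnostic that could be run in the same kernel; a sketch assuming the TF 1.x API from the snippet above:

    import tensorflow as tf

    # True only if the GPU build of TF is installed *and* it can
    # initialize a CUDA device with a supported compute capability.
    print(tf.test.is_gpu_available())

    # log_device_placement prints device assignments and surfaces
    # CUDA/driver errors on stderr when the session starts.
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))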

Further, I would like to limit the number of GPUs (this is just a test environment with one GPU; we have more). I tried the following, which doesn't seem to have any effect:

- Add the following config in values.yaml, in every possible combination:

  extraConfig: |
     c.Spawner.notebook_dir = '/home/dbvis'
     c.Spawner.extra_resource_limits = {'nvidia.com/gpu': '0'}
     c.Spawner.extra_resource_guarantees = {'nvidia.com/gpu': '0'}
     c.Spawner.args = ['--device=/dev/nvidiactl', '--device=/dev/nvidia-uvm', '--device=/dev/nvidia-uvm-tools', '--device=/dev/nvidia0']

- Add the GPU to the resources in the singleuser configuration in values.yaml:

singleuser:
  image:
    name: benne4444/dbvis-singleuser
    tag: test3
    limit: 1
    guarantee: 1

Is what I am trying even possible right now?
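
For reference, here is how I understand the GPU limit would be spelled at the chart level; a sketch only, and I am not certain this chart snapshot exposes singleuser.extraResource (the extraConfig route via KubeSpawner's extra_resource_limits trait should be the equivalent):

    singleuser:
      extraResource:
        limits:
          nvidia.com/gpu: "1"
        guarantees:
          nvidia.com/gpu: "1"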

Further information:

I set up a server running 

- Ubuntu 18.04.1 LTS
- nvidia-docker
- JupyterHub Helm chart version 0.8-ea0cf9a

I added the complete values.yaml.

If you need additional information, please let me know. Any help is much appreciated.

Thank you,
Benedikt





Benedikt Bäumle

Sep 26, 2018, 1:20:34 PM
to Project Jupyter
Fixed. Can be deleted.

On Thursday, September 13, 2018 at 7:05:39 PM UTC+2, Chia-liang Kao wrote:
Hi,

1. For the user home PVC, make sure you have the correct fsGid configured. If you use a docker-stacks (jupyter/*) based notebook image, its start script should also try to chown the user home directory properly before switching to the jovyan user.

2. Is your single-user image built with the tensorflow-gpu package or plain tensorflow? Beware that conda can pull in the non-GPU version from mixed channels even if you specifically install tensorflow-gpu.

Jupyter Notebook didn't give me any log messages. Looking at the logs in a Python terminal showed me that my test graphics card was not compatible.
 

3. limit: 0 does not take the GPUs away; you need to set NVIDIA_VISIBLE_DEVICES=none as an extra environment variable in this case.

The incompatibility of my graphics card was also the problem here.