Hey guys,
I am currently setting up a bare-metal, single-node Kubernetes cluster plus JupyterHub to get control over resources for our users. I use Helm to deploy JupyterHub with a custom singleuser notebook image for deep learning.
The idea is to use the hub to get better control over the NVIDIA GPUs on the server.
I am struggling with a few things that I can't figure out how to do, or whether they are even possible:
1. I mount the user's home directory (in our case /home/dbvis/) into the notebook container via the Helm chart's values.yaml:
extraVolumes:
  - name: home
    hostPath:
      path: /home/{username}
extraVolumeMounts:
  - name: home
    mountPath: /home/dbvis/data
The directory is indeed mounted, but with root:root ownership, and I can't add/remove/change anything inside the container at /home/dbvis/data. What I have tried so far:
- Changing the ownership at the end of the Dockerfile by running 'chown -R dbvis:dbvis /home/dbvis/' as the root user
- Using the following postStart hook in values.yaml:
lifecycleHooks:
  postStart:
    exec:
      command: ["chown", "-R", "dbvis:dbvis", "/home/dbvis/data"]
Neither approach worked. As the storage class I set up Rook with rook-ceph-block storage.
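For completeness, this is the kind of root init container I was thinking about trying next. It is only a sketch: it assumes the chart exposes something like singleuser.initContainers (I have not confirmed that option exists in chart version 0.8) and that the dbvis user in my image has uid/gid 1000.
singleuser:
  initContainers:
    - name: fix-home-ownership
      # busybox is just a placeholder image that provides chown
      image: busybox
      # run as root so the chown is not blocked the way it is for the
      # non-root notebook user in the postStart hook
      securityContext:
        runAsUser: 0
      command: ["sh", "-c", "chown -R 1000:1000 /home/dbvis/data"]
      volumeMounts:
        # reuses the "home" hostPath volume defined under extraVolumes above
        - name: home
          mountPath: /home/dbvis/data
Would something along these lines be the right direction?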
2. When I run 'kubectl describe node' I can see the GPU listed as a resource:
Allocatable:
  cpu:                16
  ephemeral-storage:  189274027310
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             98770548Ki
  pods:               110
  ...
  ...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource  Requests     Limits
  --------  --------     ------
  cpu       2250m (14%)  4100m (25%)
  memory    2238Mi (2%)  11146362880 (11%)
Events:     <none>
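For comparison, my understanding is that a node where the NVIDIA device plugin is registered advertises the GPU as an extended resource in that same list, roughly like this (illustrative only, not my actual output):
Allocatable:
  cpu:             16
  memory:          98770548Ki
  nvidia.com/gpu:  1
  ...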
Inside the Jupyter singleuser notebooks I can see the GPU when executing 'nvidia-smi'.
But when I ask TensorFlow, for example, to list the visible devices with the following code:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
I just get the CPU device:
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 232115754901553261]
Any idea what I am doing wrong?
3. Furthermore, I would like to limit the number of GPUs (this is just a test environment with a single GPU; we have more). I tried the following, which doesn't seem to have any effect (a sketch of what I am aiming for at the Kubernetes level follows this list):
- Adding the following config to values.yaml, in every combination I could think of:
extraConfig: |
  c.Spawner.notebook_dir = '/home/dbvis'
  c.Spawner.args = ['--device=/dev/nvidiactl', '--device=/dev/nvidia-uvm', '--device=/dev/nvidia-uvm-tools', '/dev/nvidia0']
- Adding the GPU to the resources in the singleuser configuration in values.yaml:
singleuser:
  image:
    name: benne4444/dbvis-singleuser
    tag: test3
  limit: 1
  guarantee: 1
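For context, this is roughly the pod-level resource request I am trying to end up with for each notebook container. It is only a sketch of my goal: nvidia.com/gpu is the extended resource name used by the NVIDIA device plugin, and I am assuming the chart has some way to forward such a limit into the singleuser pod spec (e.g. an extraResource-style option), which I have not been able to confirm for 0.8.
# Plain Kubernetes equivalent of what I want for the notebook pod
# (sketch only; the container block is normally generated by the chart):
containers:
  - name: notebook
    image: benne4444/dbvis-singleuser:test3
    resources:
      limits:
        # schedule the pod onto the GPU and expose exactly one device
        nvidia.com/gpu: 1
Is there a supported way to express this through the 0.8 values.yaml?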
Is what I am trying even possible right now?
Further information:
I set up a server running:
- Ubuntu 18.04.1 LTS
- nvidia-docker
- JupyterHub Helm chart version 0.8-ea0cf9a
I have attached the complete values.yaml.
If you need any additional information, please let me know. Any help is much appreciated.
Thank you,
Benedikt