To summarize my comments above, making a device (whether a GPU or, in the case that I am interested in, storage) available to an end user requires:
I may be missing a lot of context since I don't attend the resource management meetings, but so far this proposal does not clarify how each of these steps will happen.
From the storage perspective I want to make sure this aligns with our plans for local storage and CSI. What complicates that is that Storage has its own API on the k8s side for requesting storage resources, and on the plugin side for provisioning them and making them available (CSI).
CC @kubernetes/sig-storage-misc, @msau42
Hi @saad-ali!
Thanks for the feedback, it looks like my design doc is not clear enough :)
- kubelet discovery and advertising of the device and its capacity and other parameters.
- kubelet marking the device resource as no longer available
- kubelet making the device available inside pod containers and cleaning up when pod is terminated.
This design doc is supposed to answer all the points above and partially the points below :)
I'll be updating the design doc later this week to try and clarify the points that you mentioned.
Basically this proposal describes a plugin mechanism for advertising vendor specific resources (and their properties).
These are advertised through vendor specific "device-plugins" which implement a gRPC protocol allowing:
- `Discover` call
- `Allocate` and `Deallocate` calls
- `Monitor` call
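For concreteness, here is a rough Go sketch of the plugin-side interface these calls imply. The `Device` and `DeviceHealth` shapes mirror the protobuf messages quoted later in this thread; the package name and channel-based streaming are assumptions for illustration, not the actual generated gRPC code:

```golang
package deviceplugin

import "context"

// Device mirrors the protobuf Device message from the proposal.
type Device struct {
	Kind       string
	Vendor     string
	Name       string
	Health     string
	Properties map[string]string
}

// DeviceHealth mirrors the protobuf DeviceHealth message.
type DeviceHealth struct {
	Name   string
	Kind   string
	Vendor string
	Health string
}

// DevicePlugin is an illustrative view of the gRPC service a vendor
// plugin implements; the real interface is generated from the
// protobuf specification in the design doc.
type DevicePlugin interface {
	// Discover streams the devices the plugin manages.
	Discover(ctx context.Context) (<-chan Device, error)
	// Allocate prepares the given devices for a container and may
	// request changes to the container config.
	Allocate(ctx context.Context, devices []Device) error
	// Deallocate cleans up when a container using the devices terminates.
	Deallocate(ctx context.Context, devices []Device) error
	// Monitor streams a health update whenever a device becomes unhealthy.
	Monitor(ctx context.Context) (<-chan DeviceHealth, error)
}
```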
- kubelet marking the device resource as no longer available
Resource management will be handled by Kubelet at the kuberuntime level.
Furthermore, as a user you will be able to see the available devices in a new
field added to the NodeStatus struct.
- pod requesting min/max quantity of device resource
- scheduler awareness of devices resource
Most of the scheduling work is supposed to be solved by the ResourceClass proposal.
We were however thinking of using OIR's implementation to advertise the resources
through something like `extensions.kubernetes.io/my-device` in the pod spec,
and then have ResourceClass replace it.
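As a sketch of what that looks like from the pod author's side (the resource name is the hypothetical one above, and the k8s.io/api types are used purely for illustration), the device would be requested through the container's resource limits:

```golang
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Hypothetical extended-resource name from the discussion above.
	const myDevice v1.ResourceName = "extensions.kubernetes.io/my-device"

	container := v1.Container{
		Name:  "demo",
		Image: "vendor/device-demo",
		Resources: v1.ResourceRequirements{
			Limits: v1.ResourceList{
				// Request two of the vendor's devices, exactly like an OIR.
				myDevice: resource.MustParse("2"),
			},
		},
	}
	fmt.Println(container.Resources.Limits)
}
```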
Does that answer your questions? :D
@balajismaniam commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> + 8. [Device Plugin](#device-plugin) + * [Protocol Overview](#protocol-overview) + * [Protobuf specification](#protobuf-specification) + * [Installation](#installation) + * [API Changes](#api-changes) + * [Versioning](#versioning) + +_Authors:_ + +* @RenaudWasTaken - Renaud Gaubert <rgau...@nvidia.com> + +## Abstract + +This document describes a vendor independant solution to: + * Discovering and representing external devices + * Making available to the container and cleaning up these devices
Nit: rephrase as Making these devices available to the container and cleaning them up afterwards
In contributors/design-proposals/device-plugin.md:
> +## Abstract + +This document describes a vendor independant solution to: + * Discovering and representing external devices + * Making available to the container and cleaning up these devices + * Health Check of these devices + +Because devices are vendor dependant and have their own sets of problems +and mechanisms, the solution we describe is a plugin mechanism managed by +Kubelet. + +At their core, device plugins are simple gRPC servers implementing the +gRPC interface defined later in this design document. Once the device +plugin makes itself know to kubelet, it will interact with the device +plugin through this interface: + 1. A `Discover` function for the Kubeket to Discover the devices and
s/Kubeket/kubelet
In contributors/design-proposals/device-plugin.md:
> + +_GPU Integration Example:_ + * [Enable "kick the tires" support for Nvidia GPUs in COS](https://github.com/kubernetes/kubernetes/pull/45136) + * [Extend experimental support to multiple Nvidia GPUs](https://github.com/kubernetes/kubernetes/pull/42116) + +_Kubernetes Meeting Notes On This:_ + * [Meeting notes](https://docs.google.com/document/d/1Qg42Nmv-QwL4RxicsU2qtZgFKOzANf8fGayw8p3lX6U/edit#) + * [Better Abstraction for Compute Resources in Kubernetes](https://docs.google.com/document/d/1666PPUs4Lz56TqKygcy6mXkNazde-vwA7q4e5H92sUc) + * [Extensible support for hardware devices in Kubernetes (join kuberne...@googlegroups.com for access)](https://docs.google.com/document/d/1LHeTPx_fWA1PdZkHuALPzYxR0AYXUiiXdo3S0g2VSlo/edit) + +## Use Cases + + * I want to use a particular device type (GPU, InfiniBand, FPGA, etc.) + in my pod. + * I should be able to use that device without writing custom Kubernetes code. + * I want a consistent and portable solution to consuming hardware devices
Nit: s/consuming/consume
In contributors/design-proposals/device-plugin.md:
> + + * I want to use a particular device type (GPU, InfiniBand, FPGA, etc.) + in my pod. + * I should be able to use that device without writing custom Kubernetes code. + * I want a consistent and portable solution to consuming hardware devices + across k8s clusters + +## Objectives + +1. Add support for vendor specific Devices in kubelet: + * Through a pluggable mechanism + * Which allows discovery and monitoring of devices + * Which allows hooking the runtime to make devices available in containers + and cleaning them up. +2. Define a deployment mechanism for this new API +3. Define a versioning mechanism oor this new API
s/oor/for
In contributors/design-proposals/device-plugin.md:
> + * I want to use a particular device type (GPU, InfiniBand, FPGA, etc.) + in my pod. + * I should be able to use that device without writing custom Kubernetes code. + * I want a consistent and portable solution to consuming hardware devices + across k8s clusters + +## Objectives + +1. Add support for vendor specific Devices in kubelet: + * Through a pluggable mechanism + * Which allows discovery and monitoring of devices + * Which allows hooking the runtime to make devices available in containers + and cleaning them up. +2. Define a deployment mechanism for this new API +3. Define a versioning mechanism oor this new API +t
Delete this.
In contributors/design-proposals/device-plugin.md:
> + * I want a consistent and portable solution to consuming hardware devices + across k8s clusters + +## Objectives + +1. Add support for vendor specific Devices in kubelet: + * Through a pluggable mechanism + * Which allows discovery and monitoring of devices + * Which allows hooking the runtime to make devices available in containers + and cleaning them up. +2. Define a deployment mechanism for this new API +3. Define a versioning mechanism oor this new API +t + +## Non Objectives +1. Advanced scheduling and resource selection (solved through [#782](https://github.com/kubernetes/community/pull/782))
Nit: Missing period at the end of this sentence.
In contributors/design-proposals/device-plugin.md:
> + +## Objectives + +1. Add support for vendor specific Devices in kubelet: + * Through a pluggable mechanism + * Which allows discovery and monitoring of devices + * Which allows hooking the runtime to make devices available in containers + and cleaning them up. +2. Define a deployment mechanism for this new API +3. Define a versioning mechanism oor this new API +t + +## Non Objectives +1. Advanced scheduling and resource selection (solved through [#782](https://github.com/kubernetes/community/pull/782)) + We will only try to give basic selection primitives to the devices +2. Metrics this should be the job of cadvisor and should probably either be
s/Metrics this/Metrics: This
In contributors/design-proposals/device-plugin.md:
> + +Finally, to notify Kubelet of the existence of the device plugin, +the vendor's device plugin will have to make a request to Kubelet's +onwn gRPC server. +Only then will kubelet start interacting with the vendor's device plugin +through the gRPC apis. + +### End User story + +When setting up the cluster the admin knows what kind of devices are present +on the different machines and therefore can select what devices they want to +enable. + +The cluster admins knows his cluster has Nvidia GPUs therefore he deploys +the nvidia device plugin through: +`kubectl create -f nvidia.io/device-plugin.yml`
Is there a recommended way to deploy these DaemonSets (e.g., using node-feature-discovery)?
In contributors/design-proposals/device-plugin.md:
> + +### End User story + +When setting up the cluster the admin knows what kind of devices are present +on the different machines and therefore can select what devices they want to +enable. + +The cluster admins knows his cluster has Nvidia GPUs therefore he deploys +the nvidia device plugin through: +`kubectl create -f nvidia.io/device-plugin.yml` + +The device plugin lands on all the nodes of the cluster and if it detects that +there are no GPUs it terminates. However, when there are GPUs it reports them +to Kubelet. +For device plugins reporting non-GPU Devices these are advertised as +OIRs and selected through the same method.
Can non-gpu devices be advertised using the above DaemonSet mechanism? Why should OIRs be used? Is this done to maintain backward compatibility?
In contributors/design-proposals/device-plugin.md:
> + string Kind = 2; + string Vendor = 4; + string Health = 3; +} +``` + +## Installation + +The installation process should be straightforward to the user, transparent +and similar to other regular Kubernetes actions. +The device plugin should also run in containers so that kubernetes can +deploy them and restart the plugins when they fail. +However, we should not prevent the user from deploying a bare metal device +plugin. + +Deploying the device plugins though DemonSets makes sense as the cluster
s/though/through
In contributors/design-proposals/device-plugin.md:
> + Vendor string
+ Name string
+ Health DeviceHealthStatus
+ Properties map[string]string
+}
+```
+
+Because the current API (Capacity) can not be extended to support Device,
+we will need to create two new attributes in the NodeStatus structure:
+ * `DevCapacity`: Describing the device capacity of the node
+ * `DevAvailable`: Describing the available devices
+
+```golang
+type NodeStatus struct {
+ DevCapacity []Device
+ DevAvailable []Device
Having Available and Allocatable is confusing. Has a mechanism similar to OIR for advertising Capacity and Allocatable already been considered? Maybe by relaxing the OIR prefix/namespace requirements? (See: https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/helper/helpers.go#L34)
@RenaudWasTaken commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> + +Finally, to notify Kubelet of the existence of the device plugin, +the vendor's device plugin will have to make a request to Kubelet's +onwn gRPC server. +Only then will kubelet start interacting with the vendor's device plugin +through the gRPC apis. + +### End User story + +When setting up the cluster the admin knows what kind of devices are present +on the different machines and therefore can select what devices they want to +enable. + +The cluster admins knows his cluster has Nvidia GPUs therefore he deploys +the nvidia device plugin through: +`kubectl create -f nvidia.io/device-plugin.yml`
Can you expand on your question? What do you mean by a recommended way to deploy a DaemonSet?
The DaemonSet is the deployment mechanism :)
> + Vendor string
+ Name string
+ Health DeviceHealthStatus
+ Properties map[string]string
+}
+```
+
+Because the current API (Capacity) can not be extended to support Device,
+we will need to create two new attributes in the NodeStatus structure:
+ * `DevCapacity`: Describing the device capacity of the node
+ * `DevAvailable`: Describing the available devices
+
+```golang
+type NodeStatus struct {
+ DevCapacity []Device
+ DevAvailable []Device
As mentioned in the previous comment, Allocatable != Available: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md
DevAvailable should not be named DevAllocatable because it does not do the same thing.
> + +### End User story + +When setting up the cluster the admin knows what kind of devices are present +on the different machines and therefore can select what devices they want to +enable. + +The cluster admins knows his cluster has Nvidia GPUs therefore he deploys +the nvidia device plugin through: +`kubectl create -f nvidia.io/device-plugin.yml` + +The device plugin lands on all the nodes of the cluster and if it detects that +there are no GPUs it terminates. However, when there are GPUs it reports them +to Kubelet. +For device plugins reporting non-GPU Devices these are advertised as +OIRs and selected through the same method.
Can non-gpu devices be advertised using the above DaemonSet mechanism?
I don't understand your question about the DaemonSet. A DaemonSet is just a way to deploy a Device Plugin; the plugin then advertises the resources to Kubelet.
Currently the idea is to have GPUs discovered by the Nvidia device plugin advertised by the node as "alpha.kubernetes.io/nvidia-gpu" and all other devices as "pod.alpha.kubernetes.io/opaque-int-resource-my-device".
We are thinking of actually changing the name to "extensions.kubernetes.io/my-device" but this is really another discussion :)
Thanks @balajismaniam for noticing all these errors :)
@vikaschoudhary16 commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> + +Finally, to notify Kubelet of the existence of the device plugin, +the vendor's device plugin will have to make a request to Kubelet's +onwn gRPC server. +Only then will kubelet start interacting with the vendor's device plugin +through the gRPC apis. + +### End User story + +When setting up the cluster the admin knows what kind of devices are present +on the different machines and therefore can select what devices they want to +enable. + +The cluster admins knows his cluster has Nvidia GPUs therefore he deploys +the nvidia device plugin through: +`kubectl create -f nvidia.io/device-plugin.yml`
@balajismaniam
IIUC, the user is expected to deploy the DaemonSet manually, and once it is deployed the vendor device plugin will do the discovery. Deploying the DaemonSet is the 0th step. @RenaudWasTaken can correct me if I'm wrong.
@vikaschoudhary16 commented on this pull request.
> + +### End User story + +When setting up the cluster the admin knows what kind of devices are present +on the different machines and therefore can select what devices they want to +enable. + +The cluster admins knows his cluster has Nvidia GPUs therefore he deploys +the nvidia device plugin through: +`kubectl create -f nvidia.io/device-plugin.yml` + +The device plugin lands on all the nodes of the cluster and if it detects that +there are no GPUs it terminates. However, when there are GPUs it reports them +to Kubelet. +For device plugins reporting non-GPU Devices these are advertised as +OIRs and selected through the same method.
Why should OIRs be used?
OIRs are being used to get the scheduling done. In the future, when resource classes are available and are mature enough, OIRs will be deprecated.
@balajismaniam commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> + +### End User story + +When setting up the cluster the admin knows what kind of devices are present +on the different machines and therefore can select what devices they want to +enable. + +The cluster admins knows his cluster has Nvidia GPUs therefore he deploys +the nvidia device plugin through: +`kubectl create -f nvidia.io/device-plugin.yml` + +The device plugin lands on all the nodes of the cluster and if it detects that +there are no GPUs it terminates. However, when there are GPUs it reports them +to Kubelet. +For device plugins reporting non-GPU Devices these are advertised as +OIRs and selected through the same method.
Thanks for the clarification. Making this (i.e., OIRs will be deprecated) clear in the proposal might be good.
> + Vendor string
+ Name string
+ Health DeviceHealthStatus
+ Properties map[string]string
+}
+```
+
+Because the current API (Capacity) can not be extended to support Device,
+we will need to create two new attributes in the NodeStatus structure:
+ * `DevCapacity`: Describing the device capacity of the node
+ * `DevAvailable`: Describing the available devices
+
+```golang
+type NodeStatus struct {
+ DevCapacity []Device
+ DevAvailable []Device
I understand the difference between Allocatable and Available. I was alluding to confusion caused by introducing Available itself.
@ConnorDoyle commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> + Vendor string
+ Name string
+ Health DeviceHealthStatus
+ Properties map[string]string
+}
+```
+
+Because the current API (Capacity) can not be extended to support Device,
+we will need to create two new attributes in the NodeStatus structure:
+ * `DevCapacity`: Describing the device capacity of the node
+ * `DevAvailable`: Describing the available devices
+
+```golang
+type NodeStatus struct {
+ DevCapacity []Device
+ DevAvailable []Device
FWIW, I think finding a way to do without API changes to the node spec would help sharpen this proposal to focus on the device plugin system. Of course there needs to be a way to schedule against the device resources, but could something simpler work for the first alpha?
@derekwaynecarr commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> + +This document describes a vendor independant solution to: + * Discovering and representing external devices + * Making these devices available to the container and cleaning them up + afterwards + * Health Check of these devices + +Because devices are vendor dependant and have their own sets of problems +and mechanisms, the solution we describe is a plugin mechanism managed by +Kubelet. + +At their core, device plugins are simple gRPC servers that may run in a +container deployed through the pod mechanism. + +These servers implement the gRPC interface defined later in this design +document and once the device plugin makes itself know to kubelet, kubelet
nit: known to kubelet,
> + * [Protobuf specification](#protobuf-specification) + * [Installation](#installation) + * [API Changes](#api-changes) + * [Versioning](#versioning) + +_Authors:_ + +* @RenaudWasTaken - Renaud Gaubert <rgau...@NVIDIA.com> + +## Abstract + +This document describes a vendor independant solution to: + * Discovering and representing external devices + * Making these devices available to the container and cleaning them up + afterwards + * Health Check of these devices
Noting here, but I would also like to understand how I perform basic node life-cycle operations:
> + 3. A `Monitor` function to notify Kubelet whenever a device becomes + unhealthy. + +The goal is for a user to be able to enable vendor devices (e.g: GPUs) through +the simple following steps: + * `kubectl create -f http://vendor.com/device-plugin-daemonset.yaml` + * When launching `kubectl describe nodes`, the devices appear in the node spec + * In the long term users will be able to select them through Resource Class + +We expect the plugins to be deployed across the clusters through DaemonSets. +The targeted devices are GPUs, NICs, FPGAs, InfiniBand, Storage devices, .... + + +## Motivation + +Kubernetes currently supports discovery of CPU and Memory primarily to a
Drop "minimal extent".
> + across k8s clusters. + +## Objectives + +1. Add support for vendor specific Devices in kubelet: + * Through a pluggable mechanism. + * Which allows discovery and monitoring of devices. + * Which allows hooking the runtime to make devices available in containers + and cleaning them up. +2. Define a deployment mechanism for this new API. +3. Define a versioning mechanism for this new API. + +## Non Objectives +1. Advanced scheduling and resource selection (solved through [#782](https://github.com/Kubernetes/community/pull/782)). + We will only try to give basic selection primitives to the devices +2. Metrics: this should be the job of cadvisor and should probably either be
I am fine with this as a non-objective; I would avoid stating options for long-term homes.
> + rpc Discover(Empty) returns (stream Device) {}
+ rpc Monitor(Empty) returns (stream DeviceHealth) {}
+
+ rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
+ rpc Deallocate(DeallocateRequest) returns (Empty) {}
+}
+
+```
+
+The gRPC server that the device plugin must implement is expected to
+be advertised on a unix socket in a mounted hostPath (e.g:
+`/var/run/Kubernetes/vendor.sock`).
+
+Finally, to notify Kubelet of the existence of the device plugin,
+the vendor's device plugin will have to make a request to Kubelet's
+onwn gRPC server.
nit: own
> +the vendor's device plugin will have to make a request to Kubelet's +onwn gRPC server. +Only then will kubelet start interacting with the vendor's device plugin +through the gRPC apis. + +### End User story + +When setting up the cluster the admin knows what kind of devices are present +on the different machines and therefore can select what devices they want to +enable. + +The cluster admins knows his cluster has NVIDIA GPUs therefore he deploys +the NVIDIA device plugin through: +`kubectl create -f NVIDIA.io/device-plugin.yml` + +The device plugin lands on all the nodes of the cluster and if it detects that
Does it terminate, or does it just sit idle?
> +The device plugin lands on all the nodes of the cluster and if it detects that +there are no GPUs it terminates. However, when there are GPUs it reports them +to Kubelet. +For device plugins reporting non-GPU Devices these are advertised as +OIRs and selected through the same method. + +1. A user submits a pod spec requesting X GPUs (or devices) +2. The scheduler filters the nodes which do not match the resource requests +3. The pod lands on the node and Kubelet decides which device + should be assigned to the pod +4. Kubelet calls `Allocate` on the matching Device Plugins +5. The user deletes the pod or the pod terminates +6. Kubelet calls `Deallocate` on the matching Device Plugins + +When receiving a pod which requests Devices kubelet is in charge of: + * deciding which device to assign to the pod's containers (this will
Elaborate in a later section on how this will change in the future, or remove it.
> +For device plugins reporting non-GPU Devices these are advertised as +OIRs and selected through the same method. + +1. A user submits a pod spec requesting X GPUs (or devices) +2. The scheduler filters the nodes which do not match the resource requests +3. The pod lands on the node and Kubelet decides which device + should be assigned to the pod +4. Kubelet calls `Allocate` on the matching Device Plugins +5. The user deletes the pod or the pod terminates +6. Kubelet calls `Deallocate` on the matching Device Plugins + +When receiving a pod which requests Devices kubelet is in charge of: + * deciding which device to assign to the pod's containers (this will + change in the future) + * advertising the changes to the node's `Available` list + * advertising the changes to the pods's `Allocated` list
What is this list? Is this a field in the API somewhere, or just internal state in the kubelet?
> +to Kubelet. +For device plugins reporting non-GPU Devices these are advertised as +OIRs and selected through the same method. + +1. A user submits a pod spec requesting X GPUs (or devices) +2. The scheduler filters the nodes which do not match the resource requests +3. The pod lands on the node and Kubelet decides which device + should be assigned to the pod +4. Kubelet calls `Allocate` on the matching Device Plugins +5. The user deletes the pod or the pod terminates +6. Kubelet calls `Deallocate` on the matching Device Plugins + +When receiving a pod which requests Devices kubelet is in charge of: + * deciding which device to assign to the pod's containers (this will + change in the future) + * advertising the changes to the node's `Available` list
s/Available/Capacity
node reports: capacity (total), allocatable (total - system-reserved - kube-reserved)
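For concreteness (numbers invented), a quick sketch of that relationship:

```golang
package node

// allocatable illustrates the relationship above:
// allocatable = capacity - system-reserved - kube-reserved.
func allocatable(capacity, systemReserved, kubeReserved int64) int64 {
	return capacity - systemReserved - kubeReserved
}

// e.g. a node with 8 devices that reserves 1 for system daemons and
// none for kube daemons reports capacity=8, allocatable=7.
```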
> +1. A user submits a pod spec requesting X GPUs (or devices) +2. The scheduler filters the nodes which do not match the resource requests +3. The pod lands on the node and Kubelet decides which device + should be assigned to the pod +4. Kubelet calls `Allocate` on the matching Device Plugins +5. The user deletes the pod or the pod terminates +6. Kubelet calls `Deallocate` on the matching Device Plugins + +When receiving a pod which requests Devices kubelet is in charge of: + * deciding which device to assign to the pod's containers (this will + change in the future) + * advertising the changes to the node's `Available` list + * advertising the changes to the pods's `Allocated` list + * Calling the `Allocate` function with the list of devices + +The scheduler is still be in charge of filtering the nodes which cannot
nit: the scheduler is in charge of filtering...
> +3. The pod lands on the node and Kubelet decides which device + should be assigned to the pod +4. Kubelet calls `Allocate` on the matching Device Plugins +5. The user deletes the pod or the pod terminates +6. Kubelet calls `Deallocate` on the matching Device Plugins + +When receiving a pod which requests Devices kubelet is in charge of: + * deciding which device to assign to the pod's containers (this will + change in the future) + * advertising the changes to the node's `Available` list + * advertising the changes to the pods's `Allocated` list + * Calling the `Allocate` function with the list of devices + +The scheduler is still be in charge of filtering the nodes which cannot +satisfy the resource requests. +He might in the future be in charge of selecting the device.
Remove this for now. Let's scope the proposal to what we are doing now, not what we might do later.
> +## Device Plugin + +### Introduction +The device plugin is structured in 5 parts: +1. Registration: The device plugin advertises it's presence to Kubelet +2. Discovery: Kubelet calls the device plugin to list it's devices +3. Allocate / Deallocate: When creating/deleting containers requesting the + devices advertised by the device plugin, Kubelet calls the device plugin's + `Allocate` and `Deallocate` functions. +4. Cleanup: Kubelet terminates the communication through a "Stop" +4. Heartbeat: The device plugin polls Kubelet to know if it's still alive + and if it has to re-issue a Register request + +### Registration + +When starting the device plugin is expected to make a (client) gRPC call
To be clear, this is done via a .sock exposed by the kubelet, which is given to the plugin via a hostPath of some kind?
@derekwaynecarr commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> +## Device Plugin + +### Introduction +The device plugin is structured in 5 parts: +1. Registration: The device plugin advertises it's presence to Kubelet +2. Discovery: Kubelet calls the device plugin to list it's devices +3. Allocate / Deallocate: When creating/deleting containers requesting the + devices advertised by the device plugin, Kubelet calls the device plugin's + `Allocate` and `Deallocate` functions. +4. Cleanup: Kubelet terminates the communication through a "Stop" +4. Heartbeat: The device plugin polls Kubelet to know if it's still alive + and if it has to re-issue a Register request + +### Registration + +When starting the device plugin is expected to make a (client) gRPC call
Ignore this, I should have read more.
> +4. Cleanup: Kubelet terminates the communication through a "Stop" +4. Heartbeat: The device plugin polls Kubelet to know if it's still alive + and if it has to re-issue a Register request + +### Registration + +When starting the device plugin is expected to make a (client) gRPC call +to the `Register` function that Kubelet exposes. + +The communication between Kubelet is expected to happen only through Unix +sockets and follow this simple pattern: +1. The device plugins starts it's gRPC server +2. The device plugins sends a `RegisterRequest` to Kubelet (through a + gRPC request) +4. Kubelet starts it's Discovery phase and calls `Discover` and `Monitor` +5. Kubelet answers to the `RegisterRequest` with a `RegisterResponse`
Why require Discover/Monitor with Register?
> +Kubelet answers with the minimum version it supports and whether or +not there was an error. The errors may include (but not limited to): + * API version not supported + * A device plugin was already registered for this vendor + * A device plugin already registered this device + * Vendor is not consistent across discovered devices + +Kubelet will then interact with the plugin through the following functions: + * `Discover`: List Devices + * `Monitor`: Returns a stream that is written to when a + Device becomes unhealty + * `Allocate`: Called when creating a container with a list of devices + can request changes to the Container config + * `Deallocate`: Called when deleting a container can be used for cleanup + +The device plugin is also expected to periodically call the `Heartbeat` function
The kubelet must require a minimum heartbeat interval to be satisfied by all device plugins.
Capturing this here, but ignore it if it's specified elsewhere.
@derekwaynecarr commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> +Kubelet answers with the minimum version it supports and whether or +not there was an error. The errors may include (but not limited to): + * API version not supported + * A device plugin was already registered for this vendor + * A device plugin already registered this device + * Vendor is not consistent across discovered devices + +Kubelet will then interact with the plugin through the following functions: + * `Discover`: List Devices + * `Monitor`: Returns a stream that is written to when a + Device becomes unhealty + * `Allocate`: Called when creating a container with a list of devices + can request changes to the Container config + * `Deallocate`: Called when deleting a container can be used for cleanup + +The device plugin is also expected to periodically call the `Heartbeat` function
I imagine we will not record individual device provider heartbeats on node status, but will there be an endpoint I can invoke to poll the kubelet and ask when devices last sent their heartbeat? I would want a mechanism to audit and collect when heartbeats stopped.
> + string Kind = 1;
+ string Name = 2;
+ string Health = 3;
+ string Vendor = 4;
+ map<string, string> properties = 5; // Could be [1, 1.2, 1G]
+}
+
+message DeviceHealth {
+ string Name = 1;
+ string Kind = 2;
+ string Vendor = 4;
+ string Health = 3;
+}
+```
+
+## Installation
and upgrade
> + string Kind = 2; + string Vendor = 4; + string Health = 3; +} +``` + +## Installation + +The installation process should be straightforward to the user, transparent +and similar to other regular Kubernetes actions. +The device plugin should also run in containers so that Kubernetes can +deploy them and restart the plugins when they fail. +However, we should not prevent the user from deploying a bare metal device +plugin. + +Deploying the device plugins through DemonSets makes sense as the cluster
I do think this introduces complexity around node drains. It's worth discussing the upgrade process.
@RenaudWasTaken commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> +3. The pod lands on the node and Kubelet decides which device + should be assigned to the pod +4. Kubelet calls `Allocate` on the matching Device Plugins +5. The user deletes the pod or the pod terminates +6. Kubelet calls `Deallocate` on the matching Device Plugins + +When receiving a pod which requests Devices kubelet is in charge of: + * deciding which device to assign to the pod's containers (this will + change in the future) + * advertising the changes to the node's `Available` list + * advertising the changes to the pods's `Allocated` list + * Calling the `Allocate` function with the list of devices + +The scheduler is still be in charge of filtering the nodes which cannot +satisfy the resource requests. +He might in the future be in charge of selecting the device.
Linking to Resource Class in the next update
> +The device plugin lands on all the nodes of the cluster and if it detects that +there are no GPUs it terminates. However, when there are GPUs it reports them +to Kubelet. +For device plugins reporting non-GPU Devices these are advertised as +OIRs and selected through the same method. + +1. A user submits a pod spec requesting X GPUs (or devices) +2. The scheduler filters the nodes which do not match the resource requests +3. The pod lands on the node and Kubelet decides which device + should be assigned to the pod +4. Kubelet calls `Allocate` on the matching Device Plugins +5. The user deletes the pod or the pod terminates +6. Kubelet calls `Deallocate` on the matching Device Plugins + +When receiving a pod which requests Devices kubelet is in charge of: + * deciding which device to assign to the pod's containers (this will
> +to Kubelet. +For device plugins reporting non-GPU Devices these are advertised as +OIRs and selected through the same method. + +1. A user submits a pod spec requesting X GPUs (or devices) +2. The scheduler filters the nodes which do not match the resource requests +3. The pod lands on the node and Kubelet decides which device + should be assigned to the pod +4. Kubelet calls `Allocate` on the matching Device Plugins +5. The user deletes the pod or the pod terminates +6. Kubelet calls `Deallocate` on the matching Device Plugins + +When receiving a pod which requests Devices kubelet is in charge of: + * deciding which device to assign to the pod's containers (this will + change in the future) + * advertising the changes to the node's `Available` list
I agree here, the implementation already matches only this model.
Will update shortly
> +For device plugins reporting non-GPU Devices these are advertised as +OIRs and selected through the same method. + +1. A user submits a pod spec requesting X GPUs (or devices) +2. The scheduler filters the nodes which do not match the resource requests +3. The pod lands on the node and Kubelet decides which device + should be assigned to the pod +4. Kubelet calls `Allocate` on the matching Device Plugins +5. The user deletes the pod or the pod terminates +6. Kubelet calls `Deallocate` on the matching Device Plugins + +When receiving a pod which requests Devices kubelet is in charge of: + * deciding which device to assign to the pod's containers (this will + change in the future) + * advertising the changes to the node's `Available` list + * advertising the changes to the pods's `Allocated` list
There would only be a DevCapacity field in the NodeStatus API
@derekwaynecarr commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> +## API Changes
+### Device
+
+When discovering the devices, Kubelet will be in charge of advertising those
+resources to the API server.
+
+We will advertise each device returned by the Device Plugin in a new structure
+called `Device`.
+It is defined as follows:
+
+```golang
+type Device struct {
+ Kind string
+ Vendor string
+ Name string
+ Health DeviceHealthStatus
Where is this defined? What happens if this is flapping? Do you think this has lastHeartbeatTimes? What I worry about is additional traffic from nodes to masters. Right now in large clusters, node-to-master communication is the dominant traffic in the cluster.
> + hostPath:
+ path: /var/run/kubernetes
+```
+
+## API Changes
+### Device
+
+When discovering the devices, Kubelet will be in charge of advertising those
+resources to the API server.
+
+We will advertise each device returned by the Device Plugin in a new structure
+called `Device`.
+It is defined as follows:
+
+```golang
+type Device struct {
This is a bit like NodeCondition, so a nit would be to call this NodeDevice.
@RenaudWasTaken commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> +Kubelet answers with the minimum version it supports and whether or +not there was an error. The errors may include (but not limited to): + * API version not supported + * A device plugin was already registered for this vendor + * A device plugin already registered this device + * Vendor is not consistent across discovered devices + +Kubelet will then interact with the plugin through the following functions: + * `Discover`: List Devices + * `Monitor`: Returns a stream that is written to when a + Device becomes unhealty + * `Allocate`: Called when creating a container with a list of devices + can request changes to the Container config + * `Deallocate`: Called when deleting a container can be used for cleanup + +The device plugin is also expected to periodically call the `Heartbeat` function
The heartbeat is more of a mechanism for the Device Plugins to make sure that Kubelet is alive rather than a mechanism for Kubelet to make sure that the Device Plugins are alive.
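A rough Go sketch of that direction (only the Heartbeat/Register calls come from the proposal; the client interface and the re-register signal are assumed for illustration):

```golang
package deviceplugin

import (
	"context"
	"time"
)

// KubeletClient is a hypothetical stand-in for the client generated
// from Kubelet's gRPC service.
type KubeletClient interface {
	// Heartbeat reports whether the plugin must re-issue a Register
	// request (return value assumed).
	Heartbeat(ctx context.Context) (reRegister bool, err error)
	Register(ctx context.Context) error
}

// heartbeatLoop: the plugin polls Kubelet to check that it is alive,
// and re-registers if Kubelet restarted or asks for it.
func heartbeatLoop(ctx context.Context, c KubeletClient, every time.Duration) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-time.After(every):
		}
		if reRegister, err := c.Heartbeat(ctx); err != nil || reRegister {
			_ = c.Register(ctx) // re-issue the RegisterRequest
		}
	}
}
```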
@derekwaynecarr commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> +### Device
+
+When discovering the devices, Kubelet will be in charge of advertising those
+resources to the API server.
+
+We will advertise each device returned by the Device Plugin in a new structure
+called `Device`.
+It is defined as follows:
+
+```golang
+type Device struct {
+ Kind string
+ Vendor string
+ Name string
+ Health DeviceHealthStatus
+ Properties map[string]string
How is this used? Can you give some examples of the data you would expect here?
> +It is defined as follows:
+
+```golang
+type Device struct {
+ Kind string
+ Vendor string
+ Name string
+ Health DeviceHealthStatus
+ Properties map[string]string
+}
+```
+
+Because the current API (Capacity) can not be extended to support Device,
+we will need to create two new attributes in the NodeStatus structure:
+ * `DevCapacity`: Describing the device capacity of the node
+ * `DevAvailable`: Describing the available devices
Stick with the capacity and allocatable terminology.
@RenaudWasTaken commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> +### Device
+
+When discovering the devices, Kubelet will be in charge of advertising those
+resources to the API server.
+
+We will advertise each device returned by the Device Plugin in a new structure
+called `Device`.
+It is defined as follows:
+
+```golang
+type Device struct {
+ Kind string
+ Vendor string
+ Name string
+ Health DeviceHealthStatus
+ Properties map[string]string
The structure mirrors the Protobuf so the same example could be used :)
struct Device {
    Kind: "NVIDIA-gpu"
    Name: "GPU-fef8089b-4820-abfc-e83e-94318197576e"
    Properties: {
        "Family": "Pascal",
        "Memory": "4G",
        "ECC": "True",
    }
}
> +## API Changes
+### Device
+
+When discovering the devices, Kubelet will be in charge of advertising those
+resources to the API server.
+
+We will advertise each device returned by the Device Plugin in a new structure
+called `Device`.
+It is defined as follows:
+
+```golang
+type Device struct {
+ Kind string
+ Vendor string
+ Name string
+ Health DeviceHealthStatus
Flapping wouldn't impact the current traffic, because that update would be part of the node_status update.
@derekwaynecarr commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> + Vendor string
+ Name string
+ Health DeviceHealthStatus
+ Properties map[string]string
+}
+```
+
+Because the current API (Capacity) can not be extended to support Device,
+we will need to create two new attributes in the NodeStatus structure:
+ * `DevCapacity`: Describing the device capacity of the node
+ * `DevAvailable`: Describing the available devices
+
+```golang
+type NodeStatus struct {
+ DevCapacity []Device
+ DevAvailable []Device
I agree with Connor.
Available is also confusing; I assume you mean allocatable. Allocatable is typically fixed based on a reservation of some kind (system-reserved, kube-reserved). This design does not discuss how a device reservation is made (or why). I am not sure if you are treating available as something different.
> + * `DevAvailable`: Describing the available devices
+
+```golang
+type NodeStatus struct {
+ DevCapacity []Device
+ DevAvailable []Device
+}
+```
+
+We also introduce the `Allocated` field in the pod's status so that user
+can know what devices were assigned to the pod. It could also be useful in
+the case of monitoring
+
+```golang
+type ContainerStatus struct {
+ Devices []Device
I don't think this would need to include health status.
> +### Device
+
+When discovering the devices, Kubelet will be in charge of advertising those
+resources to the API server.
+
+We will advertise each device returned by the Device Plugin in a new structure
+called `Device`.
+It is defined as follows:
+
+```golang
+type Device struct {
+ Kind string
+ Vendor string
+ Name string
+ Health DeviceHealthStatus
+ Properties map[string]string
nm, see snippet earlier.
@RenaudWasTaken commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> +4. Cleanup: Kubelet terminates the communication through a "Stop" +4. Heartbeat: The device plugin polls Kubelet to know if it's still alive + and if it has to re-issue a Register request + +### Registration + +When starting the device plugin is expected to make a (client) gRPC call +to the `Register` function that Kubelet exposes. + +The communication between Kubelet is expected to happen only through Unix +sockets and follow this simple pattern: +1. The device plugins starts it's gRPC server +2. The device plugins sends a `RegisterRequest` to Kubelet (through a + gRPC request) +4. Kubelet starts it's Discovery phase and calls `Discover` and `Monitor` +5. Kubelet answers to the `RegisterRequest` with a `RegisterResponse`
I'm not sure I understand your comment here :)
Discover is called right after registration with the Kubelet to advertise these devices.
Monitor is also called right after, so that Kubelet can start being notified of any health changes.
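A rough Go sketch of that ordering (the `pluginClient` stub and the callbacks are assumed for illustration; only the Register, then Discover, then Monitor sequence comes from the proposal):

```golang
package kubelet

import "context"

type Device struct{ Kind, Vendor, Name, Health string }
type DeviceHealth struct{ Name, Kind, Vendor, Health string }

// pluginClient is a hypothetical stand-in for the gRPC stub Kubelet
// uses to talk back to the device plugin.
type pluginClient interface {
	Discover(ctx context.Context) (<-chan Device, error)
	Monitor(ctx context.Context) (<-chan DeviceHealth, error)
}

// onRegister sketches what happens right after a plugin registers:
// Discover advertises the devices, then Monitor starts streaming
// health updates.
func onRegister(ctx context.Context, p pluginClient, advertise func(Device), health func(DeviceHealth)) error {
	devices, err := p.Discover(ctx)
	if err != nil {
		return err
	}
	for d := range devices {
		advertise(d) // added to the node's advertised devices
	}
	updates, err := p.Monitor(ctx)
	if err != nil {
		return err
	}
	go func() {
		for h := range updates {
			health(h) // mark the device unhealthy / healthy again
		}
	}()
	return nil
}
```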
@RenaudWasTaken pushed 1 commit.
@RenaudWasTaken pushed 1 commit.
@Hui-Zhi commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> +### Device
+
+When discovering the devices, Kubelet will be in charge of advertising those
+resources to the API server.
+
+We will advertise each device returned by the Device Plugin in a new structure
+called `Device`.
+It is defined as follows:
+
+```golang
+type Device struct {
+ Kind string
+ Vendor string
+ Name string
+ Health DeviceHealthStatus
+ Properties map[string]string
All the devices have something in common, like Family. Should we make it like:
struct Device {
    Kind: "NVIDIA-gpu"
    Name: "GPU-fef8089b-4820-abfc-e83e-94318197576e"
    Family: "Pascal"
    ...
    Properties: {
        "Memory": "4G",
        "ECC": "True",
    }
}
@RenaudWasTaken commented on this pull request.
In contributors/design-proposals/device-plugin.md:
> +### Device
+
+When discovering the devices, Kubelet will be in charge of advertising those
+resources to the API server.
+
+We will advertise each device returned by the Device Plugin in a new structure
+called `Device`.
+It is defined as follows:
+
+```golang
+type Device struct {
+ Kind string
+ Vendor string
+ Name string
+ Health DeviceHealthStatus
+ Properties map[string]string
All the GPUs have the Family attribute in common, but that wouldn't be the case for Solarflare NICs or any other devices.