Re: [kubernetes/community] Add Device plugin proposal (#695)


Saad Ali

Jul 7, 2017, 6:39:28 PM
to kubernetes/community, k8s-mirror-storage-misc, Team mention

To summarize my comments above: making a device (whether a GPU or, the case that I am interested in, storage) available to an end user requires:

  1. kubelet discovery and advertising of the device and its capacity and other parameters.
  2. pod requesting min/max quantity of device resource
  3. scheduler awareness of device resources
  4. kubelet marking the device resource as no longer available
  5. kubelet making the device available inside pod containers and cleaning up when pod is terminated.

I may be missing a lot of context since I don't attend the resource management meetings, but so far this proposal does not clarify how each of these steps will happen.

From the storage perspective, I want to make sure this aligns with our plans for local storage and CSI. What complicates that is that storage has its own API on the k8s side for requesting storage resources, and on the plugin side for provisioning them and making them available (CSI).

CC @kubernetes/sig-storage-misc, @msau42



Renaud Gaubert

Jul 10, 2017, 4:30:22 PM
to kubernetes/community, k8s-mirror-storage-misc, Team mention

Hi @saad-ali!
Thanks for the feedback. It looks like my design doc is not clear enough :)

    > 1. kubelet discovery and advertising of the device and its capacity and other parameters.
    > 4. kubelet marking the device resource as no longer available
    > 5. kubelet making the device available inside pod containers and cleaning up when pod is terminated.

      This design doc is supposed to answer all of the points above and, partially, the points below :)
      I'll be updating the design doc later this week to try to clarify the points that you mentioned.

      Basically, this proposal describes a plugin mechanism for advertising vendor-specific resources (and their properties).
      These are advertised through vendor-specific "device plugins", which implement a gRPC protocol allowing:

      • Kubelet to discover the devices through the Discover call
      • Kubelet to make the devices available inside pod containers, and to clean up when the pod is terminated, through the Allocate and Deallocate calls
      • Kubelet to be notified of the health status of the devices through the Monitor call

      > 4. kubelet marking the device resource as no longer available

      Resource management will be handled by Kubelet at the kuberuntime level.
      Furthermore, as a user you will be able to see the available devices in a new field
      added to the NodeStatus struct.
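
To make the four calls mentioned above (Discover, Allocate, Deallocate, Monitor) a bit more concrete, here is a minimal Go sketch of the plugin side as a plain interface with an in-memory fake. It is an illustration only: the actual protocol is gRPC and is specified in the design doc's protobuf section. The names `DevicePlugin` and `fakePlugin`, the `Device` fields, and the string health values are assumptions made for this example, and `Monitor` is reduced to a one-shot query instead of a stream.

```go
package main

import (
	"errors"
	"sort"
)

// Device is a hypothetical, simplified view of what a plugin advertises.
type Device struct {
	Name   string
	Kind   string
	Vendor string
	Health string // "Healthy" or "Unhealthy" in this sketch
}

// DevicePlugin mirrors the four calls as plain Go methods.
// The proposal exposes them over gRPC; this interface is only a sketch.
type DevicePlugin interface {
	Discover() []Device
	Allocate(names []string) error
	Deallocate(names []string) error
	Monitor() map[string]string // device name -> health
}

// fakePlugin is an in-memory stand-in for a vendor plugin.
type fakePlugin struct {
	devices map[string]Device
	inUse   map[string]bool
}

// Compile-time check that fakePlugin satisfies the interface.
var _ DevicePlugin = (*fakePlugin)(nil)

func newFakePlugin(devs ...Device) *fakePlugin {
	p := &fakePlugin{devices: map[string]Device{}, inUse: map[string]bool{}}
	for _, d := range devs {
		p.devices[d.Name] = d
	}
	return p
}

// Discover returns all known devices, sorted by name for stable output.
func (p *fakePlugin) Discover() []Device {
	out := make([]Device, 0, len(p.devices))
	for _, d := range p.devices {
		out = append(out, d)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

// Allocate marks devices as in use. It changes nothing and returns an error
// if any requested device is unknown, unhealthy, or already allocated.
func (p *fakePlugin) Allocate(names []string) error {
	for _, n := range names {
		d, ok := p.devices[n]
		if !ok || d.Health != "Healthy" || p.inUse[n] {
			return errors.New("cannot allocate " + n)
		}
	}
	for _, n := range names {
		p.inUse[n] = true
	}
	return nil
}

// Deallocate releases devices; releasing an unallocated device is a no-op.
func (p *fakePlugin) Deallocate(names []string) error {
	for _, n := range names {
		delete(p.inUse, n)
	}
	return nil
}

// Monitor reports the current health of every device.
func (p *fakePlugin) Monitor() map[string]string {
	health := make(map[string]string, len(p.devices))
	for n, d := range p.devices {
		health[n] = d.Health
	}
	return health
}
```

In this sketch a device that is already allocated or unhealthy is simply rejected by `Allocate`; how kubelet reacts to such a failure is not covered here.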

      > 2. pod requesting min/max quantity of device resource
      > 3. scheduler awareness of device resources

      Most of the scheduling work is supposed to be solved by the ResourceClass proposal.
      We were thinking, however, of using the OIR implementation to advertise the resources
      through something like extensions.kubernetes.io/my-device in the pod spec,
      and then having ResourceClass replace it.

      Does that answer your questions :D ?

      Renaud Gaubert

      Jul 11, 2017, 6:03:47 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention
      • Changed the abstract
      • Reformulated the objectives
      • Added vendor stories
      • Changed the architecture a bit to have a cleaner separation between:
        • Objectives
        • Users and vendor stories (and what they have to do)
        • Device plugin manager design details

      Balaji Subramaniam

      Jul 12, 2017, 2:43:07 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @balajismaniam commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +  8. [Device Plugin](#device-plugin)
      +      * [Protocol Overview](#protocol-overview)
      +      * [Protobuf specification](#protobuf-specification)
      +      * [Installation](#installation)
      +      * [API Changes](#api-changes)
      +      * [Versioning](#versioning)
      +
      +_Authors:_
      +
      +* @RenaudWasTaken - Renaud Gaubert <rgau...@nvidia.com>
      +
      +## Abstract
      +
      +This document describes a vendor independant solution to:
      +  * Discovering and representing external devices
      +  * Making available to the container and cleaning up these devices
      

      Nit: rephrase as "Making these devices available to the container and cleaning them up afterwards".


      In contributors/design-proposals/device-plugin.md:

      > +## Abstract
      +
      +This document describes a vendor independant solution to:
      +  * Discovering and representing external devices
      +  * Making available to the container and cleaning up these devices
      +  * Health Check of these devices
      +
      +Because devices are vendor dependant and have their own sets of problems
      +and mechanisms, the solution we describe is a plugin mechanism managed by
      +Kubelet.
      +
      +At their core, device plugins are simple gRPC servers implementing the
      +gRPC interface defined later in this design document. Once the device
      +plugin makes itself know to kubelet, it will interact with the device
      +plugin through this interface:
      +  1. A `Discover` function for the Kubeket to Discover the devices and
      

      s/Kubeket/kubelet


      In contributors/design-proposals/device-plugin.md:

      > +
      +_GPU Integration Example:_
      +  * [Enable "kick the tires" support for Nvidia GPUs in COS](https://github.com/kubernetes/kubernetes/pull/45136)
      +  * [Extend experimental support to multiple Nvidia GPUs](https://github.com/kubernetes/kubernetes/pull/42116)
      +
      +_Kubernetes Meeting Notes On This:_
      +  * [Meeting notes](https://docs.google.com/document/d/1Qg42Nmv-QwL4RxicsU2qtZgFKOzANf8fGayw8p3lX6U/edit#)
      +  * [Better Abstraction for Compute Resources in Kubernetes](https://docs.google.com/document/d/1666PPUs4Lz56TqKygcy6mXkNazde-vwA7q4e5H92sUc)
      +  * [Extensible support for hardware devices in Kubernetes (join kuberne...@googlegroups.com for access)](https://docs.google.com/document/d/1LHeTPx_fWA1PdZkHuALPzYxR0AYXUiiXdo3S0g2VSlo/edit)
      +
      +## Use Cases
      +
      +  * I want to use a particular device type (GPU, InfiniBand, FPGA, etc.)
      +    in my pod.
      +  * I should be able to use that device without writing custom Kubernetes code.
      +  * I want a consistent and portable solution to consuming hardware devices
      

      Nit: s/consuming/consume


      In contributors/design-proposals/device-plugin.md:

      > +
      +  * I want to use a particular device type (GPU, InfiniBand, FPGA, etc.)
      +    in my pod.
      +  * I should be able to use that device without writing custom Kubernetes code.
      +  * I want a consistent and portable solution to consuming hardware devices
      +    across k8s clusters
      +
      +## Objectives
      +
      +1. Add support for vendor specific Devices in kubelet:
      +    * Through a pluggable mechanism
      +    * Which allows discovery and monitoring of devices
      +    * Which allows hooking the runtime to make devices available in containers
      +      and cleaning them up.
      +2. Define a deployment mechanism for this new API
      +3. Define a versioning mechanism oor this new API
      

      s/oor/for


      In contributors/design-proposals/device-plugin.md:

      > +  * I want to use a particular device type (GPU, InfiniBand, FPGA, etc.)
      +    in my pod.
      +  * I should be able to use that device without writing custom Kubernetes code.
      +  * I want a consistent and portable solution to consuming hardware devices
      +    across k8s clusters
      +
      +## Objectives
      +
      +1. Add support for vendor specific Devices in kubelet:
      +    * Through a pluggable mechanism
      +    * Which allows discovery and monitoring of devices
      +    * Which allows hooking the runtime to make devices available in containers
      +      and cleaning them up.
      +2. Define a deployment mechanism for this new API
      +3. Define a versioning mechanism oor this new API
      +t
      

      Delete this.


      In contributors/design-proposals/device-plugin.md:

      > +  * I want a consistent and portable solution to consuming hardware devices
      +    across k8s clusters
      +
      +## Objectives
      +
      +1. Add support for vendor specific Devices in kubelet:
      +    * Through a pluggable mechanism
      +    * Which allows discovery and monitoring of devices
      +    * Which allows hooking the runtime to make devices available in containers
      +      and cleaning them up.
      +2. Define a deployment mechanism for this new API
      +3. Define a versioning mechanism oor this new API
      +t
      +
      +## Non Objectives
      +1. Advanced scheduling and resource selection (solved through [#782](https://github.com/kubernetes/community/pull/782))
      

      Nit: Missing period at the end of this sentence.


      In contributors/design-proposals/device-plugin.md:

      > +
      +## Objectives
      +
      +1. Add support for vendor specific Devices in kubelet:
      +    * Through a pluggable mechanism
      +    * Which allows discovery and monitoring of devices
      +    * Which allows hooking the runtime to make devices available in containers
      +      and cleaning them up.
      +2. Define a deployment mechanism for this new API
      +3. Define a versioning mechanism oor this new API
      +t
      +
      +## Non Objectives
      +1. Advanced scheduling and resource selection (solved through [#782](https://github.com/kubernetes/community/pull/782))
      +   We will only try to give basic selection primitives to the devices
      +2. Metrics this should be the job of cadvisor and should probably either be
      

      s/Metrics this/Metrics: This


      In contributors/design-proposals/device-plugin.md:

      > +
      +Finally, to notify Kubelet of the existence of the device plugin,
      +the vendor's device plugin will have to make a request to Kubelet's
      +onwn gRPC server.
      +Only then will kubelet start interacting with the vendor's device plugin
      +through the gRPC apis.
      +
      +### End User story
      +
      +When setting up the cluster the admin knows what kind of devices are present
      +on the different machines and therefore can select what devices they want to
      +enable.
      +
      +The cluster admins knows his cluster has Nvidia GPUs therefore he deploys
      +the nvidia device plugin through:
      +`kubectl create -f nvidia.io/device-plugin.yml`
      

      Is there a recommended way to deploy these DaemonSets (e.g., using node-feature-discovery)?


      In contributors/design-proposals/device-plugin.md:

      > +
      +### End User story
      +
      +When setting up the cluster the admin knows what kind of devices are present
      +on the different machines and therefore can select what devices they want to
      +enable.
      +
      +The cluster admins knows his cluster has Nvidia GPUs therefore he deploys
      +the nvidia device plugin through:
      +`kubectl create -f nvidia.io/device-plugin.yml`
      +
      +The device plugin lands on all the nodes of the cluster and if it detects that
      +there are no GPUs it terminates. However, when there are GPUs it reports them
      +to Kubelet.
      +For device plugins reporting non-GPU Devices these are advertised as
      +OIRs and selected through the same method.
      

      Can non-gpu devices be advertised using the above DaemonSet mechanism? Why should OIRs be used? Is this done to maintain backward compatibility?


      In contributors/design-proposals/device-plugin.md:

      > +	string Kind = 2;
      +	string Vendor = 4;
      +	string Health = 3;
      +}
      +```
      +
      +## Installation
      +
      +The installation process should be straightforward to the user, transparent
      +and similar to other regular Kubernetes actions.
      +The device plugin should also run in containers so that kubernetes can
      +deploy them and restart the plugins when they fail.
      +However, we should not prevent the user from deploying a bare metal device
      +plugin.
      +
      +Deploying the device plugins though DemonSets makes sense as the cluster
      

      s/though/through


      In contributors/design-proposals/device-plugin.md:

      > +	Vendor     string
      +	Name       string
      +	Health     DeviceHealthStatus
      +	Properties map[string]string
      +}
      +```
      +
      +Because the current API (Capacity) can not be extended to support Device,
      +we will need to create two new attributes in the NodeStatus structure:
      +  * `DevCapacity`: Describing the device capacity of the node
      +  * `DevAvailable`: Describing the available devices
      +
      +```golang
      +type NodeStatus struct {
      +	DevCapacity []Device
      +	DevAvailable []Device
      

      Having both Available and Allocatable is confusing. Has a mechanism similar to OIR for advertising Capacity and Allocatable already been considered? Maybe by relaxing the OIR prefix/namespace requirements? (See: https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/helper/helpers.go#L34)

      Renaud Gaubert

      Jul 12, 2017, 4:56:44 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @RenaudWasTaken commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +
      +Finally, to notify Kubelet of the existence of the device plugin,
      +the vendor's device plugin will have to make a request to Kubelet's
      +onwn gRPC server.
      +Only then will kubelet start interacting with the vendor's device plugin
      +through the gRPC apis.
      +
      +### End User story
      +
      +When setting up the cluster the admin knows what kind of devices are present
      +on the different machines and therefore can select what devices they want to
      +enable.
      +
      +The cluster admins knows his cluster has Nvidia GPUs therefore he deploys
      +the nvidia device plugin through:
      +`kubectl create -f nvidia.io/device-plugin.yml`
      

      Can you elaborate on your question? What do you mean by a recommended way to deploy a DaemonSet?
      The DaemonSet is the deployment mechanism :)

      Renaud Gaubert

      Jul 12, 2017, 4:58:44 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @RenaudWasTaken commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +	Vendor     string
      +	Name       string
      +	Health     DeviceHealthStatus
      +	Properties map[string]string
      +}
      +```
      +
      +Because the current API (Capacity) can not be extended to support Device,
      +we will need to create two new attributes in the NodeStatus structure:
      +  * `DevCapacity`: Describing the device capacity of the node
      +  * `DevAvailable`: Describing the available devices
      +
      +```golang
      +type NodeStatus struct {
      +	DevCapacity []Device
      +	DevAvailable []Device
      

      As mentioned in the previous comment, Allocatable != Available: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md

      DevAvailable should not be named DevAllocatable because it does not do the same thing.

      Renaud Gaubert

      Jul 12, 2017, 5:05:30 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @RenaudWasTaken commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +
      +### End User story
      +
      +When setting up the cluster the admin knows what kind of devices are present
      +on the different machines and therefore can select what devices they want to
      +enable.
      +
      +The cluster admins knows his cluster has Nvidia GPUs therefore he deploys
      +the nvidia device plugin through:
      +`kubectl create -f nvidia.io/device-plugin.yml`
      +
      +The device plugin lands on all the nodes of the cluster and if it detects that
      +there are no GPUs it terminates. However, when there are GPUs it reports them
      +to Kubelet.
      +For device plugins reporting non-GPU Devices these are advertised as
      +OIRs and selected through the same method.
      

      > Can non-gpu devices be advertised using the above DaemonSet mechanism?

      I don't understand your question about DaemonSets. A DaemonSet is just a way to deploy a device plugin. The plugin then advertises the resource to Kubelet.

      Currently, the idea is to have GPUs reported by the Nvidia device plugin advertised by the node as "alpha.kubernetes.io/nvidia-gpu" and all other devices as "pod.alpha.kubernetes.io/opaque-int-resource-my-device".

      We are thinking of actually changing the name to "extensions.kubernetes.io/my-device" but this is really another discussion :)
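
As a small illustration of the naming scheme described above, the following Go sketch maps a device to the resource name the node would advertise. The helper names `resourceNameFor` and `isOIR` are hypothetical and exist only for this example; the two resource name strings are the ones quoted in this message.

```go
package main

import "strings"

const (
	// Resource name quoted above for GPUs reported by the Nvidia plugin.
	nvidiaGPUResource = "alpha.kubernetes.io/nvidia-gpu"
	// Prefix of opaque integer resources (OIRs), as quoted above.
	oirPrefix = "pod.alpha.kubernetes.io/opaque-int-resource-"
)

// resourceNameFor is a hypothetical helper: Nvidia GPUs keep their dedicated
// resource name, every other device is advertised as an OIR named after
// its kind.
func resourceNameFor(vendor, kind string) string {
	if strings.EqualFold(vendor, "nvidia") && strings.EqualFold(kind, "gpu") {
		return nvidiaGPUResource
	}
	return oirPrefix + kind
}

// isOIR reports whether a resource name carries the OIR prefix.
func isOIR(name string) bool {
	return strings.HasPrefix(name, oirPrefix)
}
```

For example, `resourceNameFor("acme", "my-device")` yields the OIR name shown above, while an Nvidia GPU keeps `alpha.kubernetes.io/nvidia-gpu`.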

      Renaud Gaubert

      Jul 12, 2017, 5:05:45 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      Thanks @balajismaniam for noticing all these errors :)

      Vikas Choudhary

      Jul 12, 2017, 11:56:47 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @vikaschoudhary16 commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +
      +Finally, to notify Kubelet of the existence of the device plugin,
      +the vendor's device plugin will have to make a request to Kubelet's
      +onwn gRPC server.
      +Only then will kubelet start interacting with the vendor's device plugin
      +through the gRPC apis.
      +
      +### End User story
      +
      +When setting up the cluster the admin knows what kind of devices are present
      +on the different machines and therefore can select what devices they want to
      +enable.
      +
      +The cluster admins knows his cluster has Nvidia GPUs therefore he deploys
      +the nvidia device plugin through:
      +`kubectl create -f nvidia.io/device-plugin.yml`
      

      @balajismaniam
      IIUC, the user is expected to deploy the DaemonSet manually and, once it is deployed, the vendor device plugin will do the discovery. Deploying the DaemonSet is the 0th step. @RenaudWasTaken can correct me if I got this wrong.

      Vikas Choudhary

      Jul 13, 2017, 12:01:06 AM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @vikaschoudhary16 commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +
      +### End User story
      +
      +When setting up the cluster the admin knows what kind of devices are present
      +on the different machines and therefore can select what devices they want to
      +enable.
      +
      +The cluster admins knows his cluster has Nvidia GPUs therefore he deploys
      +the nvidia device plugin through:
      +`kubectl create -f nvidia.io/device-plugin.yml`
      +
      +The device plugin lands on all the nodes of the cluster and if it detects that
      +there are no GPUs it terminates. However, when there are GPUs it reports them
      +to Kubelet.
      +For device plugins reporting non-GPU Devices these are advertised as
      +OIRs and selected through the same method.
      

      @balajismaniam

      > Why should OIRs be used?

      OIRs are being used to get the scheduling done. In the future, when resource classes are available and mature enough, OIRs will be deprecated.
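
To illustrate what "getting the scheduling done" with integer resources amounts to, here is a rough Go sketch of a node filter: a pod fits on a node only if, for every requested resource, the node still has enough free units. The `node` type and the `fits` and `filter` functions are assumptions made for this example and are not the scheduler's actual code.

```go
package main

// node is a hypothetical, reduced view of a node: how many units of each
// resource it advertises and how many are already requested by bound pods.
type node struct {
	name      string
	capacity  map[string]int64
	requested map[string]int64
}

// fits reports whether every requested quantity is still free on the node.
// Resources the node does not advertise count as zero.
func fits(n node, request map[string]int64) bool {
	for res, want := range request {
		if n.capacity[res]-n.requested[res] < want {
			return false
		}
	}
	return true
}

// filter returns the names of the nodes on which the request fits,
// preserving the input order.
func filter(nodes []node, request map[string]int64) []string {
	var out []string
	for _, n := range nodes {
		if fits(n, request) {
			out = append(out, n.name)
		}
	}
	return out
}
```

Because OIRs are plain integer counts, a check of this shape is enough for scheduling; it says nothing about which specific device a pod ends up with.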

      Balaji Subramaniam

      Jul 13, 2017, 1:15:28 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @balajismaniam commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +
      +### End User story
      +
      +When setting up the cluster the admin knows what kind of devices are present
      +on the different machines and therefore can select what devices they want to
      +enable.
      +
      +The cluster admins knows his cluster has Nvidia GPUs therefore he deploys
      +the nvidia device plugin through:
      +`kubectl create -f nvidia.io/device-plugin.yml`
      +
      +The device plugin lands on all the nodes of the cluster and if it detects that
      +there are no GPUs it terminates. However, when there are GPUs it reports them
      +to Kubelet.
      +For device plugins reporting non-GPU Devices these are advertised as
      +OIRs and selected through the same method.
      

      Thanks for the clarification. It might be good to make this (i.e., that OIRs will be deprecated) clear in the proposal.

      Balaji Subramaniam

      Jul 13, 2017, 1:21:16 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @balajismaniam commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +	Vendor     string
      +	Name       string
      +	Health     DeviceHealthStatus
      +	Properties map[string]string
      +}
      +```
      +
      +Because the current API (Capacity) can not be extended to support Device,
      +we will need to create two new attributes in the NodeStatus structure:
      +  * `DevCapacity`: Describing the device capacity of the node
      +  * `DevAvailable`: Describing the available devices
      +
      +```golang
      +type NodeStatus struct {
      +	DevCapacity []Device
      +	DevAvailable []Device
      

      I understand the difference between Allocatable and Available. I was alluding to the confusion caused by introducing Available itself.

      Connor Doyle

      Jul 14, 2017, 11:58:31 AM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @ConnorDoyle commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +	Vendor     string
      +	Name       string
      +	Health     DeviceHealthStatus
      +	Properties map[string]string
      +}
      +```
      +
      +Because the current API (Capacity) can not be extended to support Device,
      +we will need to create two new attributes in the NodeStatus structure:
      +  * `DevCapacity`: Describing the device capacity of the node
      +  * `DevAvailable`: Describing the available devices
      +
      +```golang
      +type NodeStatus struct {
      +	DevCapacity []Device
      +	DevAvailable []Device
      

      FWIW, I think finding a way to do without API changes to the node spec would help sharpen this proposal's focus on the device plugin system. Of course there needs to be a way to schedule against the device resources, but could something simpler work for the first alpha?

      Derek Carr

      Jul 20, 2017, 5:45:07 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +
      +This document describes a vendor independant solution to:
      +  * Discovering and representing external devices
      +  * Making these devices available to the container and cleaning them up
      +    afterwards
      +  * Health Check of these devices
      +
      +Because devices are vendor dependant and have their own sets of problems
      +and mechanisms, the solution we describe is a plugin mechanism managed by
      +Kubelet.
      +
      +At their core, device plugins are simple gRPC servers that may run in a
      +container deployed through the pod mechanism.
      +
      +These servers implement the gRPC interface defined later in this design
      +document and once the device plugin makes itself know to kubelet, kubelet
      

      nit: known to kubelet,

      Derek Carr

      Jul 20, 2017, 5:47:39 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +      * [Protobuf specification](#protobuf-specification)
      +      * [Installation](#installation)
      +      * [API Changes](#api-changes)
      +      * [Versioning](#versioning)
      +
      +_Authors:_
      +
      +* @RenaudWasTaken - Renaud Gaubert <rgau...@NVIDIA.com>
      +
      +## Abstract
      +
      +This document describes a vendor independant solution to:
      +  * Discovering and representing external devices
      +  * Making these devices available to the container and cleaning them up
      +    afterwards
      +  * Health Check of these devices
      

      noting here, but i would like to also understand how i perform basic node life-cycle operations:

      1. drain of a node that was using these plugins (does ordering matter)
      2. upgrade of host, say a security vulnerability (i.e. yum update, etc.)
      3. upgrade to new kubectl versions (what concerns, if any do i have with skew)

      Derek Carr

      Jul 20, 2017, 5:49:53 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +  3. A `Monitor` function to notify Kubelet whenever a device becomes
      +     unhealthy.
      +
      +The goal is for a user to be able to enable vendor devices (e.g: GPUs) through
      +the simple following steps:
      +  * `kubectl create -f http://vendor.com/device-plugin-daemonset.yaml`
      +  * When launching `kubectl describe nodes`, the devices appear in the node spec
      +  * In the long term users will be able to select them through Resource Class
      +
      +We expect the plugins to be deployed across the clusters through DaemonSets.
      +The targeted devices are GPUs, NICs, FPGAs, InfiniBand, Storage devices, ....
      +
      +
      +## Motivation
      +
      +Kubernetes currently supports discovery of CPU and Memory primarily to a
      

      drop minimal extent.

      Derek Carr

      Jul 20, 2017, 5:56:17 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +    across k8s clusters.
      +
      +## Objectives
      +
      +1. Add support for vendor specific Devices in kubelet:
      +    * Through a pluggable mechanism.
      +    * Which allows discovery and monitoring of devices.
      +    * Which allows hooking the runtime to make devices available in containers
      +      and cleaning them up.
      +2. Define a deployment mechanism for this new API.
      +3. Define a versioning mechanism for this new API.
      +
      +## Non Objectives
      +1. Advanced scheduling and resource selection (solved through [#782](https://github.com/Kubernetes/community/pull/782)).
      +   We will only try to give basic selection primitives to the devices
      +2. Metrics: this should be the job of cadvisor and should probably either be
      

      i am fine with this as a non-objective, i would avoid stating options for long term homes.

      Derek Carr

      Jul 20, 2017, 5:57:08 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +	rpc Discover(Empty) returns (stream Device) {}
      +	rpc Monitor(Empty) returns (stream DeviceHealth) {}
      +
      +	rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
      +	rpc Deallocate(DeallocateRequest) returns (Empty) {}
      +}
      +
      +```
      +
      +The gRPC server that the device plugin must implement is expected to
      +be advertised on a unix socket in a mounted hostPath (e.g:
      +`/var/run/Kubernetes/vendor.sock`).
      +
      +Finally, to notify Kubelet of the existence of the device plugin,
      +the vendor's device plugin will have to make a request to Kubelet's
      +onwn gRPC server.
      

      nit: own

      Derek Carr

      Jul 20, 2017, 5:58:19 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +the vendor's device plugin will have to make a request to Kubelet's
      +onwn gRPC server.
      +Only then will kubelet start interacting with the vendor's device plugin
      +through the gRPC apis.
      +
      +### End User story
      +
      +When setting up the cluster the admin knows what kind of devices are present
      +on the different machines and therefore can select what devices they want to
      +enable.
      +
      +The cluster admins knows his cluster has NVIDIA GPUs therefore he deploys
      +the NVIDIA device plugin through:
      +`kubectl create -f NVIDIA.io/device-plugin.yml`
      +
      +The device plugin lands on all the nodes of the cluster and if it detects that
      

      does it terminate? or does it just sit idle?

      Derek Carr

      Jul 20, 2017, 6:01:14 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +The device plugin lands on all the nodes of the cluster and if it detects that
      +there are no GPUs it terminates. However, when there are GPUs it reports them
      +to Kubelet.
      +For device plugins reporting non-GPU Devices these are advertised as
      +OIRs and selected through the same method.
      +
      +1. A user submits a pod spec requesting X GPUs (or devices)
      +2. The scheduler filters the nodes which do not match the resource requests
      +3. The pod lands on the node and Kubelet decides which device
      +   should be assigned to the pod
      +4. Kubelet calls `Allocate` on the matching Device Plugins
      +5. The user deletes the pod or the pod terminates
      +6. Kubelet calls `Deallocate` on the matching Device Plugins
      +
      +When receiving a pod which requests Devices kubelet is in charge of:
      +  * deciding which device to assign to the pod's containers (this will
      

      elaborate on "this will change in the future" in a later section, or remove it.

      Derek Carr

      Jul 20, 2017, 6:04:11 PM
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +For device plugins reporting non-GPU Devices these are advertised as
      +OIRs and selected through the same method.
      +
      +1. A user submits a pod spec requesting X GPUs (or devices)
      +2. The scheduler filters the nodes which do not match the resource requests
      +3. The pod lands on the node and Kubelet decides which device
      +   should be assigned to the pod
      +4. Kubelet calls `Allocate` on the matching Device Plugins
      +5. The user deletes the pod or the pod terminates
      +6. Kubelet calls `Deallocate` on the matching Device Plugins
      +
      +When receiving a pod which requests Devices kubelet is in charge of:
      +  * deciding which device to assign to the pod's containers (this will
      +    change in the future)
      +  * advertising the changes to the node's `Available` list
      +  * advertising the changes to the pods's `Allocated` list
      

      what is this list? is this a field in the API somewhere or just internal state to the kubelet?

      Derek Carr

      Jul 20, 2017, 6:04:29 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +to Kubelet.
      +For device plugins reporting non-GPU Devices these are advertised as
      +OIRs and selected through the same method.
      +
      +1. A user submits a pod spec requesting X GPUs (or devices)
      +2. The scheduler filters the nodes which do not match the resource requests
      +3. The pod lands on the node and Kubelet decides which device
      +   should be assigned to the pod
      +4. Kubelet calls `Allocate` on the matching Device Plugins
      +5. The user deletes the pod or the pod terminates
      +6. Kubelet calls `Deallocate` on the matching Device Plugins
      +
      +When receiving a pod which requests Devices kubelet is in charge of:
      +  * deciding which device to assign to the pod's containers (this will
      +    change in the future)
      +  * advertising the changes to the node's `Available` list
      

      s/Available/Capacity

      node reports: capacity (total), allocatable (total - system-reserved - kube-reserved)
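
To make the two terms concrete, here is a minimal Go sketch of the relationship stated above. The names `Reservation` and `Allocatable` are hypothetical and not part of the Kubernetes API, and clamping at zero is an assumption; only the formula comes from the comment.

```go
package main

import "fmt"

// Reservation is a hypothetical holder for the amounts set aside on a node.
type Reservation struct {
	SystemReserved int64
	KubeReserved   int64
}

// Allocatable applies the formula from the comment:
// allocatable = capacity - system-reserved - kube-reserved.
// Clamping at zero is an assumption made for this sketch, so that an
// over-sized reservation never yields a negative value.
func Allocatable(capacity int64, r Reservation) int64 {
	a := capacity - r.SystemReserved - r.KubeReserved
	if a < 0 {
		return 0
	}
	return a
}

func main() {
	r := Reservation{SystemReserved: 1, KubeReserved: 1}
	fmt.Println(Allocatable(8, r))
}
```

With a capacity of 8 and one unit reserved on each side, the node would report 6 as allocatable.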

      Derek Carr

      Jul 20, 2017, 6:04:40 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +1. A user submits a pod spec requesting X GPUs (or devices)
      +2. The scheduler filters the nodes which do not match the resource requests
      +3. The pod lands on the node and Kubelet decides which device
      +   should be assigned to the pod
      +4. Kubelet calls `Allocate` on the matching Device Plugins
      +5. The user deletes the pod or the pod terminates
      +6. Kubelet calls `Deallocate` on the matching Device Plugins
      +
      +When receiving a pod which requests Devices kubelet is in charge of:
      +  * deciding which device to assign to the pod's containers (this will
      +    change in the future)
      +  * advertising the changes to the node's `Available` list
      +  * advertising the changes to the pods's `Allocated` list
      +  * Calling the `Allocate` function with the list of devices
      +
      +The scheduler is still be in charge of filtering the nodes which cannot
      

      nit: the scheduler is in charge of filtering...

      Derek Carr

      Jul 20, 2017, 6:04:59 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +3. The pod lands on the node and Kubelet decides which device
      +   should be assigned to the pod
      +4. Kubelet calls `Allocate` on the matching Device Plugins
      +5. The user deletes the pod or the pod terminates
      +6. Kubelet calls `Deallocate` on the matching Device Plugins
      +
      +When receiving a pod which requests Devices kubelet is in charge of:
      +  * deciding which device to assign to the pod's containers (this will
      +    change in the future)
      +  * advertising the changes to the node's `Available` list
      +  * advertising the changes to the pods's `Allocated` list
      +  * Calling the `Allocate` function with the list of devices
      +
      +The scheduler is still be in charge of filtering the nodes which cannot
      +satisfy the resource requests.
      +He might in the future be in charge of selecting the device.
      

      remove this for now. let's scope the proposal on what we are doing now, not might do later.

      Derek Carr

      Jul 20, 2017, 6:06:17 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +## Device Plugin
      +
      +### Introduction
      +The device plugin is structured in 5 parts:
      +1. Registration: The device plugin advertises it's presence to Kubelet
      +2. Discovery: Kubelet calls the device plugin to list it's devices
      +3. Allocate / Deallocate: When creating/deleting containers requesting the
      +   devices advertised by the device plugin, Kubelet calls the device plugin's
      +   `Allocate` and `Deallocate` functions.
      +4. Cleanup: Kubelet terminates the communication through a "Stop"
      +4. Heartbeat: The device plugin polls Kubelet to know if it's still alive
      +   and if it has to re-issue a Register request
      +
      +### Registration
      +
      +When starting the device plugin is expected to make a (client) gRPC call
      

      to be clear, this is done via a .sock exposed by the kubelet which is given to the plugin via a hostPath of some kind?

      Derek Carr

      Jul 20, 2017, 6:06:28 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +## Device Plugin
      +
      +### Introduction
      +The device plugin is structured in 5 parts:
      +1. Registration: The device plugin advertises it's presence to Kubelet
      +2. Discovery: Kubelet calls the device plugin to list it's devices
      +3. Allocate / Deallocate: When creating/deleting containers requesting the
      +   devices advertised by the device plugin, Kubelet calls the device plugin's
      +   `Allocate` and `Deallocate` functions.
      +4. Cleanup: Kubelet terminates the communication through a "Stop"
      +4. Heartbeat: The device plugin polls Kubelet to know if it's still alive
      +   and if it has to re-issue a Register request
      +
      +### Registration
      +
      +When starting the device plugin is expected to make a (client) gRPC call
      

      ignore this, i should have read more.

      Derek Carr

      Jul 20, 2017, 6:07:50 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +4. Cleanup: Kubelet terminates the communication through a "Stop"
      +4. Heartbeat: The device plugin polls Kubelet to know if it's still alive
      +   and if it has to re-issue a Register request
      +
      +### Registration
      +
      +When starting the device plugin is expected to make a (client) gRPC call
      +to the `Register` function that Kubelet exposes.
      +
      +The communication between Kubelet is expected to happen only through Unix
      +sockets and follow this simple pattern:
      +1. The device plugins starts it's gRPC server
      +2. The device plugins sends a `RegisterRequest` to Kubelet (through a
      +   gRPC request)
      +4. Kubelet starts it's Discovery phase and calls `Discover` and `Monitor`
      +5. Kubelet answers to the `RegisterRequest` with a `RegisterResponse`
      

      why require discover/monitor with register?

      Derek Carr

      Jul 20, 2017, 6:10:16 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +Kubelet answers with the minimum version it supports and whether or
      +not there was an error. The errors may include (but not limited to):
      +  * API version not supported
      +  * A device plugin was already registered for this vendor
      +  * A device plugin already registered this device
      +  * Vendor is not consistent across discovered devices
      +
      +Kubelet will then interact with the plugin through the following functions:
      +  * `Discover`: List Devices
      +  * `Monitor`: Returns a stream that is written to when a
      +     Device becomes unhealty
      +  * `Allocate`: Called when creating a container with a list of devices
      +     can request changes to the Container config
      +  * `Deallocate`: Called when deleting a container can be used for cleanup
      +
      +The device plugin is also expected to periodically call the `Heartbeat` function
      

      the kubelet must require a minimum heartbeat interval to be satisfied by all device plugins.

      capturing here, but ignore if it's specified elsewhere.

      Derek Carr

      Jul 20, 2017, 6:11:59 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +Kubelet answers with the minimum version it supports and whether or
      +not there was an error. The errors may include (but not limited to):
      +  * API version not supported
      +  * A device plugin was already registered for this vendor
      +  * A device plugin already registered this device
      +  * Vendor is not consistent across discovered devices
      +
      +Kubelet will then interact with the plugin through the following functions:
      +  * `Discover`: List Devices
      +  * `Monitor`: Returns a stream that is written to when a
      +     Device becomes unhealty
      +  * `Allocate`: Called when creating a container with a list of devices
      +     can request changes to the Container config
      +  * `Deallocate`: Called when deleting a container can be used for cleanup
      +
      +The device plugin is also expected to periodically call the `Heartbeat` function
      

      i imagine we will not record individual device provider heartbeats on node status, but will there be an endpoint i can invoke to poll the kubelet to ask when devices last sent their heartbeat? i would want a mechanism to audit and collect when heartbeats stopped.
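
One way to picture the audit mechanism asked about here is an in-memory record of the last heartbeat per plugin that can be queried for stale entries. The sketch below is only an illustration: `HeartbeatTracker`, `Record`, and `Stale` are hypothetical names, and the proposal does not define such an endpoint.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// HeartbeatTracker is a hypothetical in-memory record of when each device
// plugin last sent a heartbeat. Nothing like it is defined in the proposal.
type HeartbeatTracker struct {
	mu   sync.Mutex
	last map[string]time.Time
}

func NewHeartbeatTracker() *HeartbeatTracker {
	return &HeartbeatTracker{last: make(map[string]time.Time)}
}

// Record stores the time of the latest heartbeat for a plugin.
func (t *HeartbeatTracker) Record(plugin string, at time.Time) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.last[plugin] = at
}

// Stale returns, in sorted order, the plugins whose last heartbeat is
// older than maxAge when observed at time now.
func (t *HeartbeatTracker) Stale(now time.Time, maxAge time.Duration) []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	var out []string
	for name, at := range t.last {
		if now.Sub(at) > maxAge {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

// demoStale records two heartbeats 50 seconds apart and asks, 60 seconds
// after the first one, which plugins have been silent for more than 30 seconds.
func demoStale() []string {
	start := time.Date(2017, 7, 20, 18, 0, 0, 0, time.UTC)
	tr := NewHeartbeatTracker()
	tr.Record("vendor-a", start)
	tr.Record("vendor-b", start.Add(50*time.Second))
	return tr.Stale(start.Add(60*time.Second), 30*time.Second)
}

func main() {
	fmt.Println(demoStale())
}
```

Keeping this state local to the kubelet would let an operator poll it without adding anything to node status.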

      Derek Carr

      Jul 20, 2017, 6:12:38 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +	string Kind = 1;
      +	string Name = 2;
      +	string Health = 3;
      +	string Vendor = 4;
      +	map<string, string> properties = 5; // Could be [1, 1.2, 1G]
      +}
      +
      +message DeviceHealth {
      +	string Name = 1;
      +	string Kind = 2;
      +	string Vendor = 4;
      +	string Health = 3;
      +}
      +```
      +
      +## Installation
      

      and upgrade

      Derek Carr

      Jul 20, 2017, 6:13:39 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +	string Kind = 2;
      +	string Vendor = 4;
      +	string Health = 3;
      +}
      +```
      +
      +## Installation
      +
      +The installation process should be straightforward to the user, transparent
      +and similar to other regular Kubernetes actions.
      +The device plugin should also run in containers so that Kubernetes can
      +deploy them and restart the plugins when they fail.
      +However, we should not prevent the user from deploying a bare metal device
      +plugin.
      +
      +Deploying the device plugins through DemonSets makes sense as the cluster
      

      i do think this introduces complexity around node drains. it's worth discussing the upgrade process.

      Renaud Gaubert

      Jul 20, 2017, 6:14:47 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @RenaudWasTaken commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +3. The pod lands on the node and Kubelet decides which device
      +   should be assigned to the pod
      +4. Kubelet calls `Allocate` on the matching Device Plugins
      +5. The user deletes the pod or the pod terminates
      +6. Kubelet calls `Deallocate` on the matching Device Plugins
      +
      +When receiving a pod which requests Devices kubelet is in charge of:
      +  * deciding which device to assign to the pod's containers (this will
      +    change in the future)
      +  * advertising the changes to the node's `Available` list
      +  * advertising the changes to the pods's `Allocated` list
      +  * Calling the `Allocate` function with the list of devices
      +
      +The scheduler is still be in charge of filtering the nodes which cannot
      +satisfy the resource requests.
      +He might in the future be in charge of selecting the device.
      

      Linking to Resource Class in the next update

      Renaud Gaubert

      Jul 20, 2017, 6:15:25 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @RenaudWasTaken commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +The device plugin lands on all the nodes of the cluster and if it detects that
      +there are no GPUs it terminates. However, when there are GPUs it reports them
      +to Kubelet.
      +For device plugins reporting non-GPU Devices these are advertised as
      +OIRs and selected through the same method.
      +
      +1. A user submits a pod spec requesting X GPUs (or devices)
      +2. The scheduler filters the nodes which do not match the resource requests
      +3. The pod lands on the node and Kubelet decides which device
      +   should be assigned to the pod
      +4. Kubelet calls `Allocate` on the matching Device Plugins
      +5. The user deletes the pod or the pod terminates
      +6. Kubelet calls `Deallocate` on the matching Device Plugins
      +
      +When receiving a pod which requests Devices kubelet is in charge of:
      +  * deciding which device to assign to the pod's containers (this will
      

      Renaud Gaubert

      Jul 20, 2017, 6:16:35 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @RenaudWasTaken commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +to Kubelet.
      +For device plugins reporting non-GPU Devices these are advertised as
      +OIRs and selected through the same method.
      +
      +1. A user submits a pod spec requesting X GPUs (or devices)
      +2. The scheduler filters the nodes which do not match the resource requests
      +3. The pod lands on the node and Kubelet decides which device
      +   should be assigned to the pod
      +4. Kubelet calls `Allocate` on the matching Device Plugins
      +5. The user deletes the pod or the pod terminates
      +6. Kubelet calls `Deallocate` on the matching Device Plugins
      +
      +When receiving a pod which requests Devices kubelet is in charge of:
      +  * deciding which device to assign to the pod's containers (this will
      +    change in the future)
      +  * advertising the changes to the node's `Available` list
      

      I agree here, the implementation already matches only this model.
      Will update shortly

      Renaud Gaubert

      unread,
      Jul 20, 2017, 6:17:50 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @RenaudWasTaken commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +For device plugins reporting non-GPU Devices these are advertised as
      +OIRs and selected through the same method.
      +
      +1. A user submits a pod spec requesting X GPUs (or devices)
      +2. The scheduler filters the nodes which do not match the resource requests
      +3. The pod lands on the node and Kubelet decides which device
      +   should be assigned to the pod
      +4. Kubelet calls `Allocate` on the matching Device Plugins
      +5. The user deletes the pod or the pod terminates
      +6. Kubelet calls `Deallocate` on the matching Device Plugins
      +
      +When receiving a pod which requests Devices kubelet is in charge of:
      +  * deciding which device to assign to the pod's containers (this will
      +    change in the future)
      +  * advertising the changes to the node's `Available` list
      +  * advertising the changes to the pods's `Allocated` list
      

      There would only be a DevCapacity field in the NodeStatus API

      Derek Carr

      Jul 20, 2017, 6:18:35 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +## API Changes
      +### Device
      +
      +When discovering the devices, Kubelet will be in charge of advertising those
      +resources to the API server.
      +
      +We will advertise each device returned by the Device Plugin in a new structure
      +called `Device`.
      +It is defined as follows:
      +
      +```golang
      +type Device struct {
      +	Kind       string
      +	Vendor     string
      +	Name       string
      +	Health     DeviceHealthStatus
      

      where is this defined? what happens if this is flapping? do you think this has lastHeartbeatTimes? what i worry about is additional traffic from nodes to masters. right now in large clusters, node to master communication is the dominant traffic to the cluster.

      Derek Carr

      Jul 20, 2017, 6:18:45 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +               hostPath:
      +                   path: /var/run/kubernetes
      +```
      +
      +## API Changes
      +### Device
      +
      +When discovering the devices, Kubelet will be in charge of advertising those
      +resources to the API server.
      +
      +We will advertise each device returned by the Device Plugin in a new structure
      +called `Device`.
      +It is defined as follows:
      +
      +```golang
      +type Device struct {
      

      this is a bit like NodeCondition, so a nit would be to call this NodeDevice.

      Renaud Gaubert

      Jul 20, 2017, 6:19:29 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @RenaudWasTaken commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +Kubelet answers with the minimum version it supports and whether or
      +not there was an error. The errors may include (but not limited to):
      +  * API version not supported
      +  * A device plugin was already registered for this vendor
      +  * A device plugin already registered this device
      +  * Vendor is not consistent across discovered devices
      +
      +Kubelet will then interact with the plugin through the following functions:
      +  * `Discover`: List Devices
      +  * `Monitor`: Returns a stream that is written to when a
      +     Device becomes unhealty
      +  * `Allocate`: Called when creating a container with a list of devices
      +     can request changes to the Container config
      +  * `Deallocate`: Called when deleting a container can be used for cleanup
      +
      +The device plugin is also expected to periodically call the `Heartbeat` function
      

      The heartbeat is more of a mechanism for the Device Plugins to make sure that Kubelet is alive rather than a mechanism for Kubelet to make sure that the Device Plugins are alive
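
Seen from the plugin side, that direction of the heartbeat can be sketched as follows. This is a simplified illustration, not the proposal's gRPC API: the `Kubelet` interface, its signatures, and `CheckOnce` are assumptions made for the example. Only the idea comes from the design doc, namely that the plugin polls Kubelet and re-issues a Register request when needed.

```go
package main

import (
	"errors"
	"fmt"
)

// Kubelet is a hypothetical client-side view of the two calls named in the
// proposal. The real calls are gRPC; these signatures are assumptions.
type Kubelet interface {
	// Heartbeat reports whether Kubelet still knows this plugin.
	Heartbeat() (known bool, err error)
	Register() error
}

// CheckOnce performs one heartbeat and re-registers the plugin if Kubelet
// is alive but no longer knows about it (for example after a restart).
func CheckOnce(k Kubelet) (reRegistered bool, err error) {
	known, err := k.Heartbeat()
	if err != nil {
		return false, err // Kubelet unreachable; the caller retries later
	}
	if known {
		return false, nil
	}
	if err := k.Register(); err != nil {
		return false, err
	}
	return true, nil
}

// fakeKubelet simulates a Kubelet that may be down or may have forgotten
// the plugin.
type fakeKubelet struct {
	up    bool
	known bool
}

func (f *fakeKubelet) Heartbeat() (bool, error) {
	if !f.up {
		return false, errors.New("kubelet unreachable")
	}
	return f.known, nil
}

func (f *fakeKubelet) Register() error {
	if !f.up {
		return errors.New("kubelet unreachable")
	}
	f.known = true
	return nil
}

func main() {
	k := &fakeKubelet{up: true} // restarted Kubelet: alive, plugin unknown
	first, err := CheckOnce(k)
	fmt.Println(first, err)
	second, err := CheckOnce(k)
	fmt.Println(second, err)
}
```

The first check finds a live Kubelet that has lost the registration and registers again; the second check finds nothing to do.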

      Derek Carr

      Jul 20, 2017, 6:19:46 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +### Device
      +
      +When discovering the devices, Kubelet will be in charge of advertising those
      +resources to the API server.
      +
      +We will advertise each device returned by the Device Plugin in a new structure
      +called `Device`.
      +It is defined as follows:
      +
      +```golang
      +type Device struct {
      +	Kind       string
      +	Vendor     string
      +	Name       string
      +	Health     DeviceHealthStatus
      +	Properties map[string]string
      

      how is this used? can you give some examples for data you would expect here?

      Derek Carr

      Jul 20, 2017, 6:20:00 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +It is defined as follows:
      +
      +```golang
      +type Device struct {
      +	Kind       string
      +	Vendor     string
      +	Name       string
      +	Health     DeviceHealthStatus
      +	Properties map[string]string
      +}
      +```
      +
      +Because the current API (Capacity) can not be extended to support Device,
      +we will need to create two new attributes in the NodeStatus structure:
      +  * `DevCapacity`: Describing the device capacity of the node
      +  * `DevAvailable`: Describing the available devices
      

      stick with capacity and allocatable terminology

      Renaud Gaubert

      Jul 20, 2017, 6:21:25 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @RenaudWasTaken commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +### Device
      +
      +When discovering the devices, Kubelet will be in charge of advertising those
      +resources to the API server.
      +
      +We will advertise each device returned by the Device Plugin in a new structure
      +called `Device`.
      +It is defined as follows:
      +
      +```golang
      +type Device struct {
      +	Kind       string
      +	Vendor     string
      +	Name       string
      +	Health     DeviceHealthStatus
      +	Properties map[string]string
      

      The structure mirrors the Protobuf, so the same example could be used :)

      struct Device {
          Kind: "NVIDIA-gpu"
          Name: "GPU-fef8089b-4820-abfc-e83e-94318197576e"
          Properties: {
              "Family": "Pascal",
              "Memory": "4G",
              "ECC": "True",
          }
      }
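
For reference, the same example can be written as a compilable Go value. The `Device` struct is copied from the quoted diff; `DeviceHealthStatus` is assumed to be a string type because its definition is not shown in the quote, and `exampleGPU` is a hypothetical helper name.

```go
package main

import "fmt"

// DeviceHealthStatus is assumed to be a string type; the quoted diff does
// not show its definition.
type DeviceHealthStatus string

// Device mirrors the struct quoted from the proposal.
type Device struct {
	Kind       string
	Vendor     string
	Name       string
	Health     DeviceHealthStatus
	Properties map[string]string
}

// exampleGPU returns the device from the comment above as a Go value.
// Vendor and Health are not given in the comment, so they stay at their
// zero values here.
func exampleGPU() Device {
	return Device{
		Kind: "NVIDIA-gpu",
		Name: "GPU-fef8089b-4820-abfc-e83e-94318197576e",
		Properties: map[string]string{
			"Family": "Pascal",
			"Memory": "4G",
			"ECC":    "True",
		},
	}
}

func main() {
	d := exampleGPU()
	fmt.Printf("%s %s family=%s\n", d.Kind, d.Name, d.Properties["Family"])
}
```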

      Renaud Gaubert

      Jul 20, 2017, 6:22:56 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @RenaudWasTaken commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +## API Changes
      +### Device
      +
      +When discovering the devices, Kubelet will be in charge of advertising those
      +resources to the API server.
      +
      +We will advertise each device returned by the Device Plugin in a new structure
      +called `Device`.
      +It is defined as follows:
      +
      +```golang
      +type Device struct {
      +	Kind       string
      +	Vendor     string
      +	Name       string
      +	Health     DeviceHealthStatus
      

      Flapping wouldn't impact the current traffic, because that update would be part of the node_status update.
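
The argument can be illustrated with a small sketch: health changes are only recorded locally, and whatever value is current gets picked up when the next node status update is built. Everything below is hypothetical (`healthBuffer`, `Set`, `Snapshot`, and the health strings); it is not taken from the implementation and is not goroutine-safe.

```go
package main

import "fmt"

// healthBuffer is a hypothetical holder for the latest health value per
// device between two node status updates.
type healthBuffer struct {
	latest map[string]string
	dirty  bool
}

func newHealthBuffer() *healthBuffer {
	return &healthBuffer{latest: map[string]string{}}
}

// Set records a health change locally. It sends nothing to the API server.
func (b *healthBuffer) Set(device, health string) {
	if b.latest[device] != health {
		b.latest[device] = health
		b.dirty = true
	}
}

// Snapshot is meant to be called once per node status update. It returns a
// copy of the latest health values and whether anything was recorded as
// changed since the previous snapshot.
func (b *healthBuffer) Snapshot() (map[string]string, bool) {
	out := make(map[string]string, len(b.latest))
	for k, v := range b.latest {
		out[k] = v
	}
	changed := b.dirty
	b.dirty = false
	return out, changed
}

func main() {
	b := newHealthBuffer()
	// A device flaps three times between two status updates.
	b.Set("gpu-0", "Healthy")
	b.Set("gpu-0", "Unhealthy")
	b.Set("gpu-0", "Healthy")
	snap, changed := b.Snapshot()
	fmt.Println(snap["gpu-0"], changed)
}
```

However often the device flaps, the number of status updates stays the same; only the value carried by the next one differs.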

      Derek Carr

      Jul 20, 2017, 6:23:40 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +	Vendor     string
      +	Name       string
      +	Health     DeviceHealthStatus
      +	Properties map[string]string
      +}
      +```
      +
      +Because the current API (Capacity) can not be extended to support Device,
      +we will need to create two new attributes in the NodeStatus structure:
      +  * `DevCapacity`: Describing the device capacity of the node
      +  * `DevAvailable`: Describing the available devices
      +
      +```golang
      +type NodeStatus struct {
      +	DevCapacity []Device
      +	DevAvailable []Device
      

      i agree with connor.

      available is also confusing. i assume you mean allocatable. allocatable is typically fixed based on a reservation of some kind (system-reserved, kube-reserved). this design does not discuss how a device reservation is made (or why?). i am not sure if you are treating available as something different.

      Derek Carr

      Jul 20, 2017, 6:24:43 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +  * `DevAvailable`: Describing the available devices
      +
      +```golang
      +type NodeStatus struct {
      +	DevCapacity []Device
      +	DevAvailable []Device
      +}
      +```
      +
      +We also introduce the `Allocated` field in the pod's status so that user
      +can know what devices were assigned to the pod. It could also be useful in
      +the case of monitoring
      +
      +```golang
      +type ContainerStatus struct {
      +	Devices []Device
      

      i don't think this would need to include health status.

      Derek Carr

      Jul 20, 2017, 6:38:17 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @derekwaynecarr commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +### Device
      +
      +When discovering the devices, Kubelet will be in charge of advertising those
      +resources to the API server.
      +
      +We will advertise each device returned by the Device Plugin in a new structure
      +called `Device`.
      +It is defined as follows:
      +
      +```golang
      +type Device struct {
      +	Kind       string
      +	Vendor     string
      +	Name       string
      +	Health     DeviceHealthStatus
      +	Properties map[string]string
      

      nm, see snippet earlier.

      Renaud Gaubert

      Jul 20, 2017, 7:04:03 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @RenaudWasTaken commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +4. Cleanup: Kubelet terminates the communication through a "Stop"
      +4. Heartbeat: The device plugin polls Kubelet to know if it's still alive
      +   and if it has to re-issue a Register request
      +
      +### Registration
      +
      +When starting the device plugin is expected to make a (client) gRPC call
      +to the `Register` function that Kubelet exposes.
      +
      +The communication between Kubelet is expected to happen only through Unix
      +sockets and follow this simple pattern:
      +1. The device plugins starts it's gRPC server
      +2. The device plugins sends a `RegisterRequest` to Kubelet (through a
      +   gRPC request)
      +4. Kubelet starts it's Discovery phase and calls `Discover` and `Monitor`
      +5. Kubelet answers to the `RegisterRequest` with a `RegisterResponse`
      

      I'm not sure I understand your comment here :)
      Discover is called right after registration with the Kubelet, so that Kubelet can advertise the plugin's devices.
      Monitor is also called right after, so that Kubelet can start being notified of any health changes.
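
The ordering described here can be sketched from Kubelet's point of view. This is a simplified model rather than the proposal's gRPC definitions: the `DevicePlugin` interface, its signatures, `HandleRegister`, and the device names are all assumptions made for illustration.

```go
package main

import "fmt"

// DevicePlugin is a hypothetical Go view of the two calls named in the
// proposal. The real interface is gRPC; these signatures are simplified.
type DevicePlugin interface {
	// Discover returns the names of the plugin's devices.
	Discover() ([]string, error)
	// Monitor returns a stream of names of devices that become unhealthy.
	Monitor() (<-chan string, error)
}

// HandleRegister sketches what Kubelet does when it receives a
// RegisterRequest: list the devices first, then subscribe to health changes.
func HandleRegister(p DevicePlugin) ([]string, <-chan string, error) {
	devices, err := p.Discover()
	if err != nil {
		return nil, nil, fmt.Errorf("discover: %w", err)
	}
	health, err := p.Monitor()
	if err != nil {
		return nil, nil, fmt.Errorf("monitor: %w", err)
	}
	return devices, health, nil
}

// fakePlugin records the order in which it is called.
type fakePlugin struct{ calls []string }

func (f *fakePlugin) Discover() ([]string, error) {
	f.calls = append(f.calls, "Discover")
	return []string{"gpu-0", "gpu-1"}, nil
}

func (f *fakePlugin) Monitor() (<-chan string, error) {
	f.calls = append(f.calls, "Monitor")
	ch := make(chan string, 1)
	ch <- "gpu-1" // pretend gpu-1 has just become unhealthy
	return ch, nil
}

func main() {
	p := &fakePlugin{}
	devices, health, err := HandleRegister(p)
	if err != nil {
		panic(err)
	}
	fmt.Println(devices, <-health, p.calls)
}
```

Tying both calls to registration means Kubelet has a device list and a health stream before it answers the `RegisterRequest`.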

      Renaud Gaubert

      Jul 20, 2017, 7:25:51 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Push

      @RenaudWasTaken pushed 1 commit.

      • 4cbc77e Device plugin proposal patch by Jiaying



      Renaud Gaubert

      Jul 20, 2017, 8:51:09 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Push

      @RenaudWasTaken pushed 1 commit.

      • 4681e71 Addressed comments, added protocol overview, explained impl differences

      Hui-Zhi

      Jul 20, 2017, 11:11:50 PM7/20/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @Hui-Zhi commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +### Device
      +
      +When discovering the devices, Kubelet will be in charge of advertising those
      +resources to the API server.
      +
      +We will advertise each device returned by the Device Plugin in a new structure
      +called `Device`.
      +It is defined as follows:
      +
      +```golang
      +type Device struct {
      +	Kind       string
      +	Vendor     string
      +	Name       string
      +	Health     DeviceHealthStatus
      +	Properties map[string]string
      

      All the devices have something in common, like Family. Should we make it like:

      struct Device {
          Kind: "NVIDIA-gpu"
          Name: "GPU-fef8089b-4820-abfc-e83e-94318197576e"
          Family: "Pascal"
          ...
          Properties: {
              "Memory": "4G",
              "ECC": "True",
          }
      }



      Renaud Gaubert

      Jul 21, 2017, 1:58:26 AM7/21/17
      to kubernetes/community, k8s-mirror-storage-misc, Team mention

      @RenaudWasTaken commented on this pull request.


      In contributors/design-proposals/device-plugin.md:

      > +### Device
      +
      +When discovering the devices, Kubelet will be in charge of advertising those
      +resources to the API server.
      +
      +We will advertise each device returned by the Device Plugin in a new structure
      +called `Device`.
      +It is defined as follows:
      +
      +```golang
      +type Device struct {
      +	Kind       string
      +	Vendor     string
      +	Name       string
      +	Health     DeviceHealthStatus
      +	Properties map[string]string
      

      All the GPUs have the attribute Family in common, but that wouldn't be the case for Solarflare or any other devices.
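
A short sketch of why a free-form `Properties` map copes with this: a lookup on a key that a device does not carry simply fails to match, so no field has to exist on every device kind. The trimmed-down `Device` type, `MatchProperty`, the device names, and the NIC property key below are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// Device is a trimmed-down version of the struct quoted from the proposal,
// keeping only the fields needed for this example.
type Device struct {
	Kind       string
	Name       string
	Properties map[string]string
}

// MatchProperty returns the names of devices whose Properties[key] equals
// value. Devices that do not carry the key at all are skipped.
func MatchProperty(devs []Device, key, value string) []string {
	var out []string
	for _, d := range devs {
		if v, ok := d.Properties[key]; ok && v == value {
			out = append(out, d.Name)
		}
	}
	return out
}

// sampleDevices returns one GPU with a Family property and one NIC without
// it. The NIC's "LinkSpeed" key is made up for the example.
func sampleDevices() []Device {
	return []Device{
		{Kind: "NVIDIA-gpu", Name: "gpu-0", Properties: map[string]string{"Family": "Pascal", "Memory": "4G"}},
		{Kind: "nic", Name: "nic-0", Properties: map[string]string{"LinkSpeed": "10G"}},
	}
}

func main() {
	fmt.Println(MatchProperty(sampleDevices(), "Family", "Pascal"))
}
```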
