New project for Kubernetes org: “node-feature-discovery”.


Connor Doyle

Jul 26, 2016, 12:23:28 PM
to kubernetes-dev, David Oppenheimer, Balaji Subramaniam, Brandon Philips

Related to issue https://github.com/kubernetes/kubernetes/issues/28311, we’d like to add a repository for a node feature discovery program capable of enumerating hardware features and advertising them for scheduling. Initially, “binary” features (present or not, such as instruction set extensions) would be advertised as simple labels.
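
As a rough illustration of "binary" features advertised as simple labels, the following minimal sketch parses `/proc/cpuinfo`-style text and emits a label only for features that are present. The label prefix `example.com/cpu-feature-` and both function names are assumptions made for this example; the thread does not specify a naming scheme or an implementation.

```python
# Hypothetical label prefix; the thread does not define one.
LABEL_PREFIX = "example.com/cpu-feature-"


def parse_cpu_flags(cpuinfo_text):
    """Return the CPU flags found in /proc/cpuinfo-style text (Linux, x86)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def binary_feature_labels(flags, features_of_interest):
    """Emit a label for each feature that is present; absent features get no label."""
    return {
        LABEL_PREFIX + feature: "true"
        for feature in sorted(features_of_interest)
        if feature in flags
    }


# Sample input standing in for the contents of /proc/cpuinfo.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse4_2 avx avx2\n"
labels = binary_feature_labels(parse_cpu_flags(SAMPLE), {"avx", "avx2", "avx512f"})
```

On a real Linux node the sample text would be replaced by the contents of `/proc/cpuinfo`; `avx512f` produces no label here because it is not in the sample flags.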


The project skeleton is here: https://github.com/davidopp/node-feature-discovery per the new procedure outlined in this recent thread (https://groups.google.com/forum/#!topic/kubernetes-dev/o6E1u-orDK8).


Note: the skeleton repo is “owned” by David Oppenheimer (Google). David is helping us to comply with corporate policies for open-sourcing things. Intel has a corporate CLA in place; after the project is transferred we will be able to commit our existing code.


Initially, proposed owners are:

  • Balaji Subramaniam (@balajismaniam), Intel
  • Connor Doyle (@ConnorDoyle), Intel

Thanks,

--

Connor

Connor Doyle

Aug 1, 2016, 12:26:44 PM
to kubernetes-dev, davi...@google.com, balaji.su...@intel.com, brandon...@coreos.com
Quick follow-up: added links to related proposals/design docs per feedback from Brandon Philips (CoreOS). As a next and final step, cross-posting to sig-node and adding a bullet to the next sig-node agenda.
--
Connor

Brandon Philips

Aug 2, 2016, 5:08:54 PM
to Connor Doyle, kubernetes-dev, davi...@google.com, balaji.su...@intel.com
Hey Connor-

Thanks for navigating the process and being the first ones!

The only other feedback is that people should know who to contact about the project. See the README template: https://github.com/kubernetes/kubernetes-template-project#community-discussion-contribution-and-support. I am also pretty passionate that every Kubernetes project advertises the Code of Conduct as well.

Other than those little nits, LGTM!

Brandon

Vishnu Kannan

Aug 2, 2016, 5:42:07 PM
to Brandon Philips, Connor Doyle, kubernetes-dev, David Oppenheimer, balaji.su...@intel.com, Derek Carr
+Derek Carr

We discussed this briefly in the node sig today. The general consensus was that the scope of this new repo is very broad and it is not clear yet as to how it intersects with existing kubelet & kubernetes functionalities. So before spawning a new repo, a roadmap along with the value this adds to k8s users will help.

--
You received this message because you are subscribed to the Google Groups "kubernetes-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-de...@googlegroups.com.
To post to this group, send email to kuberne...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-dev/CAD2oYtMbxwHqNOTcuhXZFHt7oR6f0QR2qMFey_anLpju1gBamg%40mail.gmail.com.

For more options, visit https://groups.google.com/d/optout.

David Oppenheimer

Aug 2, 2016, 6:08:42 PM
to Vishnu Kannan, Brandon Philips, Connor Doyle, kubernetes-dev, Subramaniam, Balaji, Derek Carr
On Tue, Aug 2, 2016 at 2:42 PM, Vishnu Kannan <vis...@google.com> wrote:
> +Derek Carr
>
> We discussed this briefly in the node sig today. The general consensus was that the scope of this new repo is very broad and it is not clear yet as to how it intersects with existing kubelet & kubernetes functionalities. So before spawning a new repo, a roadmap along with the value this adds to k8s users will help.

Can you say more about what you'd like to see beyond what's in the design doc?

IMO the scrutiny for adding a new repo shouldn't be as high as adding a new feature to core Kubernetes or getting some out-of-core component included in the standard deployment scripts (i.e. so it runs on every cluster by default).

David Oppenheimer

Aug 2, 2016, 6:15:44 PM
to Brandon Philips, Connor Doyle, kubernetes-dev, Subramaniam, Balaji
On Tue, Aug 2, 2016 at 2:08 PM, Brandon Philips <brandon...@coreos.com> wrote:
> Hey Connor-
>
> Thanks for navigating the process and being the first ones!
>
> The only other feedback is that people should know who to contact about the project. See the README template: https://github.com/kubernetes/kubernetes-template-project#community-discussion-contribution-and-support.

Should this list specific people's names and email addresses? or just a SIG? or a mailing list created just for that repo?

Tim Hockin

Aug 2, 2016, 7:40:26 PM
to David Oppenheimer, Vishnu Kannan, Brandon Philips, Connor Doyle, kubernetes-dev, Subramaniam, Balaji, Derek Carr
On Tue, Aug 2, 2016 at 3:08 PM, 'David Oppenheimer' via kubernetes-dev
<kuberne...@googlegroups.com> wrote:
>
>
> On Tue, Aug 2, 2016 at 2:42 PM, Vishnu Kannan <vis...@google.com> wrote:
>>
>> +Derek Carr
>>
>> We discussed this briefly in the node sig today. The general consensus was
>> that the scope of this new repo is very broad and it is not clear yet as to
>> how it intersects with existing kubelet & kubernetes functionalities. So
>> before spawning a new repo, a roadmap along with the value this adds to k8s
>> users will help.
>
>
> Can you say more about what you'd like to see beyond what's in the design
> doc?
>
> IMO the scrutiny for adding a new repo shouldn't be as high as adding a new
> feature to core Kubernetes or getting some out-of-core component included in
> the standard deployment scripts (i.e. so it runs on every cluster by
> default).

I actually think the bar should be high. We don't need 1000 repos
that have not been thought through. Being in the kubernetes org is
an endorsement of the idea, so we should be clear what the idea is and
what the purpose of the repo is. I have not read the doc yet.

Connor Doyle

Aug 2, 2016, 8:43:04 PM
to Tim Hockin, David Oppenheimer, Vishnu Kannan, Brandon Philips, kubernetes-dev, Subramaniam, Balaji, Derek Carr
That document is relatively expansive compared to this proposed repo.
This repo only addresses a specific part, described more briefly here:
https://github.com/kubernetes/kubernetes/issues/28311.
--
connor

Derek Carr

Aug 3, 2016, 12:58:30 PM
to Connor Doyle, Tim Hockin, David Oppenheimer, Vishnu Kannan, Brandon Philips, kubernetes-dev, Subramaniam, Balaji
To summarize my questions from sig-node:

1. Is a repo in kubernetes/* defining a de facto standard?
2. Is there a mechanism to review existing code prior to a contribution to vet use-cases?
3. Does a repo have a clearly bounded scope to avoid future conflicts (in this case with kubelet)?

For this particular repository, I had no major concern on allowing something to run that would discover the binary presence of a feature on a node, but I do think there was some concern in sig-node to use this pattern to support additional counted resources.  This basically relates to questions #2 and #3.  I think the use cases requested from the sig were more just a list of initial features that we wanted to plan for discovery, and how we anticipated a user of kubernetes to consume those labels without being overwhelmed in the experience.  There were some labels that sounded like they would want to be toggled on/off dynamically, and the question came up if those scenarios required dynamic updates, etc. or if it was more general node maintenance.

I alluded to node-problem-detector as an example of a repo that feels like it has a clearly bounded scope.

Thanks,
Derek

David Oppenheimer

Aug 3, 2016, 3:43:31 PM
to Derek Carr, Connor Doyle, Tim Hockin, Vishnu Kannan, Brandon Philips, kubernetes-dev, Subramaniam, Balaji
FWIW I just checked a random machine in Borg and it had 71 labels (and we don't use namespaces). I guess it's a little annoying if you don't already know what you're looking for, but I'm not concerned with the proposal here to add namespaced labels and namespaced opaque integer resources. There's a natural limit to how many you could have under this proposal anyway, as there's a limit to the number of hardware features and resources a machine might have.

Derek Carr

Aug 3, 2016, 3:50:54 PM
to David Oppenheimer, Connor Doyle, Tim Hockin, Vishnu Kannan, Brandon Philips, kubernetes-dev, Subramaniam, Balaji
It's good to know there is a precedent for a larger number of labels in Borg.

To be clear:

1. I see no issue adding namespaced labels for binary presence of a feature on a node.
2. I *think* adding additional namespaced opaque integer resources in Node.Status.Capacity or Node.Status.Allocatable without the kubelet's knowledge would be problematic.

I wanted to ensure that if we add a repository, its scope does not extend to #2 without further discussion.

Thanks,

Connor Doyle

Aug 3, 2016, 3:56:22 PM
to David Oppenheimer, Derek Carr, Tim Hockin, Vishnu Kannan, Brandon Philips, kubernetes-dev, Subramaniam, Balaji
Hi Derek,

I can address #3 a little bit.

As of now, IMO the scope is well-defined and matches your initial description.

> something to run that would discover the binary presence of a feature on a node

As for avoiding future conflicts with the Kubelet, that's harder to
guarantee. We don't know if or when those plans are likely to
materialize. We have to have a little trust that contributors to the
non-core repos are reasonable. When a conflict/overlap occurs and
there's a migration path onto core functionality, of course it makes
sense to deprecate duplicate functionality in external projects like
this one.

The current plan (and code) prefixes labels with the source, so label
collisions shouldn't be a concern.
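
A minimal sketch of why source-prefixed keys avoid collisions, assuming "source" means the discovery mechanism that detected the feature. The namespace `example.com`, the source names, and the key layout are hypothetical and not taken from the project's code.

```python
def label_key(namespace, source, feature):
    """Build a label key that embeds the discovery source in the name."""
    return f"{namespace}/{source}-{feature}"


def merge_sources(namespace, discovered):
    """Combine features from several sources into one label map.

    `discovered` maps a source name to the feature names it reported.
    """
    labels = {}
    for source, features in discovered.items():
        for feature in features:
            labels[label_key(namespace, source, feature)] = "true"
    return labels


# Two hypothetical sources report a feature with the same short name.
labels = merge_sources("example.com", {"cpuid": ["AVX"], "kernel": ["AVX"]})
```

Because the source is part of the key, the two reports become two distinct labels instead of one overwriting the other.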

Toggling features on/off would be handled by re-running the discovery
pod. We don't plan any direct support for the following scenario
initially: a pod declares affinity for a discovered feature, is
scheduled, and then said feature is disabled. The pod isn't evicted,
etc. Advanced cases like that could be interesting, but we wanted to
start with something simple and easy to understand and get feedback
from real users first.
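
One way to picture "re-running the discovery pod" is as a reconciliation step: labels owned by the discovery program are recomputed, new ones are set, and stale ones are removed. The sketch below is an assumption about how that could work, not the project's implementation; the function name and the `example.com/f-` prefix are hypothetical.

```python
def reconcile(current_labels, discovered_labels, owned_prefix):
    """Return (to_set, to_remove) so labels under owned_prefix match discovery.

    Labels outside owned_prefix are never touched.
    """
    owned = {key for key in current_labels if key.startswith(owned_prefix)}
    to_remove = sorted(owned - set(discovered_labels))
    to_set = {
        key: value
        for key, value in discovered_labels.items()
        if current_labels.get(key) != value
    }
    return to_set, to_remove


current = {
    "zone": "a",
    "example.com/f-avx": "true",
    "example.com/f-sgx": "true",  # feature has since been disabled
}
discovered = {"example.com/f-avx": "true", "example.com/f-avx2": "true"}
to_set, to_remove = reconcile(current, discovered, "example.com/f-")
```

As the message says, this only updates labels; a pod already scheduled against a label that is later removed is not evicted.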
--
connor

Eric Tune

Aug 3, 2016, 4:11:08 PM
to Connor Doyle, David Oppenheimer, Derek Carr, Tim Hockin, Vishnu Kannan, Brandon Philips, kubernetes-dev, Subramaniam, Balaji
If we agreed that node-feature discovery only set labels once at initialization time, then I think that would ensure it could not be used in place of a counted-resource.  


David Oppenheimer

Aug 3, 2016, 4:15:07 PM
to Eric Tune, Connor Doyle, Derek Carr, Tim Hockin, Vishnu Kannan, Brandon Philips, kubernetes-dev, Subramaniam, Balaji
On Wed, Aug 3, 2016 at 1:11 PM, Eric Tune <et...@google.com> wrote:
> If we agreed that node-feature discovery only set labels once at initialization time, then I think that would ensure it could not be used in place of a counted-resource.

I don't think this proposal is useful unless it allows
(1) both binary features (present/absent) and counted resources (BTW we've had an outstanding request for opaque counted resources for a while -- first proposed in #19082)
(2) binary features that can be dynamically enabled and disabled. (I think allowing the capacity of a counted resource to change dynamically is probably too complicated and confusing, but dynamically adding/removing a label seems simple.)

Derek Carr

Aug 3, 2016, 4:27:14 PM
to David Oppenheimer, Eric Tune, Connor Doyle, Tim Hockin, Vishnu Kannan, Brandon Philips, kubernetes-dev, Subramaniam, Balaji
1. The kubelet today believes it's the single source of truth for Node.Status.Capacity and Node.Status.Allocatable.
see: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_node_status.go#L352
It does not handle merging external daemons mucking with those two particular node fields well.

2. The moment we schedule counted resources, the kubelet must be involved in ensuring they are fairly consumed and released by pods.  To ensure that capability, we would need to make changes to container runtime interface to ensure custom counted resources are made available, and we would need to firm up the contract for understanding when those counted resources are released (as they may not actually match the pod lifecycle as we have seen in the pod cgroup work with things like tmpfs memory).

Without handling both of the above items, it's impractical to claim support for counted resources.  This is why I would like to be clear that in order for its scope to extend to counted resources, it requires deeper sig-node discussions.  Detailed use case discussion for counted resources is more important than the generally well understood labeling of nodes, which is why I asked about repo scope.
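
The first point can be pictured with a toy model in which the node's own agent rewrites capacity from its local view on every sync, so anything an external daemon added is lost. This is only an illustration of the "single source of truth" concern; it is not the kubelet's actual code, and the resource name `example.com/widgets` is hypothetical.

```python
def kubelet_sync(node_status, kubelet_view):
    """Toy model: replace capacity wholesale with the agent's own view."""
    node_status["capacity"] = dict(kubelet_view)


node_status = {"capacity": {"cpu": 8}}

# A hypothetical external daemon adds an opaque counted resource.
node_status["capacity"]["example.com/widgets"] = 4
seen_before_sync = "example.com/widgets" in node_status["capacity"]

# The next periodic sync knows nothing about that resource and drops it.
kubelet_sync(node_status, {"cpu": 8})
seen_after_sync = "example.com/widgets" in node_status["capacity"]
```

In this model the external write is visible only until the next sync, which is why the message asks for kubelet involvement before the scope extends to counted resources.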

Thanks,
Derek

Eric Tune

Aug 3, 2016, 6:30:21 PM
to Derek Carr, David Oppenheimer, Connor Doyle, Tim Hockin, Vishnu Kannan, Brandon Philips, kubernetes-dev, Subramaniam, Balaji
Agree with Derek.

Brian Grant

Aug 4, 2016, 10:34:03 PM
to Eric Tune, Derek Carr, David Oppenheimer, Brandon Philips, Tim Hockin, Vishnu Kannan, Connor Doyle, kubernetes-dev, Subramaniam, Balaji
Let's separate 2 issues:

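First, the process for creating a new repo. We have people proposing to work on functionality we definitely want. We NEED solutions to both issues. The original github issue was filed more than a year ago. The one attempt to make progress that I'm aware of was aborted last fall. Relevant issues:

https://github.com/kubernetes/kubernetes/issues/9044
https://github.com/kubernetes/kubernetes/issues/11470#issuecomment-124283056
https://github.com/kubernetes/kubernetes/pull/13524
https://github.com/kubernetes/kubernetes/issues/19082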

Does it make sense for someone to work on this? Absolutely. Should the code go in the main repo? Absolutely not. We are far, far, far beyond the point of our ability to scale the development effort in the main repo. Therefore, the work needs to occur in another repo. The proposed process was followed. It's time to move forward. I agree we don't want 1000 low-quality, half-baked, abandoned repos, but I could easily imagine 100 well maintained repos if we broke out every component, addon, reusable library, tool, etc.

Second, the design details of what will be built and how it will be integrated with Kubernetes. There are legitimate concerns, such as whether/how other architectures would be supported, and how this would interface with Kubelet. Probably we need a plugin mechanism in Kubelet or local API or something -- it needs to be possible for users to extend the stock Kubernetes release without rebuilding it. I expressed some additional concerns -- unlike David Opp, I think 70+ labels would be a problem, so I would like the user to provide a whitelist. However, we don't need answers to all of these questions prior to creation of the repo. It will take time to set up testing, the submit queue, etc. Let's start small and iterate. Connor agreed to discuss the design further in SIG node and with the main stakeholders. 
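
The whitelist idea could be sketched as a filter applied before labels are published. The pattern syntax (shell-style globs via Python's `fnmatch`), the function name, and the label keys below are assumptions for illustration; the thread does not say how a whitelist would be expressed.

```python
import fnmatch


def apply_whitelist(discovered_labels, whitelist_patterns):
    """Keep only labels whose key matches at least one user-supplied pattern."""
    return {
        key: value
        for key, value in discovered_labels.items()
        if any(fnmatch.fnmatchcase(key, pattern) for pattern in whitelist_patterns)
    }


# Hypothetical discovered labels.
discovered = {
    "example.com/cpuid-AVX": "true",
    "example.com/cpuid-AVX2": "true",
    "example.com/cpuid-SSE4.2": "true",
    "example.com/rdt-CAT": "true",
}
published = apply_whitelist(
    discovered, ["example.com/cpuid-AVX*", "example.com/rdt-CAT"]
)
```

With this filter the node carries only the labels a user asked for, however many features discovery finds.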

Would an incubation process make sense? Maybe, but we don't have one now. Repos are pretty easy to move if we decide that such a process would be appropriate for this effort.




sebgoa

Aug 5, 2016, 8:07:36 AM
to Brian Grant, Eric Tune, Derek Carr, David Oppenheimer, Brandon Philips, Tim Hockin, Vishnu Kannan, Connor Doyle, kubernetes-dev, Subramaniam, Balaji

> On Aug 5, 2016, at 4:34 AM, 'Brian Grant' via kubernetes-dev <kuberne...@googlegroups.com> wrote:
>
> Let's separate 2 issues:
>
> First, the process for creating a new repo. We have people proposing to work on functionality we definitely want. We NEED solutions to both issues. The original github issue was filed more than a year ago. The one attempt to make progress that I'm aware of was aborted last fall. Relevant issues:
>
> https://github.com/kubernetes/kubernetes/issues/9044
> https://github.com/kubernetes/kubernetes/issues/11470#issuecomment-124283056
> https://github.com/kubernetes/kubernetes/pull/13524
> https://github.com/kubernetes/kubernetes/issues/19082
>
> Does it make sense for someone to work on this? Absolutely. Should the code go in the main repo? Absolutely not. We are far, far, far beyond the point of our ability to scale the development effort in the main repo. Therefore, the work needs to occur in another repo. The proposed process was followed. It's time to move forward. I agree we don't want 1000 low-quality, half-baked, abandoned repos, but I could easily imagine 100 well maintained repos if we broke out every component, addon, reusable library, tool, etc.
>
> Second, the design details of what will be built and how it will be integrated with Kubernetes. There are legitimate concerns, such as whether/how other architectures would be supported, and how this would interface with Kubelet. Probably we need a plugin mechanism in Kubelet or local API or something -- it needs to be possible for users to extend the stock Kubernetes release without rebuilding it. I expressed some additional concerns -- unlike David Opp, I think 70+ labels would be a problem, so I would like the user to provide a whitelist. However, we don't need answers to all of these questions prior to creation of the repo. It will take time to set up testing, the submit queue, etc. Let's start small and iterate. Connor agreed to discuss the design further in SIG node and with the main stakeholders.
>
> Would an incubation process make sense? Maybe, but we don't have one now. Repos are pretty easy to move if we decide that such a process would be appropriate for this effort.
>

At the risk of hijacking this thread a bit, there are open source groups/foundations with governance and processes in place to deal with things similar to this.

For example, Apache Software Foundation:

- Incubating projects
-> Projects that want to join submit a proposal; the proposal is voted on, and the project is incubated if the proposal passes. Projects get mentors. If viable, an incubated project is elevated.
-> Projects have a management committee and vote on their "committers".

- Adding members
-> Generally speaking, membership in a project is by merit. Membership means write access to the repo.
-> Membership is proposed, then voted on privately.

I haven’t looked, but I would think that CNCF/LF, with its other projects, has boilerplate governance processes for that sort of thing that we could edit and adopt.

cheers,

-sebastien

Brian Grant

Aug 5, 2016, 10:03:01 AM
to sebgoa, Eric Tune, Derek Carr, David Oppenheimer, Brandon Philips, Tim Hockin, Vishnu Kannan, Connor Doyle, kubernetes-dev, Subramaniam, Balaji


On Fri, Aug 5, 2016 at 5:07 AM, sebgoa <run...@gmail.com> wrote:
Thanks, Sebastien. We will be looking at practices of other open-source projects and foundations, such as Docker, Apache, OpenStack, etc. Some details will necessarily be different due to practicalities of working in GitHub. For instance, GitHub is designed for small repos managed by small teams, so some repos might be full-blown "projects" and others may not really be.

Connor Doyle

Aug 10, 2016, 2:44:29 PM
to Kubernetes developer/contributor discussion, run...@gmail.com, et...@google.com, dec...@redhat.com, davi...@google.com, brandon...@coreos.com, tho...@google.com, vis...@google.com, conno...@gmail.com, balaji.su...@intel.com
After discussion this week with Vish and Derek from sig-node, we came to a general agreement about the scope of work for this repository.

We've captured the scope in this doc: https://goo.gl/Oj53AB

TL;DR: Initially the repository will deal only with discovering binary (there-or-not) features and advertise them as "alpha" labels. Any future expansions of the scope will happen only after appropriate discussions with the relevant sig teams (node and scheduling).
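
To show how such labels would be consumed, here is a minimal sketch of exact-match node selection: a node is eligible only if every key/value pair in the selector appears in its labels. The node names and the label key `example.com/alpha-cpuid-AVX2` are hypothetical; the thread does not give the actual "alpha" label format.

```python
def node_matches(node_labels, node_selector):
    """True if every selector key is present on the node with the same value."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical cluster state after discovery has run.
nodes = {
    "node-a": {"example.com/alpha-cpuid-AVX2": "true"},
    "node-b": {},
}

# A workload that requires the discovered feature.
selector = {"example.com/alpha-cpuid-AVX2": "true"}
eligible = [name for name, labels in nodes.items() if node_matches(labels, selector)]
```

A binary feature fits this model well: the label is either there or not, and no per-pod accounting is needed.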

Hopefully, that addresses the majority of concerns raised so far in this thread.

One part of the process that remains a bit unclear is how to make a decision on admission/denial of the new repo request. If there are no objections, how about democracy? Within 48 hours (by noon on Friday) please respond with +1 or -1 (rationale).

Balaji Subramaniam

Aug 10, 2016, 2:59:30 PM
to Kubernetes developer/contributor discussion, run...@gmail.com, et...@google.com, dec...@redhat.com, davi...@google.com, brandon...@coreos.com, tho...@google.com, vis...@google.com, conno...@gmail.com, balaji.su...@intel.com
+1 vote for admission of the new repo request.

Dawn Chen

Aug 10, 2016, 3:47:42 PM
to Connor Doyle, Kubernetes developer/contributor discussion, run...@gmail.com, Eric Tune, Derek Carr, David Oppenheimer, Brandon Philips, Tim Hockin, Vishnu Kannan, Subramaniam, Balaji
The modified scope LGTM. 

+1 for new repo request. 


Derek Carr

Aug 10, 2016, 3:48:59 PM
to Dawn Chen, Connor Doyle, Kubernetes developer/contributor discussion, run...@gmail.com, Eric Tune, David Oppenheimer, Brandon Philips, Tim Hockin, Vishnu Kannan, Subramaniam, Balaji
+1


Vishnu Kannan

Aug 10, 2016, 4:49:55 PM
to Derek Carr, Dawn Chen, Connor Doyle, Kubernetes developer/contributor discussion, Sebastien Goasguen, Eric Tune, David Oppenheimer, Brandon Philips, Tim Hockin, Subramaniam, Balaji
+1! Thanks for posting the revised proposal!

David Oppenheimer

Aug 19, 2016, 2:57:34 PM
to Kubernetes developer/contributor discussion, davi...@google.com, balaji.su...@intel.com, brandon...@coreos.com
LGTM

Connor Doyle

Aug 23, 2016, 2:10:16 PM
to Kubernetes developer/contributor discussion, davi...@google.com, balaji.su...@intel.com, brandon...@coreos.com
Status update:
  • David Oppenheimer has agreed to be the Champion for this project.
  • Dawn Chen has agreed to be the Sponsor for this project.
The champion and sponsor roles are explained in the incubator FAQ: https://github.com/kubernetes/community/blob/master/incubator.md#faq

Pending the repo move and permissions, we are underway.
Thanks everyone for your time reviewing and setting up the incubation process.