How can I choose what kind of node a pod should be scheduled on?


陳瑞平

unread,
Apr 2, 2019, 1:44:26 AM4/2/19
to Knative Users
It seems that node selector / node affinity isn't supported in a Revision;
are there any workarounds here?

markust...@me.com

unread,
Apr 2, 2019, 2:45:02 AM4/2/19
to Knative Users
Hi,

we're currently discussing adding selectors/affinity to our spec. Could you elaborate on the use case you're trying to implement, and why you need these features specifically for serverless workloads? That'll help us decide whether it is valuable to add and actually needed by users.

See this PR for the corresponding implementation: https://github.com/knative/serving/pull/3467

Cheers,
Markus

陳瑞平

unread,
Apr 2, 2019, 8:30:11 AM4/2/19
to Knative Users
In a GKE cluster composed of preemptible and non-preemptible VMs,
I want fault-tolerant services to be scheduled onto the preemptible VMs, and everything else onto the non-preemptible ones.
So this feature is very critical to our services.
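For example, this is the kind of selector we would like to express on a Revision's pods. A minimal sketch: cloud.google.com/gke-preemptible is the label GKE puts on preemptible nodes, while the pod name and image here are placeholders.

apiVersion: v1
kind: Pod
metadata:
  name: fault-tolerant-worker # hypothetical name
spec:
  nodeSelector:
    cloud.google.com/gke-preemptible: "true" # GKE's built-in label on preemptible nodes
  containers:
    - name: worker
      image: example.com/worker:latest # placeholder image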

陳瑞平 wrote on Tuesday, April 2, 2019 at 1:44:26 PM UTC+8:

davi...@gliacloud.com

unread,
Apr 2, 2019, 9:05:50 AM4/2/19
to Knative Users
+1 for the use case that mixes preemptible / non-preemptible nodes.
It is also essential when mixing GPU / non-GPU VMs.



陳瑞平 wrote on Tuesday, April 2, 2019 at 8:30:11 PM UTC+8:

markust...@me.com

unread,
Apr 2, 2019, 9:27:51 AM4/2/19
to Knative Users
The crucial bit here is that Knative-managed pods will always be subject to the occasional shutdown at the platform's discretion. Even setting minScale=1 will not guarantee that this one pod will be around forever; it merely guarantees that there's always at least one pod around to serve somewhere. Going into Knative, you shouldn't make any assumptions about the lifetime of your service's pods; in theory they could be shut down after each request.
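For reference, that minimum is expressed as an autoscaling annotation on the revision template; a minimal sketch:

metadata:
  annotations:
    autoscaling.knative.dev/minScale: "1" # keep at least one pod around to serve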

If that vector is critical to your service, I think you'd be better off deploying StatefulSets or plain Deployments that you fully control.

Justin Grayston

unread,
Apr 2, 2019, 9:32:48 AM4/2/19
to markust...@me.com, Knative Users
I'd certainly be interested in being able to pin a service to a given node pool, for GPUs and specific machine types. I have a specific use case for video transcoding, where GPUs, and being able to keep the video in memory, would be a plus!


Justin Grayston

jgra...@google.com

Customer Engineer - Telco, Media & Technology

+44 7384 431081

+44 20 3820 8995





Matthew Moore

unread,
Apr 2, 2019, 9:48:14 AM4/2/19
to Justin Grayston, Ryan Gregg, markust...@me.com, Knative Users


+Ryan Gregg

> It is also essential when mixing GPU / non-GPU VMs.

It is not required for this case; it works fine if you just use the resources block (a sketch follows below). You do need it to select a particular class of GPU if you have node pools with different GPUs.
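For illustration, a minimal sketch of using the resources block, assuming the nvidia.com/gpu extended resource; the service name and image URL are hypothetical:

apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: gpu-service # hypothetical name
spec:
  runLatest:
    configuration:
      revisionTemplate:
        spec:
          container:
            image: example.com/gpu-worker:latest # placeholder image
            resources:
              limits:
                nvidia.com/gpu: "1" # requesting a GPU steers the pod onto GPU nodes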



Markus, is there an uber issue tracking this where we're accumulating use cases (perhaps #1816)? Top of mind, I've heard:
  • mixed GPUs
  • mixed Windows/Linux
  • mixed preemptible nodes
Are there other cases I'm forgetting?








--
Matthew Moore
Knative @ Google

Ben Browning

unread,
Apr 2, 2019, 9:57:33 AM4/2/19
to Matthew Moore, Justin Grayston, Ryan Gregg, Markus Thömmes, Knative Users
On Tue, Apr 2, 2019 at 9:48 AM 'Matthew Moore' via Knative Users <knativ...@googlegroups.com> wrote:
> Markus, is there an uber issue tracking this where we're accumulating use cases (perhaps #1816)? Top of mind, I've heard: mixed GPUs, mixed Windows/Linux, mixed preemptible nodes. Are there other cases I'm forgetting?


In a similar vein, as people start experimenting with Virtual Kubelet (https://github.com/virtual-kubelet/virtual-kubelet) and Knative, we'll need to allow node tolerations and selectors in the Knative Service spec if we want to allow scheduling pods onto virtual kubelets.
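A rough sketch of the toleration a Revision's pods would need, assuming the usual virtual-kubelet.io/provider taint on virtual-kubelet nodes:

tolerations:
  - key: virtual-kubelet.io/provider
    operator: Exists
    effect: NoSchedule # allow scheduling onto tainted virtual-kubelet nodes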

Ben


 

Denis Loginov

unread,
Apr 1, 2020, 4:53:03 PM4/1/20
to Knative Users
I'd also add a use case: we use the gVisor runtime for GKE Sandbox, and that also requires node selectors/tolerations in some scenarios (particularly to allow some workloads to run on sandboxed nodes).
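For GKE Sandbox specifically, a sketch of what the pods would need; assuming GKE's sandbox.gke.io/runtime label and taint on Sandbox node pools:

nodeSelector:
  sandbox.gke.io/runtime: gvisor # label on GKE Sandbox node pools
tolerations:
  - key: sandbox.gke.io/runtime
    operator: Equal
    value: gvisor
    effect: NoSchedule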


Antoine Masselot

unread,
Apr 2, 2020, 10:18:07 AM4/2/20
to Knative Users
Hello Everyone :)

At Scaleway (whose serverless product is based on Knative), we have a similar use case:
- using gVisor with containerd ONLY for the node pool running users' functions and containers.

We needed a way to schedule our users' containers ONLY on these nodes, although Knative Service manifests do not support it right now.

There is a solution native to Kubernetes: use the PodNodeSelector admission controller. Here is a link to the official documentation: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#podnodeselector
To use this feature:
- Either configure the "PodNodeSelector" admission controller plugin on your kube-apiservers if you run Kubernetes on-premises,
- or enable the "PodNodeSelector" admission controller plugin on your platform, if your provider allows you to do so (I believe GKE has options; we are personally using Scaleway's Kapsule managed Kubernetes offer, which provides both docker/containerd runtimes with gVisor, and admission controller plugin configuration via the API for clusters and node pools).
- On your Kubernetes namespaces, set the annotation scheduler.alpha.kubernetes.io/node-selector: <label-key>=<label-value> (see the sketch below).
- This will automatically set the nodeSelector on the pods running in that namespace.
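A minimal sketch of such a namespace; the pool=gvisor label here is hypothetical, use whatever label your sandboxed node pool actually carries:

apiVersion: v1
kind: Namespace
metadata:
  name: functions
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: pool=gvisor # hypothetical node label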

I believe this solution could be embedded in Knative Serving's official documentation.

Denis Loginov

unread,
Apr 2, 2020, 11:41:02 AM4/2/20
to Knative Users
Hi Antoine,

This is amazing if it works, and is almost exactly our use case. However, it seems that the Knative Revision spec currently doesn't allow runtimeClassName [1] either, which would be necessary to make Knative Service pods run under gVisor. Could you clarify whether you found a workaround for that as well?

For this and other reasons (such as the inability to specify tolerations), we currently have to resort to using plain k8s Services/Deployments and set up HPA on our own (obviously, this prevents autoscaling based on concurrency and scale-to-zero, but we can live with that).
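For illustration, a rough sketch of that fallback; the gvisor RuntimeClass, the taint, and the names/image are assumptions from our setup:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sandboxed-worker # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sandboxed-worker
  template:
    metadata:
      labels:
        app: sandboxed-worker
    spec:
      runtimeClassName: gvisor # not currently settable via a Knative Revision
      tolerations:
        - key: sandbox.gke.io/runtime # assumed taint on the sandboxed nodes
          operator: Equal
          value: gvisor
          effect: NoSchedule
      containers:
        - name: worker
          image: example.com/worker:latest # placeholder image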

Thanks

[1] https://github.com/knative/serving/issues/5306

Antoine Masselot

unread,
Apr 2, 2020, 11:45:48 AM4/2/20
to Knative Users
Hello again Denis :)

We are currently running Knative services with gVisor on our platform thanks to the pod annotation io.kubernetes.cri.untrusted-workload: "true", which tells containerd to run the workload with the untrusted (gVisor) runtime.

Here is a sample Knative service:

apiVersion: serving.knative.dev/v1alpha1 # Current version of Knative
kind: Service
metadata:
  name: helloworldtagada-go # The name of the app
  namespace: functions # The namespace the app will use
spec:
  runLatest:
    configuration:
      revisionTemplate:
        spec:
          container:
            #image: gcr.io/knative-samples/helloworld-go # The URL to the image of the app
            image: rg.fr-par.scw.cloud/demotest/hellworld:latest # The URL to the image of the app
            env:
              - name: TARGET # The environment variable printed out by the sample app
                value: "Go Sample v1"
        metadata:
          annotations:
            io.kubernetes.cri.untrusted-workload: "true"
            autoscaling.knative.dev/minScale: "1"
            autoscaling.knative.dev/maxScale: "20"


And here is the result of the dmesg command inside the pod:

kubectl exec -n functions helloworldtagada-go-746xf-deployment-679676799f-kq897 dmesg
Defaulting container name to user-container.
Use 'kubectl describe pod/helloworldtagada-go-746xf-deployment-679676799f-kq897 -n functions' to see all of the containers in this pod.
[   0.000000] Starting gVisor...
[   0.491329] Forking spaghetti code...
[   0.927906] Creating cloned children...
[   1.330012] Waiting for children...
[   1.822066] Checking naughty and nice process list...
[   2.055907] Constructing home...
[   2.418982] Preparing for the zombie uprising...
[   2.718900] Searching for socket adapter...
[   3.161325] Gathering forks...
[   3.220959] Mounting deweydecimalfs...
[   3.514665] Searching for needles in stacks...
[   3.605276] Ready!

Denis Loginov

unread,
Apr 6, 2020, 6:57:12 PM4/6/20
to Antoine Masselot, Knative Users
Hi Antoine,

Thanks for the workarounds. Unfortunately, it doesn't look like GKE supports the PodNodeSelector admission controller (or at least I wasn't able to find out how to enable it).

So we'll have to use regular k8s Jobs with Pub/Sub instead of Knative Eventing, at least for now.
Best
