knative cold start performance


Chen Lin

Apr 2, 2021, 4:22:23 PM
to Knative Users
Hey, Community,

I am not sure whether I am asking the right question in the right place (please correct me if not).

I am using Knative to serve my function on both GKE and OCP clusters, and I am trying to figure out the expected cold start time.
My observation is that the cold start time varies between 2 and 4 seconds. Is it possible to reduce it to under a second?

Here is one example from OCP Serverless: the request spent more than 3 seconds in the activator, but when I looked at the pod start time, the pod took only around 0.5 seconds to reach the ready state. Where was the rest of the time spent? (I am currently using Kourier, not Istio.)

[Attached images: 3.png, 2.png]
 
When I searched for related topics on Knative cold start performance, I found the following threads, but I still could not reach a conclusion. I would greatly appreciate it if someone could clarify.

  1. https://github.com/knative/serving/issues/2484 (Kubernetes adds 2s to cold-start time)
  2. https://github.com/knative/serving/projects/8#card-15028530 (Performance: Sub-Second Cold Start)
  3. https://github.com/knative/serving/issues/2498 (Prototype local scheduling of pods)
  4. https://github.com/knative/serving/issues/2497 (Remove k8s controlplane performance variance from our cold-start time)
  5. https://docs.google.com/document/d/1ErAFL7dpC0exdGuLrrrWeY2hxjTgmk2Io8-r0_gkAA8/edit#heading=h.h3waqqiykz6c (Local Scheduling)
Thanks and best regards
Chen

vaga...@gmail.com

Apr 2, 2021, 4:28:21 PM
to Knative Users
Which knative version?
0.21 should have some improvements in that space.

Chen Lin

Apr 2, 2021, 8:35:00 PM
to Knative Users
Thank you. I just checked: I am using 0.19 on OCP. Let me see whether I can install a newer version (since I am using the Red Hat Serverless operator, I am not sure I can).

Chen Lin

Apr 5, 2021, 9:23:30 PM
to Knative Users
Since I am using OCP 4.6, which doesn't offer Knative 0.21.0, I ran a test in a GKE cluster instead. I had been using 0.18.0, so I upgraded to 0.21.0 and compared the two, using openshift/hello-openshift as the test image.
Unfortunately, I didn't see a big difference between the two versions (the numbers are from Zipkin, for the activator-service span during a cold start).
[Attached image: 4.png]

Please let me know if there is anything else I can do to improve the performance.


Markus Thömmes

Apr 6, 2021, 2:31:43 AM
to Knative Users
Heya!

The numbers you shared are just about the ballpark we currently expect for cold-starts unfortunately. The only angle you can potentially optimize here is making sure your own application image is as small and starts up as quickly as possible. Our activator will probe and use it as quickly as it can (even if the K8s control-plane doesn't yet mark it as ready).

If you can't get it as quick as you need it, you might have to resort to setting a minScale of N or set a scale-down-delay that keeps pods around for longer before actually shutting them down.
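
As a rough sketch of those two options, both can be set as annotations on the revision template of a Knative Service. The service name `hello` and the values `"1"` and `"15m"` below are only illustrative assumptions, and the scale-down-delay annotation may not be available on older releases, so please check it against the version you are running:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello                      # hypothetical service name
spec:
  template:
    metadata:
      annotations:
        # Keep at least one pod running so requests never hit a cold start.
        autoscaling.knative.dev/minScale: "1"
        # Alternatively, wait this long after traffic drops before scaling down.
        # Illustrative value; may not be supported on older Knative versions.
        autoscaling.knative.dev/scale-down-delay: "15m"
    spec:
      containers:
        - image: openshift/hello-openshift   # test image mentioned earlier in the thread
```

The two annotations are independent, so you would normally pick one: minScale trades idle resource cost for no cold starts at all, while a scale-down delay only helps when requests arrive again within the delay window.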

The Autoscaling WG has made it their goal for 2021 to reduce the cold-start latency. Our angles of going at this include optimizing K8s itself and further optimizations of things on the Knative side. Unfortunately we're not there yet though.

Hope this helps!
Markus

Chen Lin

Apr 6, 2021, 2:57:32 PM
to Knative Users
Hi Markus,

Thank you for sharing this helpful information. I will follow your suggestions and keep watching the progress.

Best regards
Chen