Re: How can Knative Serving help reduce the costs on AWS

Roland Huß

Feb 10, 2023, 6:54:35 AM
to Raman Dhingra, Knative Users
Hi Raman,

Sorry for the late reply, but maybe those answers might still be helpful for you:

* Knative's KPA scales on concurrent HTTP requests by default, which directly reflects your application's usage. The HPA, by default, scales on memory and CPU, which are indirect metrics: they correlate with traffic, but not necessarily 1:1, and the relationship varies over time. Also, the KPA can scale from zero without losing requests. The HPA can't do that.

* You are right, Knative itself can't scale your cluster, but it helps you increase application density if your applications' traffic shape is very non-uniform and you can scale applications down to zero. I.e., you can deploy more applications on the same cluster if you allow them to scale down to zero. An application that does not serve any requests should not consume any resources. You can combine Knative with cluster-autoscaling [1] to optimize your operational costs directly.
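To make the first point concrete, here is a minimal sketch of concurrency-based scaling. It is not Knative's actual autoscaler code (the real KPA also averages metrics over a time window and has a separate panic mode); the function name, the per-pod target of 10, and the sample loads are assumptions chosen for illustration.

```python
import math


def desired_replicas(in_flight_requests: float,
                     target_per_pod: float,
                     allow_zero: bool = True) -> int:
    """Simplified concurrency-based scaling rule (illustrative only).

    in_flight_requests: concurrent requests observed across the service.
    target_per_pod:     concurrency each pod is meant to handle (assumed value).
    allow_zero:         whether scale-to-zero is enabled.
    """
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    if in_flight_requests <= 0:
        # No traffic: drop to zero pods if allowed, otherwise keep one warm.
        return 0 if allow_zero else 1
    # Enough pods so that no pod exceeds its concurrency target.
    return math.ceil(in_flight_requests / target_per_pod)


if __name__ == "__main__":
    for load in (0, 5, 25, 120):
        print(load, "concurrent requests ->",
              desired_replicas(load, target_per_pod=10), "pods")
```

The replica count follows the request load itself, which is why it tracks usage more directly than a CPU or memory threshold would.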

I don't know if you can scale a cluster down to zero with the cluster-autoscaler, but I doubt it. An alternative for you would be to use one of the hyperscaler offerings for Knative, notably IBM Code Engine or Google Cloud Run. Those managed services offer full "pay-as-you-go" pricing with the simplified Knative application model, without you having to worry about clusters.
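The density argument can be sketched as back-of-the-envelope arithmetic. All numbers below (40 applications, at most 8 active at once, 0.5 CPU per application, 4 CPUs per node, a floor of one node) are made-up assumptions, and the model ignores memory, system pods, and the Knative components themselves.

```python
import math


def nodes_needed(cpu_per_app: float,
                 running_apps: int,
                 cpu_per_node: float,
                 min_nodes: int = 1) -> int:
    """Rough node count for a given number of simultaneously running apps.

    min_nodes models the assumption that the cluster itself cannot shrink
    to zero nodes, so some cost remains even with no traffic at all.
    """
    required = math.ceil(cpu_per_app * running_apps / cpu_per_node)
    return max(min_nodes, required)


total_apps = 40     # hypothetical number of deployed applications
peak_active = 8     # hypothetical number receiving traffic at the same time
cpu_per_app = 0.5   # hypothetical CPU request per application pod
cpu_per_node = 4.0  # hypothetical CPU capacity per EC2 node

# Every application keeps at least one pod running all the time.
always_on = nodes_needed(cpu_per_app, total_apps, cpu_per_node)
# Idle applications are scaled to zero; only active ones occupy CPU.
with_scale_to_zero = nodes_needed(cpu_per_app, peak_active, cpu_per_node)
# Nothing is running at all, yet the node floor is still billed.
fully_idle = nodes_needed(cpu_per_app, 0, cpu_per_node)

print(always_on, with_scale_to_zero, fully_idle)
```

Under these assumptions, the bill follows the number of nodes, not the number of pods: scale-to-zero only saves money when a cluster autoscaler (or a smaller fixed cluster) turns the freed capacity into fewer nodes.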

regards ...
... roland

On Thu, Jan 19, 2023 at 2:52 AM Raman Dhingra <> wrote:
This is probably not the right place to ask this question, but I could not find a better one, so I am writing here.

What is the main benefit of using Knative Serving, how does it really differ from plain HPA? If we use Knative Serving, do we bypass the HPA?

We run a K8s application on top of EC2 instances in production. Does the EC2 auto scaling feature have anything to do with the K8s HPA (and/or Knative Serving)?

When we configure an EC2 instance, we buy a fixed amount of AWS resources (CPU, memory, etc.) for which we have to pay no matter how often we use them.

Taken from Knative Serving documentation: It provides automatic scaling for applications to match incoming demand... This is provided by default, by using the Knative Pod Autoscaler (KPA). For example, if an application is receiving no traffic and scale to zero is enabled, Knative Serving scales the application down to zero replicas.

Now suppose our K8s application (with Knative Serving enabled) uses EC2 instances behind the scenes, and we access the application not even once a month. Will we be charged nothing, or do we need to pay a minimum fee that corresponds to the amount of CPU/memory configured for the EC2 instances with autoscaling enabled, or some other amount?

Thank you

Evan Anderson

Feb 10, 2023, 1:37:42 PM
to Roland Huß, Raman Dhingra, Knative Users
Chiming in with another AWS option -- I believe Knative works with EKS clusters running on Fargate, so that may help you autoscale cluster capacity. AWS's Karpenter cluster autoscaler might also help with matching cluster capacity to instances, though I suspect scaling EC2 kubelet instances will be slower than adding or removing Fargate containers.

Raman Dhingra

Mar 15, 2023, 6:32:36 AM
to Knative Users
Hi Roland, thank you for the detailed response. Things are much clearer than before.

Hi Evan, Thank you!

I am sorry for replying late.