Sorry for the late reply, but maybe these answers are still helpful for you:
* Knative's KPA scales on concurrent HTTP requests by default, which directly reflects your application's usage. The HPA, by default, scales on memory and CPU, which are indirect metrics: they correlate with traffic, but not necessarily 1:1, and the relationship varies over time. Also, the KPA can scale from zero without losing requests. The HPA can't do that.
* You are right, Knative itself can't scale your cluster, but it helps you increase application density when your applications' traffic is very uneven and they can be scaled down to zero. That is, you can deploy more applications on the same cluster if you allow them to scale down to zero. An application that does not serve any requests should not consume any resources. You can combine Knative with cluster autoscaling to optimize your operational costs directly.
I don't know if you can scale a cluster down to zero with the cluster-autoscaler, but I doubt it. An alternative for you would be to use one of the hyperscaler offerings for Knative, notably IBM Code Engine or Google Cloud Run. Those managed services give you full "pay-as-you-go" pricing with the simplified Knative application model, without having to worry about clusters.
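
To make the first point more concrete, here is a rough sketch of what "scaling on concurrent requests" boils down to. This is not Knative's actual code: the function name `desired_replicas` and its parameters are made up for illustration, and the real KPA also averages the metric over time windows and applies a target utilization. The core idea is just to divide the observed concurrency by the per-pod target, which naturally yields zero pods when there is no traffic.

```python
import math


def desired_replicas(concurrent_requests: float,
                     target_per_pod: float,
                     min_scale: int = 0,
                     max_scale: int = 0) -> int:
    """Simplified concurrency-based scaling rule (illustrative only).

    concurrent_requests: in-flight requests observed across all pods.
    target_per_pod:      in-flight requests one pod should handle.
    min_scale:           lower bound; 0 allows scale to zero.
    max_scale:           upper bound; 0 means "no upper bound" here.
    """
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")

    # Enough pods so that each one stays at or below its target.
    wanted = math.ceil(concurrent_requests / target_per_pod)

    # Clamp to the configured bounds.
    wanted = max(wanted, min_scale)
    if max_scale > 0:
        wanted = min(wanted, max_scale)
    return wanted


if __name__ == "__main__":
    print(desired_replicas(0, 10))    # no traffic: 0 pods
    print(desired_replicas(25, 10))   # 25 in-flight requests: 3 pods
    print(desired_replicas(250, 10, max_scale=5))  # capped at 5 pods
```

With a CPU-based rule, the zero-traffic case usually doesn't work like this, because a pod has to be running before there is any CPU usage to measure.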