Document shared with you: "[PUBLIC] Kubernetes LLM Inference Autoscaling Examples"

Clayton Coleman (via Google Docs)

Jun 5, 2024, 10:41:52 AM
to wg-se...@kubernetes.io
Clayton Coleman shared a document
Clayton Coleman (clayton...@google.com) has invited you to edit the following document:
As discussed a few weeks ago, I'm sharing my exploration of autoscaling LLM workloads with wg-serving. It covers most of the use cases I'm aware of where we would want Kubernetes to improve (in combination with observability, custom autoscalers, and model servers).
[PUBLIC] Kubernetes LLM Inference Autoscaling Examples