Document shared with you: "[PUBLIC] Kubernetes LLM Inference Autoscaling Examples"
Clayton Coleman (via Google Docs)
Jun 5, 2024, 10:41:52 AM
to wg-se...@kubernetes.io
Clayton Coleman shared a document
Clayton Coleman (clayton...@google.com) has invited you to edit the following document:
As discussed a few weeks ago, sharing my autoscaling LLM workloads exploration with wg-serving. This covers most of the use cases I'm aware of where we would want Kubernetes to improve (in combination with observability / custom autoscalers / model servers).