Proposal for dense LLM serving on Kubernetes via gateway


Clayton Coleman

Aug 7, 2024, 11:01:53 AM
to wg-se...@kubernetes.io
At today's wg-serving meeting, Jiaxin and I want to present a proposal for a project that allows multiple LLM use cases (prompts, LoRA adapters) to safely share base models and model servers for higher density and better operational control. We think an LLM gateway-centric API with an Envoy OSS implementation is the best way to achieve those benefits, and that the approach is complementary both to the higher-level LLM gateway use cases Dan presented several weeks ago and to many existing LLM serving ecosystem projects.
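To make the gateway-centric idea concrete, a Kubernetes-style custom resource for this kind of sharing might look roughly like the sketch below. Everything here is hypothetical: the `LLMRoute` kind, the API group, and every field name are invented for illustration and are not part of the proposal itself.

```yaml
# Hypothetical sketch only: the resource kind and all field names are
# invented for illustration, not taken from the actual proposal.
apiVersion: serving.example.io/v1alpha1
kind: LLMRoute
metadata:
  name: support-summarizer
spec:
  # Several use cases share one pool of model servers running the base model.
  poolRef:
    name: llama-base-pool
  # The gateway routes this use case to servers with its LoRA adapter loaded,
  # so adapters multiplex onto shared base-model servers for higher density.
  loraAdapter: support-summarizer-v2
  # Per-use-case operational controls enforced at the gateway.
  priority: standard
```

The intent of a sketch like this is that the gateway, rather than each model server, becomes the place where per-use-case routing and operational policy are expressed.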


We also plan to do a demo of the PoC showing some of the most immediate benefits.

See you at wg-serving!