At today's wg-serving meeting, Jiaxin and I will present a proposal for a project that lets multiple LLM use cases (prompts, LoRA adapters) safely share base models and model servers for higher density and better operational control. We believe an LLM gateway-centric API with an Envoy OSS implementation is the best way to achieve these benefits, and that the approach is complementary both to the higher-level LLM gateway use cases Dan presented a few weeks ago and to many existing projects in the LLM serving ecosystem.

We also plan to demo the PoC, showing some of the most immediate benefits.
See you at wg-serving!