Hi Jeremy,
I'm the author of BentoML, happy to clarify those questions from my point of view.
BentoML focuses on turning trained ML models into API servers that are easy to deploy and perform well. KFServing, I think, focuses more on the downstream deployment side, with features such as auto-scaling, A/B testing, multi-armed bandits (MAB), monitoring, and scale-to-zero.
BentoML overlaps with the KFServing Model Server component. The main differences: BentoML has micro-batching, offline batch serving, and model management support, while the KFServing Model Server supports the TensorFlow V1 HTTP API format and has tighter integration with other KFServing components, such as the explainer and canary rollouts.
As far as I know, KFServing itself does not provide micro-batching. The tf-serving project does micro-batching, and I believe it is used in KFServing only when deploying TensorFlow models. It would be interesting to see a deeper integration between BentoML and the KFServing Model Server if the KFServing team is open to that. I'd love to chat more.
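For anyone unfamiliar with the idea, here is a toy sketch of what server-side micro-batching means: individual prediction requests are buffered and flushed to the model as one batch, either when the batch fills up or when a latency window expires. This is a hypothetical illustration of the general technique, not BentoML's or tf-serving's actual implementation (the class and parameter names are made up).

```python
import threading
import queue
import time

class MicroBatcher:
    """Toy micro-batching layer (illustrative sketch, not a real
    BentoML/tf-serving API): buffers per-request inputs and runs
    the model once per batch."""

    def __init__(self, predict_batch, max_batch_size=8, max_latency_s=0.01):
        # predict_batch: function mapping a list of inputs to a list of outputs
        self.predict_batch = predict_batch
        self.max_batch_size = max_batch_size
        self.max_latency_s = max_latency_s
        self.q = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def predict(self, x):
        """Called once per incoming request; blocks until the
        batched result for this request is ready."""
        slot = {"input": x, "done": threading.Event()}
        self.q.put(slot)
        slot["done"].wait()
        return slot["output"]

    def _worker(self):
        while True:
            # Block for the first request, then collect more until the
            # batch is full or the latency window expires.
            batch = [self.q.get()]
            deadline = time.monotonic() + self.max_latency_s
            while len(batch) < self.max_batch_size:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
                try:
                    batch.append(self.q.get(timeout=timeout))
                except queue.Empty:
                    break
            # One model call serves the whole batch.
            outputs = self.predict_batch([s["input"] for s in batch])
            for slot, out in zip(batch, outputs):
                slot["output"] = out
                slot["done"].set()
```

Under concurrent load this turns N single-item model calls into roughly N / max_batch_size batched calls, which is where the throughput win comes from on GPU-backed models.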
Best,
Chaoyu