Hi Kubeflow Community,
We are excited to share a new in-memory caching solution we've been developing to optimize data loading for distributed AI workloads, especially those involving tabular data.
Built on Apache Arrow and DataFusion, this solution enables:
✅ In-memory storage of Apache Iceberg tables.
✅ Efficient sharding across distributed nodes.
✅ High-throughput streaming to GPU-based AI workloads.
We've prepared a KEP and would love your feedback: https://github.com/kubeflow/community/pull/864
Our team also presented this solution at the recent KubeCon + CloudNativeCon Europe in London: https://youtu.be/s4KAe7AtN7s
Regards,
Andrey
Hi Kubeflow Community,
We will be presenting the final overview of the Arrow Data Cache KEP during the upcoming Kubeflow Community Call on July 8th at 8:00 AM PDT.
Join us to:
✅ Learn about the latest updates on the Arrow Cache implementation.
✅ Discuss remaining open questions and gather community feedback.
Don’t miss this chance to engage with the contributors and explore this powerful new feature!
Kubeflow Community Call: http://bit.ly/kf-meeting-notes
KEP link: https://github.com/kubeflow/community/pull/864