Recap of your meeting with Eto Labs

Fathom

Apr 9, 2026, 1:02:58 PM
to Lance Format Devlist
Meeting Purpose

Sync on recent progress, upcoming releases, and key technical topics.

Key Takeaways

- v5.0 RC1 is out. The v4.0 release is stable, so we are skipping a patch release and moving directly to the v5.0 release candidate.
- New issue labels will improve triage. The labels "needs info", "ready to implement", and "ready for agent" will clarify issue status and work assignment.
- StableRowID is experimental. To manage community expectations, we will formally label the feature as experimental and create a roadmap for stabilization, starting with performance benchmarks.
- Iceberg integration is being explored. We are brainstorming how to expose Lance's fragment-based architecture to Iceberg, with a key goal of enabling storage-partitioned joins in Spark.

Topics

Release & Community Updates

- Recent Blog Posts:
  - Lance Blob v2 Deep Dive: Covers the new Blob v2 format.
  - Lance v2.2 File Format Benchmark: Shows significant performance and compression gains over v2.0 and Parquet.
  - Lance Variants vs. JSONB Benchmark: Benchmarks Lance's variant type against Parquet's.
  - Hugging Face Hub Guide: Instructions for uploading Lance datasets.
- Community Contributions: Numerous bug fixes and IVF indexing improvements from community members have been merged.

Issue Triage & Workflow

- Problem: The current issue triage process is inefficient, making it hard to find tasks that are ready to work on.
- Solution: Introduce three new labels to clarify issue status:
  - needs info: Blocks work until more details are provided.
  - ready to implement: Triaged and approved for human development.
  - ready for agent: A self-contained task suitable for automated agents.

Transaction Log Refactor

- Goal: Refactor the transaction log from operation-specific entries to a generic actions manifest.
- Rationale: This change aims to simplify the conflict resolution matrix and make conflict behavior more explicit.
- Next Step: Will Jones will begin work in a few weeks, then invite others to contribute.
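As a rough illustration of the transaction log refactor, the sketch below expresses each transaction as a list of generic actions. It derives conflicts from the fragments those actions rewrite, rather than from a table indexed by pairs of operation types. Everything here is hypothetical. `Action`, `Transaction`, `touched`, `conflicts`, the action kinds, and the rule that appends never conflict are illustrative assumptions, not Lance's actual manifest format or conflict rules.

```python
# Hypothetical sketch: a transaction is a list of generic actions, and
# conflicts are derived from the fragments those actions rewrite.
# None of these names come from Lance.
from dataclasses import dataclass
from typing import FrozenSet, List

# Assumed rule for this sketch: appending new fragments never conflicts,
# because it does not rewrite fragments another writer may be changing.
NON_CONFLICTING_KINDS = {"add_fragments"}


@dataclass(frozen=True)
class Action:
    kind: str                  # e.g. "add_fragments", "delete_rows", "replace_fragments"
    fragments: FrozenSet[int]  # fragment ids the action creates or rewrites


@dataclass
class Transaction:
    actions: List[Action]


def touched(txn: Transaction) -> FrozenSet[int]:
    """Fragment ids rewritten by the transaction's conflict-relevant actions."""
    ids = set()
    for action in txn.actions:
        if action.kind not in NON_CONFLICTING_KINDS:
            ids |= action.fragments
    return frozenset(ids)


def conflicts(ours: Transaction, concurrent: List[Transaction]) -> bool:
    """True if any transaction committed since our read version rewrote
    a fragment that we also rewrite."""
    mine = touched(ours)
    return any(not mine.isdisjoint(touched(other)) for other in concurrent)


append = Transaction([Action("add_fragments", frozenset({10}))])
delete = Transaction([Action("delete_rows", frozenset({3}))])
compact = Transaction([Action("replace_fragments", frozenset({3, 4}))])

print(conflicts(append, [delete]))   # False: appends rewrite nothing
print(conflicts(compact, [delete]))  # True: both rewrite fragment 3
```

With this shape, a new operation is a composition of existing actions. Its conflict behavior follows from those actions instead of requiring a new row and column in a conflict matrix.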
StableRowID Status

- Problem: The feature is generating many bug reports (both correctness and performance) but is not a core team priority.
- Context: The feature is used in production (e.g., for UI editing and CDC), so simply deprecating it is not an option.
- Decision:
  - Formally label StableRowID as experimental to manage community expectations.
  - Create a public roadmap for stabilization.
- First Step: Benchmark the performance impact of the StableRowID → physical address translation layer.

Index Segment Pruning

- Goal: Enable pruning of index segments based on pre-filters (e.g., tenant_id='A') to reduce search computation.
- Current State: The new index segment architecture allows segments to be searched in parallel, but every search must still scan all segments.
- Proposed Solutions:
  1. Clustering: Align index segments with the physical data layout (e.g., Z-cubes). A pre-filter could then quickly identify the relevant fragments and, by extension, the segments to search.
  2. Partition Key in Index: Bake the partition key directly into the vector index for more granular pruning.
- Next Step: Jack Ye will create a GitHub Discussion to explore concrete use cases (e.g., multi-tenancy) and evaluate these solutions.

Iceberg Integration

- Problem: Iceberg's file-based model conflicts with Lance's fragment-based architecture, making direct integration difficult.
- Proposed Solutions:
  1. Fragment-Aware File Reader: Iceberg sees each fragment as a single "file" (e.g., dataset_uri#fragment_id). A custom Iceberg file reader then reads that fragment from the Lance dataset.
  2. Lance Table as Iceberg File: Treat an entire Lance dataset (a directory) as a single "file" within Iceberg. This simplifies the model but requires careful handling of Iceberg's orphan file cleanup.
- Goal: Enable storage-partitioned joins in Spark by exposing Lance's physical data layout (via clustering metadata) to the execution engine.

Next Steps

- Will Jones:
  - Create new issue labels ("needs info", "ready to implement", "ready for agent").
  - Create a public roadmap and milestone for stabilizing StableRowID.
- Jack Ye:
  - Create a GitHub Discussion on index segment pruning, exploring use cases and solutions.
- Manoj Babu:
  - Continue experimentation with Iceberg integration and provide updates.
- All:
  - Begin benchmarking the performance impact of StableRowID.
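The team plans to benchmark the StableRowID → physical address translation layer first. That layer can be sketched as a lookup from stable ids to packed row addresses. This is a minimal sketch under stated assumptions. The 32/32-bit packing of fragment id and row offset, the in-memory dict, and the names `StableRowIdIndex`, `encode_address`, `decode_address`, and `time_lookups` are all hypothetical, and the real layer may be structured quite differently.

```python
# Hypothetical sketch of a StableRowID -> physical address translation layer.
# Assumption: a physical row address packs the fragment id into the upper
# 32 bits and the row offset within that fragment into the lower 32 bits.
# The class and function names are illustrative, not Lance APIs.
import time
from typing import Dict, List, Tuple

OFFSET_BITS = 32
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def encode_address(fragment_id: int, offset: int) -> int:
    return (fragment_id << OFFSET_BITS) | offset


def decode_address(address: int) -> Tuple[int, int]:
    return address >> OFFSET_BITS, address & OFFSET_MASK


class StableRowIdIndex:
    """Maps stable row ids, which never change, to current row addresses,
    which change whenever compaction rewrites rows into new fragments."""

    def __init__(self) -> None:
        self._address: Dict[int, int] = {}

    def assign(self, fragment_id: int, row_ids: List[int]) -> None:
        # Row `offset` of the fragment holds stable id row_ids[offset].
        for offset, row_id in enumerate(row_ids):
            self._address[row_id] = encode_address(fragment_id, offset)

    def lookup(self, row_id: int) -> Tuple[int, int]:
        return decode_address(self._address[row_id])


def time_lookups(index: StableRowIdIndex, row_ids: List[int]) -> float:
    """Seconds spent translating row_ids: the per-row overhead a benchmark
    of the translation layer would try to quantify."""
    start = time.perf_counter()
    for row_id in row_ids:
        index.lookup(row_id)
    return time.perf_counter() - start


index = StableRowIdIndex()
index.assign(0, [100, 101, 102])
index.assign(1, [103, 104])
print(index.lookup(104))  # (1, 1)

# Compaction merges both fragments into fragment 2: the stable ids are
# unchanged, only the addresses they translate to move.
index.assign(2, [100, 101, 102, 103, 104])
print(index.lookup(104))  # (2, 4)
```

The sketch shows where the layer's cost comes from. Every read by stable id pays an extra lookup, and every compaction has to rewrite the mapping for the rows it moves.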
Meeting with Eto Labs
Lance Community Sync
April 09, 2026    59 mins
Action Items
- Add GitHub labels: "needs info", "ready to implement" (Will Jones)
- Review the StableRowID spec PR and post a review (Weston Pace)
- Write the StableRowID stabilization roadmap; open issues and create a milestone (Will Jones)
- Open a GitHub Discussion on segment pruning with a concrete multi-tenant use case (Jack Ye)
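The clustering option from the Index Segment Pruning discussion can be sketched as a two-step filter. First, per-fragment statistics identify the fragments that might contain the filtered tenant. Then only the index segments covering those fragments are searched. The per-fragment min/max statistics, the segment-to-fragment mapping, and all names below (`FragmentStats`, `IndexSegment`, `fragments_matching`, `segments_to_search`) are assumptions for illustration, not Lance structures.

```python
# Hypothetical sketch: prune index segments using a tenant_id pre-filter.
# Assumes each fragment has min/max stats for tenant_id and each index
# segment records the fragments it covers. Not Lance structures.
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class FragmentStats:
    fragment_id: int
    min_tenant: str
    max_tenant: str


@dataclass(frozen=True)
class IndexSegment:
    segment_id: int
    fragment_ids: FrozenSet[int]


def fragments_matching(stats: List[FragmentStats], tenant: str) -> Set[int]:
    """Fragments whose [min, max] tenant range could contain `tenant`."""
    return {
        s.fragment_id for s in stats if s.min_tenant <= tenant <= s.max_tenant
    }


def segments_to_search(segments: List[IndexSegment], fragments: Set[int]) -> List[int]:
    """Only segments covering at least one candidate fragment need searching."""
    return [
        seg.segment_id
        for seg in segments
        if not seg.fragment_ids.isdisjoint(fragments)
    ]


# Data clustered by tenant: each fragment holds a narrow tenant range.
stats = [
    FragmentStats(0, "A", "A"),
    FragmentStats(1, "B", "C"),
    FragmentStats(2, "D", "F"),
]
segments = [
    IndexSegment(0, frozenset({0})),
    IndexSegment(1, frozenset({1, 2})),
]

candidates = fragments_matching(stats, "A")  # pre-filter: tenant_id = 'A'
print(segments_to_search(segments, candidates))  # [0]
```

Pruning is only as effective as the clustering. If rows for tenant 'A' were spread across every fragment, every fragment's range would match and no segment could be skipped. This is why the approach depends on aligning segments with the physical data layout.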