Meeting Purpose
Sync on recent progress, upcoming releases, and key technical topics.
Key Takeaways
- v5.0 RC1 is out. The v4.0 release is stable, so we are skipping a patch and moving directly to the v5.0 release candidate.
- New issue labels will improve triage. Three labels (needs info, ready to implement, and ready for agent) will clarify issue status and work assignment.
- StableRowID is experimental. To manage community expectations, we will formally label the feature as experimental and create a roadmap for stabilization, starting with performance benchmarks.
- Iceberg integration is being explored. We are brainstorming how to expose Lance's fragment-based architecture to Iceberg, with a key goal of enabling storage-partitioned joins in Spark.
Topics
Release & Community Updates
- Recent Blog Posts:
  - Lance Blob v2 Deep Dive: Covers the new Blob v2 format.
  - Lance v2.2 File Format Benchmark: Shows significant performance and compression gains over v2.0 and Parquet.
  - Lance Variants vs. JSONB Benchmark: Benchmarks Lance's variant type against Parquet's.
  - Hugging Face Hub Guide: Instructions for uploading Lance datasets.
- Community Contributions: Numerous bug fixes and IVF indexing improvements have been merged from community members.
Issue Triage & Workflow
- Problem: The current issue triage process is inefficient, making it hard to find ready-to-work-on tasks.
- Solution: Introduce three new labels to clarify issue status:
  - needs info: Blocks work until more details are provided.
  - ready to implement: Triaged and approved for human development.
  - ready for agent: A self-contained task suitable for automated agents.
Transaction Log Refactor
- Goal: Refactor the transaction log from operation-specific entries to a generic actions manifest.
- Rationale: This change aims to simplify the conflict resolution matrix and make conflict behavior more explicit.
- Next Step: Will Jones will begin work in a few weeks, then invite others to contribute.
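To illustrate why a generic actions manifest could simplify conflict resolution, here is a minimal sketch. The action kinds, the Transaction shape, and the conflict rule are all assumptions made for illustration; the meeting did not specify the manifest schema. The idea is that conflict checks compare sets of actions rather than consulting a per-operation matrix.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Action:
    # Hypothetical action kinds: "add_fragment", "remove_fragment", "update_fragment".
    kind: str
    fragment_id: int


@dataclass(frozen=True)
class Transaction:
    read_version: int
    actions: Tuple[Action, ...]


# Assumed rule: only actions that change an existing fragment can conflict.
MUTATING: FrozenSet[str] = frozenset({"remove_fragment", "update_fragment"})


def touched(txn: Transaction, kinds: FrozenSet[str]) -> Set[int]:
    """Return the fragment IDs this transaction touches with the given action kinds."""
    return {a.fragment_id for a in txn.actions if a.kind in kinds}


def conflicts(a: Transaction, b: Transaction) -> bool:
    """Two transactions conflict when both mutate the same existing fragment."""
    return bool(touched(a, MUTATING) & touched(b, MUTATING))


# Example: an append never conflicts; a delete and a compaction on fragment 3 do.
append = Transaction(1, (Action("add_fragment", 7),))
delete = Transaction(1, (Action("update_fragment", 3),))
compaction = Transaction(1, (Action("remove_fragment", 3), Action("add_fragment", 8)))
```

Under this sketch, a new operation type needs no new row in a conflict matrix; it only has to describe itself in terms of the existing action kinds.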
StableRowID Status
- Problem: The feature is generating many bug reports (correctness and performance) but is not a core team priority.
- Context: The feature is used in production (e.g., for UI editing and CDC), so simply deprecating it is not an option.
- Decision:
  - Formally label StableRowID as experimental to manage community expectations.
  - Create a public roadmap for stabilization.
  - First Step: Benchmark the performance impact of the StableRowID → physical address translation layer.
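As a starting point for the benchmarking step, the following is a toy micro-benchmark of a stable row ID to physical address lookup. It does not use Lance itself: the dictionary-based translation table, the function names, and the address packing (fragment ID in the high 32 bits, row offset in the low 32 bits) are assumptions for illustration only. A real benchmark would measure the actual translation layer on real datasets.

```python
import random
import timeit
from typing import Dict, List


def make_address(fragment_id: int, offset: int) -> int:
    # Assumed packing: fragment ID in the high 32 bits, row offset in the low 32 bits.
    return (fragment_id << 32) | offset


def build_translation(num_rows: int, rows_per_fragment: int) -> Dict[int, int]:
    # Toy translation layer: stable row ID -> physical address.
    return {
        row_id: make_address(row_id // rows_per_fragment, row_id % rows_per_fragment)
        for row_id in range(num_rows)
    }


def translate(table: Dict[int, int], row_ids: List[int]) -> List[int]:
    # The extra hop that direct physical addressing would not need.
    return [table[r] for r in row_ids]


def benchmark(num_rows: int = 100_000, lookups: int = 10_000, repeats: int = 5) -> float:
    """Return the best observed time per lookup, in seconds."""
    table = build_translation(num_rows, rows_per_fragment=1024)
    rng = random.Random(0)  # fixed seed so runs are comparable
    ids = [rng.randrange(num_rows) for _ in range(lookups)]
    best = min(timeit.repeat(lambda: translate(table, ids), number=1, repeat=repeats))
    return best / lookups


if __name__ == "__main__":
    print(f"{benchmark() * 1e9:.1f} ns per lookup")
```

Taking the minimum over several repeats reduces noise from other processes; comparing this figure against a run that skips translation would isolate the overhead of the indirection.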
Index Segment Pruning
- Goal: Enable pruning of index segments based on pre-filters (e.g., tenant_id='A') to reduce search computation.
- Current State: The new index segment architecture allows parallel search but requires scanning all segments.
- Proposed Solutions:
  1. Clustering: Align index segments with physical data layout (e.g., Z-cubes). A pre-filter could then quickly identify relevant fragments and, by extension, the segments to search.
  2. Partition Key in Index: Bake the partition key directly into the vector index for more granular pruning.
- Next Step: Jack Ye will create a GitHub Discussion to explore concrete use cases (e.g., multi-tenancy) and evaluate these solutions.
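The clustering option can be sketched as a two-step lookup: the pre-filter selects fragments, and the fragments select index segments. The IndexSegment type and the per-fragment tenant metadata below are hypothetical; they stand in for whatever clustering metadata the eventual design exposes.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class IndexSegment:
    name: str
    fragment_ids: Set[int]  # fragments whose rows this segment indexes (assumed metadata)


def fragments_matching(fragment_tenants: Dict[int, Set[str]], tenant_id: str) -> Set[int]:
    """Step 1: use per-fragment metadata to find fragments that may hold the tenant."""
    return {fid for fid, tenants in fragment_tenants.items() if tenant_id in tenants}


def prune_segments(
    segments: List[IndexSegment],
    fragment_tenants: Dict[int, Set[str]],
    tenant_id: str,
) -> List[IndexSegment]:
    """Step 2: keep only segments that cover at least one relevant fragment."""
    relevant = fragments_matching(fragment_tenants, tenant_id)
    return [s for s in segments if s.fragment_ids & relevant]


# Example layout: data clustered so that tenant A lives only in fragments 0 and 1.
segments = [IndexSegment("seg-0", {0, 1}), IndexSegment("seg-1", {2, 3})]
fragment_tenants = {0: {"A"}, 1: {"A", "B"}, 2: {"B"}, 3: {"C"}}
to_search = prune_segments(segments, fragment_tenants, "A")
```

In this layout a search filtered on tenant_id='A' touches only seg-0. The benefit depends on how well segments align with the clustering; if every segment covers every tenant, nothing is pruned.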
Iceberg Integration
- Problem: Iceberg's file-based model conflicts with Lance's fragment-based architecture, making direct integration difficult.
- Proposed Solutions:
  1. Fragment-Aware File Reader: Iceberg sees each Lance fragment as a single "file" (e.g., dataset_uri#fragment_id). A custom Iceberg file reader then reads the specific fragment from the Lance dataset.
  2. Lance Table as Iceberg File: Treat an entire Lance dataset (a directory) as a single "file" within Iceberg. This simplifies the model but requires careful handling of Iceberg's orphan file cleanup.
- Goal: Enable storage-partitioned joins in Spark by exposing Lance's physical data layout (via clustering metadata) to the execution engine.
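For the fragment-aware reader option, the first thing a custom reader would need is to split a path of the form dataset_uri#fragment_id back into its parts. The sketch below shows only that parsing step; the function name and the example URI are hypothetical, and the real reader would be implemented against Iceberg's reader interfaces rather than in Python.

```python
from typing import Tuple


def parse_fragment_path(path: str) -> Tuple[str, int]:
    """Split 'dataset_uri#fragment_id' into (dataset_uri, fragment_id).

    Splits on the last '#' so a dataset URI that itself contains '#' still parses.
    """
    dataset_uri, sep, fragment = path.rpartition("#")
    if not sep or not dataset_uri:
        raise ValueError(f"expected 'dataset_uri#fragment_id', got {path!r}")
    if not (fragment.isascii() and fragment.isdigit()):
        raise ValueError(f"fragment ID must be a non-negative integer, got {fragment!r}")
    return dataset_uri, int(fragment)


# Example with a made-up dataset location.
uri, fragment_id = parse_fragment_path("s3://bucket/table.lance#12")
```

One design question this encoding raises is that the "file" Iceberg tracks does not exist as an object in storage, so any Iceberg feature that lists or deletes files by path would need to understand the convention.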
Next Steps
- Will Jones:
  - Create new issue labels (needs info, ready to implement, ready for agent).
  - Create a public roadmap and milestone for stabilizing StableRowID.
- Jack Ye:
  - Create a GitHub Discussion on index segment pruning, exploring use cases and solutions.
- Manoj Babu:
  - Continue experimentation with Iceberg integration and provide updates.
- All:
  - Begin benchmarking the performance impact of StableRowID.