Meeting Purpose
Sync on recent progress, the v2.0 release, and upcoming features.
Key Takeaways
- Lance v2.0 Release Imminent: The release is blocked only by a final fix to the index name PR. Once that fix is merged, the release will proceed, starting a 5-day community vote.
- New Features Proposed: Several major features were proposed, including a tokenizer plugin architecture to reduce binary size, an index file list for faster cold reads, and a new sub-project for multi-agent memory.
- Voting Process Clarified: The community will vote on the v2.0 release first, followed by separate, parallel votes for new features. Adding new Arrow data types requires a vote to ensure client compatibility.
- Column Stats MVP Underway: An MVP for collecting column statistics is in active development. The design doc and PR will be shared with the community for review soon.
Topics
Recent Developments
- Lance DuckDB Extension: Enables DuckDB as a query engine for the Lance format. Docs are live on lance.org.
- Polaris Integration: Adds Lance namespace support for the Polaris lakehouse catalog. Docs and a blog post are available.
- HuggingFace Dataset Support: Lance will soon be a supported format for HuggingFace datasets, accessible via the hf:// specifier.
- Ongoing PRs:
  - SQL and geo-indexing updates, including an R3 index.
  - Continued optimization of the Blob V2 API.
  - Performance improvements for JSON, FTS, and HNSW.
Lance v2.0 Release
- Blocker: The index name PR requires a final fix (adding backticks for column names).
- Timeline: Once the blocker is merged, the release will proceed, starting a 5-day community vote.
- Non-Blocking: A storage options refactor PR is in progress but not required for v2.0.
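The notes do not say exactly which quoting rule the index name fix adopts. As a minimal sketch, assuming the common SQL convention of wrapping identifiers in backticks and doubling any embedded backtick, quoting keeps a column name that contains spaces or dots from being misread as a nested field path. `quote_identifier` and `quote_path` are hypothetical helpers, not Lance APIs.

```python
def quote_identifier(name: str) -> str:
    """Wrap one column name in backticks, escaping embedded backticks by doubling them.

    Assumption: the doubling convention; the actual PR may escape differently.
    """
    return "`" + name.replace("`", "``") + "`"


def quote_path(parts: list[str]) -> str:
    """Quote each segment of a (possibly nested) column path and join with dots."""
    return ".".join(quote_identifier(p) for p in parts)


if __name__ == "__main__":
    print(quote_identifier("user id"))   # `user id`
    print(quote_path(["meta", "a.b"]))   # `meta`.`a.b`
    print(quote_identifier("odd`name"))  # `odd``name`
```

Quoting every segment, rather than only names that look unusual, keeps the rule simple and makes the generated name unambiguous to a parser.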
Tokenizer Plugin Architecture
- Goal: Reduce binary size by making tokenizers (e.g., Lindera and Jieba for CJK text) optional, dynamically loaded plugins.
- Mechanism: A C-API allows separate installation of tokenizer packages (e.g., via Python wheels, Maven).
- Next Steps:
  - Finalize the design spec (plugin description, behavior when a plugin is missing, versioning).
  - Merge the API PR and test for performance regressions.
  - Remove the old conditional compilation flags.
- Optimization Tip: Kevin Liu noted that enabling size optimization when building the language bindings can reduce binary size by ~80%.
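The spec items above (plugin description, missing-plugin behavior, versioning) can be illustrated with a small registry. This is a minimal sketch under stated assumptions, not the proposed C-API: `PLUGIN_API_VERSION`, `TokenizerPlugin`, `register`, `get_tokenizer`, and the `lance_tokenizer_<name>` package naming are all hypothetical. A separately installed package is assumed to call `register` when imported; if no such package exists, lookup fails with an explicit error instead of silently falling back.

```python
import importlib
from dataclasses import dataclass
from typing import Callable

PLUGIN_API_VERSION = 1  # hypothetical version of the plugin interface


@dataclass(frozen=True)
class TokenizerPlugin:
    name: str
    api_version: int
    tokenize: Callable[[str], list[str]]


class TokenizerNotInstalled(LookupError):
    """Raised when a requested tokenizer plugin cannot be found."""


_REGISTRY: dict[str, TokenizerPlugin] = {}


def register(plugin: TokenizerPlugin) -> None:
    # Versioning: reject plugins built against an incompatible interface.
    if plugin.api_version != PLUGIN_API_VERSION:
        raise ValueError(
            f"tokenizer {plugin.name!r} targets API v{plugin.api_version}, "
            f"host supports v{PLUGIN_API_VERSION}"
        )
    _REGISTRY[plugin.name] = plugin


def get_tokenizer(name: str) -> TokenizerPlugin:
    if name not in _REGISTRY:
        # Dynamic loading: importing the (hypothetically named) package is
        # expected to call register() as a side effect.
        try:
            importlib.import_module(f"lance_tokenizer_{name}")
        except ImportError:
            pass
    if name not in _REGISTRY:
        # Missing-plugin behavior: fail loudly with an actionable message.
        raise TokenizerNotInstalled(
            f"tokenizer {name!r} is not installed; install its plugin package"
        )
    return _REGISTRY[name]


if __name__ == "__main__":
    register(TokenizerPlugin("whitespace", PLUGIN_API_VERSION, str.split))
    print(get_tokenizer("whitespace").tokenize("hello lance"))  # ['hello', 'lance']
```

Because only the registry ships in the core binary, the heavy dictionaries behind CJK tokenizers stay out of it unless a user installs the corresponding plugin.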
Index File List Metadata
- Proposal: Add a list of relative file paths to the index metadata proto message.
- Benefits:
  - Enables immediate calculation of index size.
  - Reduces cold read latency by skipping the initial HEAD request needed to locate the footer.
- Compatibility: This is a non-breaking change. Older clients will ignore the new field; new clients will handle its absence gracefully.
- Action: Will Jones will add details to the design doc and open a vote.
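A minimal sketch of how a reader might use the optional field, assuming each entry records the file's size in bytes alongside its relative path (the notes mention only paths, but the size and footer benefits appear to need sizes). `IndexFile`, `IndexMetadata`, `index_size`, `footer_range`, and `footer_len` are hypothetical names; the real field lives in a protobuf message, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexFile:
    path: str        # path relative to the index directory
    size_bytes: int  # assumption: size is stored alongside the path


@dataclass
class IndexMetadata:
    name: str
    # Optional new field; None when the metadata was written by an older client.
    files: Optional[list[IndexFile]] = None


def index_size(meta: IndexMetadata) -> Optional[int]:
    """Total index size from metadata alone, or None if the list is absent.

    None signals the caller to fall back to listing or probing storage.
    """
    if meta.files is None:
        return None
    return sum(f.size_bytes for f in meta.files)


def footer_range(file: IndexFile, footer_len: int) -> tuple[int, int]:
    """Byte range [start, end) of the trailing footer, computed without a HEAD request."""
    start = max(0, file.size_bytes - footer_len)
    return start, file.size_bytes


if __name__ == "__main__":
    new = IndexMetadata("idx", [IndexFile("part-0.lance", 4096), IndexFile("part-1.lance", 1024)])
    old = IndexMetadata("idx")
    print(index_size(new))                                    # 5120
    print(index_size(old))                                    # None
    print(footer_range(IndexFile("part-0.lance", 4096), 64))  # (4032, 4096)
```

Returning None rather than raising is what makes the change non-breaking on the read side: metadata without the list still works, just without the shortcut.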
Voting Process for Format Changes
- New Arrow Data Types: Require a community vote. Rationale: Ensures compatibility across all clients (Python, Java, Node, etc.).
- Arrow Extension Types: Do not require a vote if they resolve to already-supported core Arrow types.
- WAL Spec Updates: Require a vote. The current draft is marked "experimental" pending a formal vote.
Lance Context Sub-Project
- Proposal: A new sub-project to provide "gigantic memory" for multi-agent AI applications.
- Features:
  - High-performance storage for multimodal data (images, PDFs, dataframes).
  - Version control (forking, branching, merging) for debugging and verification.
- Rationale for Sub-Project: The project requires deep integration with Lance's storage layer, making the Lance community the ideal maintainers.
- Status: An implementation from Uber is being open-sourced. Creating the sub-project requires a community vote with at least three +1 votes.
Column Statistics MVP
- Goal: Collect column stats (min/max) during write operations.
- Mechanism:
  1. Stats are collected per fragment file and stored in its metadata.
  2. Compaction aggregates these stats into a separate, authoritative Lance file for the entire table.
- Status: An MVP PR is under review. The feature will be opt-in and will not affect the Lance file version.
- Action: Weston Pace will share the design doc and PR link with the community.
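The two-stage mechanism can be sketched as follows, assuming numeric columns held in memory and nulls excluded from min/max. `MinMax`, `collect_fragment_stats`, and `aggregate` are hypothetical names; the MVP's actual storage (fragment metadata plus a separate Lance file) is not modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class MinMax:
    min: Optional[float]  # None if the fragment had no non-null values
    max: Optional[float]


def collect_fragment_stats(values: Iterable[Optional[float]]) -> MinMax:
    """Stage 1: stats gathered while writing one fragment file; nulls are skipped."""
    lo: Optional[float] = None
    hi: Optional[float] = None
    for v in values:
        if v is None:
            continue
        if lo is None or v < lo:
            lo = v
        if hi is None or v > hi:
            hi = v
    return MinMax(lo, hi)


def aggregate(stats: Iterable[MinMax]) -> MinMax:
    """Stage 2: what compaction might do, merging per-fragment stats into one table-level entry."""
    stats = list(stats)  # materialize so the input can be scanned twice
    mins = [s.min for s in stats if s.min is not None]
    maxs = [s.max for s in stats if s.max is not None]
    return MinMax(min(mins) if mins else None, max(maxs) if maxs else None)


if __name__ == "__main__":
    frag_a = collect_fragment_stats([3.0, None, 7.5])
    frag_b = collect_fragment_stats([-1.0, 2.0])
    print(aggregate([frag_a, frag_b]))  # MinMax(min=-1.0, max=7.5)
```

Min and max merge associatively, so compaction can combine fragment-level entries without rereading column data.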
Next Steps
- Will Jones:
  - Fix the index name PR to unblock the v2.0 release.
  - Add details to the index file list design doc and open a community vote.
- Weston Pace:
  - Share the column stats design doc and PR link with the community.
- Community:
  - Review and vote on the Lance v2.0 release once the blocker is resolved.
  - Vote on the Lance Context sub-project proposal.
  - Review and vote on the index file list proposal.