Recap of your meeting with eto.ai


Fathom

Jan 15, 2026, 12:43:40 PM
to Lance Format Devlist
Meeting Purpose

Sync on recent progress, the v2.0 release, and upcoming features.

Key Takeaways

- Lance v2.0 Release Imminent: The release is blocked only by a final fix to the index name PR. Once that PR is merged, the release will proceed, starting a 5-day community vote.
- New Features Proposed: Several major features were proposed, including a tokenizer plugin architecture to reduce binary size, an index file list for faster cold reads, and a new sub-project for multi-agent memory.
- Voting Process Clarified: The community will vote on the v2.0 release first, followed by separate, parallel votes for new features. Adding new Arrow data types requires a vote to ensure client compatibility.
- Column Stats MVP Underway: An MVP for collecting column statistics is in active development. The design doc and PR will be shared with the community for review soon.

Topics

Recent Developments

- Lance DuckDB Extension: Enables DuckDB as a query engine for the Lance format. Docs are live on lance.org.
- Polaris Integration: Adds Lance namespace support for Polaris lakehouse. Docs and a blog post are available.
- HuggingFace Dataset Support: Lance will soon be a supported format for HuggingFace datasets, accessible via the hf:// specifier.
- Ongoing PRs:
  - SQL and geo-indexing updates, including an R-tree index.
  - Continued optimization of the Blob V2 API.
  - Performance improvements for JSON, FTS, and HNSW.

Lance v2.0 Release

- Blocker: The index name PR requires a final fix (adding backticks to escape column names).
- Timeline: Once the blocker is merged, the release will proceed, starting a 5-day community vote.
- Non-Blocking: A storage options refactor PR is in progress but not required for v2.0.

Tokenizer Plugin Architecture

- Goal: Reduce binary size by making tokenizers (e.g., Lindera, Jieba for CJK) optional, dynamically loaded plugins.
- Mechanism: A C-API allows tokenizer packages to be installed separately (e.g., via Python wheels, Maven).
- Next Steps:
  - Finalize the design spec (plugin description, missing-plugin behavior, versioning).
  - Merge the API PR and test for performance regressions.
  - Remove old conditional compilation flags.
- Optimization Tip: Kevin Liu noted that enabling size optimization during binding builds can reduce binary size by ~80%.

Index File List Metadata

- Proposal: Add a list of relative file paths to the index metadata proto message.
- Benefits:
  - Enables immediate calculation of index size.
  - Reduces cold read latency by skipping the initial HEAD request used to find the footer.
- Compatibility: This is a non-breaking change. Older clients will ignore the new field; new clients will handle its absence gracefully.
- Action: Will Jones will add details to the design doc and open a vote.

Voting Process for Format Changes

- New Arrow Data Types: Require a community vote. Rationale: ensures compatibility across all clients (Python, Java, Node, etc.).
- Arrow Extension Types: Do not require a vote if they resolve to already-supported core Arrow types.
- WAL Spec Updates: Require a vote. The current draft is marked "experimental" pending a formal vote.

Lance Context Sub-Project

- Proposal: A new sub-project to provide "gigantic memory" for multi-agent AI applications.
- Features:
  - High-performance storage for multimodal data (images, PDFs, dataframes).
  - Version control (forking, branching, merging) for debugging and verification.
- Rationale for Sub-Project: The project requires deep integration with Lance's storage layer, making the Lance community the ideal maintainers.
- Status: An implementation from Uber is being open-sourced. A 3+1 community vote is required to create the sub-project.

Column Statistics MVP

- Goal: Collect column stats (min/max) during write operations.
- Mechanism:
  1. Stats are collected per fragment file and stored in its metadata.
  2. Compaction aggregates these stats into a separate, authoritative Lance file for the entire table.
- Status: An MVP PR is under review. The feature will be opt-in and will not affect the Lance file version.
- Action: Weston Pace will share the design doc and PR link with the community.

Next Steps

- Will Jones:
  - Fix the index name PR to unblock the v2.0 release.
  - Add details to the index file list design doc and open a community vote.
- Weston Pace:
  - Share the column stats design doc and PR link with the community.
- Community:
  - Review and vote on the Lance v2.0 release once the blocker is resolved.
  - Vote on the Lance Context sub-project proposal.
  - Review and vote on the index file list proposal.
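
For readers who were not on the call, the two-step column statistics mechanism described in the summary (per-fragment min/max, then aggregation at compaction) could look roughly like the sketch below. This is only an illustration: `ColumnStats`, `merge_stats`, and `aggregate_table_stats` are hypothetical names, not part of the Lance API or the MVP PR, and the real design doc may store and merge stats differently.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ColumnStats:
    """Min/max for one column in one fragment (hypothetical structure)."""
    min_value: float
    max_value: float


def merge_stats(per_fragment: Iterable[ColumnStats]) -> Optional[ColumnStats]:
    """Combine per-fragment stats into a single table-level min/max."""
    stats = list(per_fragment)
    if not stats:
        return None  # no fragment recorded stats for this column
    return ColumnStats(
        min_value=min(s.min_value for s in stats),
        max_value=max(s.max_value for s in stats),
    )


def aggregate_table_stats(
    fragments: Dict[int, Dict[str, ColumnStats]]
) -> Dict[str, Optional[ColumnStats]]:
    """Mimic the compaction step: fragment id -> {column -> stats} in,
    one merged entry per column out."""
    by_column: Dict[str, List[ColumnStats]] = {}
    for fragment_stats in fragments.values():
        for column, stats in fragment_stats.items():
            by_column.setdefault(column, []).append(stats)
    return {column: merge_stats(stats) for column, stats in by_column.items()}
```

Min and max merge associatively, so in a scheme like this the table-level result would not depend on the order in which fragments are compacted.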
Meeting with eto.ai
Lance Community Sync
January 15, 2026, 41 mins
Action Items

- Update index-name PR with escape; merge; cut 2.0.0 release next week (Will Jones)
- Open follow-up PR to add conflict-prevention field to create_index (Will Jones)
- Open vote for tokenizer plugin C-API; merge after perf validation (Will Jones)
- Measure perf impact of index-files-in-metadata PR (Will Jones)
- Update index-files-in-metadata proposal with relative paths + empty=missing; open vote (Will Jones)
- Add schema-spec docs re: data type voting (Will Jones)
- Review column-stats design doc; share PR + doc links in discussion (Weston Pace)
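
As a rough illustration of the index-files-in-metadata proposal (relative paths, with an empty list treated as "missing"), here is a minimal sketch. The names `IndexFile`, `IndexMetadata`, `total_index_size`, and `resolve_paths` are hypothetical, and the `size_bytes` field is an assumption: the notes mention only relative paths, but some recorded size would be needed to compute index size without extra requests. The actual proto message may look quite different.

```python
import posixpath
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class IndexFile:
    relative_path: str  # path relative to the index directory
    size_bytes: int     # assumption: not stated in the meeting notes


@dataclass
class IndexMetadata:
    name: str
    # Hypothetical new field. An empty list means "not recorded",
    # e.g. metadata written by an older client.
    files: List[IndexFile] = field(default_factory=list)


def total_index_size(meta: IndexMetadata) -> Optional[int]:
    """Return the index size from metadata alone, or None if the file
    list is missing and the caller must fall back to listing storage."""
    if not meta.files:
        return None
    return sum(f.size_bytes for f in meta.files)


def resolve_paths(index_dir: str, meta: IndexMetadata) -> List[str]:
    """Turn the stored relative paths into full object-store keys."""
    return [posixpath.join(index_dir, f.relative_path) for f in meta.files]
```

Treating an empty list as missing is what would keep such a change non-breaking: a reader that sees no entries behaves exactly as it does today.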