Recap of your meeting with Eto Labs


Fathom

Jan 29, 2026, 1:01:31 PM
to Lance Format Devlist
Meeting Purpose

Sync on recent progress and discuss key architectural proposals.

Key Takeaways

- Lance 2.0.0 RC4 is out for PMC vote. This is the final release candidate, containing a fix for a P3 issue.
- A new Hugging Face integration enables direct video previews. The lance-format organization on HF Hub now hosts datasets, with a new feature allowing users to preview video frames without a full download.
- An "Incubator" proposal will streamline new project creation. This will replace the slow, formal 3-day PMC vote with a faster process, accelerating community growth.
- Clustering is the priority for read optimization, not hard partitioning. The strategy is to use clustering with zone map indexes, which can achieve the same read performance as partitioning while offering more flexibility.

Topics

Lance 2.0.0 Release Candidate

- RC4 is out for PMC vote, containing a fix for a P3 issue.
- This is expected to be the final release candidate before the 2.0.0 stable release.
- Action: Tim Saucer will test RC4 with the rerun project.
- Suggestion: Add a downstream project checklist (e.g., Lance Spark, LanceDB) to future release processes for easier verification.

New Hugging Face Integration

- The lance-format organization on Hugging Face Hub now hosts Lance datasets.
- A key new feature: direct video previews on the HF Hub frontend.
  - Mechanism: The viewer scans the Lance format to retrieve representative frames from video blobs.
  - Benefit: Enables visual inspection of video data without a full download.
  - Enabler: The Lance format's ability to retrieve frames by offset ID.
- A blog post detailing the integration is planned for next week.

Community & Project Updates

- New Projects: LanceContext and LanceGraph are now official sub-projects under the main lance-format GitHub organization.
- LanceGraph Benchmark: Initial benchmarks show strong performance, with clear areas for optimization.
- Recent Merges:
  - RAID distributed vector index (ByteDance)
  - Spark vector search features
  - Distributed FTS index on Spark (Jesse, Netflix)
  - Mem table and WAL writer/reader updates

Logical Types & Schema

- Problem: The current schema definition is informal, leading to inconsistent handling of logical types.
- Proposed Solution: Formalize the schema spec by separating logical types (for compute) from physical types (for storage).
  - Storage Layer: Should be pure Arrow.
  - Engine Layer: Handles logical types (e.g., JSON, StringView) and their mapping to physical storage.
- Example: JSON Type
  - Current: Stored as a LargeBinary (JSONB encoded) with metadata.
  - Challenge: How to handle user requests for different representations (e.g., raw JSONB vs. decoded string) without inefficient conversions.
- Example: Decimal Type
  - Constraint: The rust_decimal crate supports precision up to 28, while Iceberg requires up to 38.
  - Resolution: Lance uses Arrow's decimal type, which supports precision up to 38, so this is not a blocker.

Governance: Incubator Proposal

- Problem: The 3-day PMC voting process for new projects is a bottleneck to community growth.
- Proposal: Introduce an "Incubator" stage for new projects.
  - Process: A PMC member can create a new repo, which starts in an incubating state.
  - Incubating Projects: Have lower requirements (e.g., maintainers can merge directly) to accelerate development and attract contributors.
  - Graduation: Requires a formal PMC vote.
  - Maintainer Privileges: Contributors retain commit access upon graduation.
- Project Scope Clarification:
  - lance-python-docs: Should not be a sub-project; it's a build-related separation, not a user-facing project.
  - lance-namespace: Should remain a sub-project as it has its own release packages.

Manifest Store Formalization

- Proposal: Formalize the "external manifest store" concept into a flexible API.
- Use Cases:
  - Vendor Support: Enable custom transaction layers for object storage lacking native atomicity.
  - Atomic Multi-Table Commits: Allow atomic updates across multiple tables (e.g., for partitioning) by committing to a manifest table.
  - Catalog Integration: Enable enterprise features like audit logs and downstream job triggers by routing commits through a catalog.

Partitioning & Liquid Clustering

- Status: No active development.
- Strategy: Prioritize liquid clustering over hard, static partitioning.
- Rationale: Clustering with zone map indexes provides equivalent read performance while offering more flexibility and avoiding the "wrong partition key" problem.
- Exception: Hard partitioning may still be needed for niche use cases like bulk data replacement.
- Engine Integration: The challenge is exposing clustering info to engines (e.g., Spark) that expect a static partition interface.
  - Solution: A Lance interface could translate clustering info into a format engines understand.

Metadata as Tables

- Suggestion: Expose Lance table metadata (fragments, indexes) as queryable dataframes, similar to Iceberg.
- Benefit: Enables DBAs and users to easily inspect table state, run analytics, and understand performance characteristics.
- Status: Internal tools already exist; a public API is feasible.
- Action: Kevin Liu will create a GitHub issue to track this.

Next Steps

- PMC Members: Vote on the Lance 2.0.0 RC4 release.
- Tim Saucer: Test Lance 2.0.0 RC4 with the rerun project.
- Will Jones: Refine the logical types proposal based on feedback, focusing on a clear spec for the storage/compute separation.
- Jack Ye: Finalize the "Incubator" proposal for a community vote.
- Kevin Liu: Create a GitHub issue to propose exposing Lance metadata as queryable tables.
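
Illustration (zone maps and clustering)

For readers less familiar with the rationale in "Partitioning & Liquid Clustering", here is a minimal sketch of how a zone map can prune reads. It is not Lance's implementation: the Zone class, the fixed zone size, and every function name are hypothetical, and the data is synthetic. The idea is that each zone records the min and max of a column, and a reader skips any zone whose range cannot overlap the filter. When rows are clustered (here, simply sorted) on the filter column, very few zones overlap a given range, which is how clustering can approach partition-style pruning without fixing a partition key up front.

```python
from dataclasses import dataclass
import random


@dataclass
class Zone:
    start: int  # index of the first row in the zone
    end: int    # index one past the last row in the zone
    lo: int     # minimum column value within the zone
    hi: int     # maximum column value within the zone


def build_zone_map(values, zone_size):
    """Record min/max per fixed-size run of rows (hypothetical layout)."""
    zones = []
    for start in range(0, len(values), zone_size):
        chunk = values[start:start + zone_size]
        zones.append(Zone(start, start + len(chunk), min(chunk), max(chunk)))
    return zones


def zones_to_scan(zones, lo, hi):
    """Keep only zones whose [lo, hi] range overlaps the filter range."""
    return [z for z in zones if z.hi >= lo and z.lo <= hi]


def count_matches(values, zones, lo, hi):
    """Count rows with lo <= value <= hi, reading only the surviving zones."""
    return sum(
        1
        for z in zones_to_scan(zones, lo, hi)
        for v in values[z.start:z.end]
        if lo <= v <= hi
    )


# Synthetic column: same values, stored unclustered vs. clustered (sorted).
rng = random.Random(0)
unclustered = [rng.randrange(1000) for _ in range(10_000)]
clustered = sorted(unclustered)

unclustered_zones = build_zone_map(unclustered, 100)
clustered_zones = build_zone_map(clustered, 100)

# Number of zones a reader must touch for the filter 500 <= value <= 509.
scanned_unclustered = len(zones_to_scan(unclustered_zones, 500, 509))
scanned_clustered = len(zones_to_scan(clustered_zones, 500, 509))
print(scanned_unclustered, scanned_clustered)
```

Both layouts return the same matching rows; the difference is only in how many zones have to be read. In the unclustered layout nearly every zone spans most of the value range, so little can be skipped, while the clustered layout touches only the few zones around the requested values.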
Meeting with Eto Labs
Lance Community Sync
January 29, 2026    59 mins
Action Items
- Prashanth Rao: Publish blog on Hugging Face integration + video preview
- Tim Saucer: Test Lance 2.0.0 RC4; vote on release
- Kevin Liu: Test Lance 2.0.0 RC4; vote on release
- Will Jones: Add downstream verification checklist to Lance 2.0.0 RC4 vote thread
- Will Jones: Update logical types spec; address Weston’s comments; then push string/decimal proposals
- Jack Ye: Remove Lance-Python-Docs from subproject list
- Prashanth Rao: Move manifest store formalization to GitHub Discussions
- Kevin Liu: Review partitioning/liquid clustering threads
- Kevin Liu: Open issue re: metadata tables/views (fragments, indexes, files)