Recap of your meeting with Eto Labs

Fathom

Feb 12, 2026, 12:51:52 PM
to Lance Format Devlist
Meeting Purpose

Sync on Lance community contributions, releases, and key technical discussions.

Key Takeaways

- New Release Process: Adopt a DataFusion-style model where minor releases (e.g., v2.x) are cut from a major release branch (e.g., branch-2.0). This enables faster, safer bug fixes and features for stable versions while main progresses toward the next major release.
- Manifest Scaling Strategy: Defer a spec change for manifest scaling. The immediate solution is to use a "composite table" pattern (a meta-table querying many small tables), which requires no format changes. A benchmark analysis will first define the problem's breaking point.
- Type System Consolidation: Consolidate Lance's type system by adopting Substrait's logical/physical model. This simplifies the user experience by abstracting away Arrow's concrete types (e.g., String vs. LargeString) and aligns with other major databases.
- Standardized Versioning: Create a formal proposal to standardize how users specify versions, tags, and branches. This is critical for consistent time travel across all integrations (e.g., Spark, DuckDB) and will prevent the fragmentation seen in other formats like Iceberg.

Topics

Release Cycle & Process

- Problem: The current release process is slow (2 weeks per release) and lacks a clear strategy for delivering urgent bug fixes to stable versions.
- Solution: Adopt a DataFusion-style release model.
  - When a major release is cut (e.g., v2.0), a dedicated branch is created (branch-2.0).
  - All subsequent minor releases for that version (e.g., v2.1, v2.2) are cut from this branch via cherry-picked PRs.
  - This allows main to progress toward the next major version (e.g., v3.0) without blocking stable-version maintenance.
- Status: The v2.0.1 RC is blocked by integration test failures in an internal environment, which are being resolved.

Dataset Column Statistics

- Status: The write-path MVP is complete and in PR review.
- Plan: Merge the PR, marking the feature as "experimental" to allow for future breaking changes to the manifest format.
- Read-Path Use Cases:
  - Query Engines: Provide statistics to Spark and Trino for query planning and predicate pushdown.
  - Scanner Optimization: Use statistics for filter simplification when no secondary index is available.

Manifest Size for Large Tables

- Problem: The current single-file manifest will become a bottleneck for tables with millions of fragments, slowing operations such as opening the table.
- Proposed Solution: Implement a two-level manifest structure, similar to Iceberg.
- Decision: Defer a spec change.
- Rationale: The problem is not yet well-defined. The immediate solution is a "composite table" pattern, which requires no format changes.
- Action: Create a benchmark analysis with a 1M-fragment dataset to identify performance bottlenecks and define the problem's scope.

Type System Consolidation

- Problem: Lance exposes Arrow's concrete types (e.g., String, LargeString), creating user confusion and requiring complex logic in integrations.
- Solution: Consolidate the type system by adopting Substrait's logical/physical model.
  - Logical Type: A single, high-level type (e.g., String).
  - Physical Type: The underlying Arrow concrete type (e.g., String or LargeString).
- Rationale: This simplifies the user experience by mirroring other major databases (Postgres, Snowflake) and avoids introducing a new, custom type system.

Standardizing Version, Tag, and Branch References

- Problem: There is no standard way for users to specify versions, tags, or branches, leading to inconsistent time travel implementations across integrations (e.g., Spark, DuckDB).
- Goal: Define a single, consistent reference specification in the Lance core library to prevent fragmentation.
- Proposed Approaches:
  - Xuanwo: Use a simple heuristic: numbers are versions, non-numbers are tags/branches. This requires enforcing unique names across these reference types.
  - Jack: Use a prefix (e.g., ref/) to explicitly distinguish references from version numbers.
- Action: Xuanwo will create a formal proposal to drive this discussion.

Next Steps

- Jack:
  - Create a GitHub discussion to formalize the new release process.
  - Reply to the "external manifest store" thread to clarify its interaction with the versioning API discussion.
- Weston:
  - Clean up and merge the column statistics PR, adding "experimental" warnings.
  - Document the read-path follow-up plan for column statistics.
  - Add a comment to the type system discussion linking to the Substrait model.
- Xuanwo:
  - Create a formal proposal for standardizing version, tag, and branch references.
- All:
  - Create a GitHub issue to track the benchmark analysis for manifest scaling.
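
For readers comparing the two approaches proposed for version, tag, and branch references, the difference can be sketched roughly as below. This is only an illustration, not the Lance API: the `Ref` class and both function names are hypothetical, and the exact grammar (including the `ref/` prefix) is still to be settled by the formal proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical parsed reference; not part of any Lance library."""
    kind: str   # "version", or "named" for a tag or branch
    value: str


def parse_heuristic(text: str) -> Ref:
    """Heuristic approach: all-digit input is a version, anything else is a name."""
    if text.isascii() and text.isdigit():
        return Ref("version", text)
    return Ref("named", text)


def parse_prefixed(text: str, prefix: str = "ref/") -> Ref:
    """Prefix approach: names carry an explicit prefix, bare input must be a version."""
    if text.startswith(prefix):
        name = text[len(prefix):]
        if not name:
            raise ValueError("empty reference name")
        return Ref("named", name)
    if text.isascii() and text.isdigit():
        return Ref("version", text)
    raise ValueError(f"expected a version number or '{prefix}<name>': {text!r}")
```

In this sketch, a tag literally named `42` cannot be addressed through `parse_heuristic`, since it is read as a version, whereas `parse_prefixed("ref/42")` still resolves to a name. The cost of the prefix approach is that every tag or branch lookup must spell out the prefix.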
Meeting with Eto Labs
Lance Community Sync
February 12, 2026, 49 mins
Action Items

- Jack Ye: Add key takeaway re: release cadence/branching to meeting notes
- Weston Pace: Clean up/merge column stats PR; add experimental note; document read-path follow-ups
- Xuanwo Ding: Create GitHub issue re: manifest size/fragment scaling
- Weston Pace: Add comment to logical-physical type discussion re: Substrait approach
- Jack Ye: Reply to external manifest store discussion; then follow up w/ Nathan
- Xuanwo Ding: Prepare proposal for unified version/tag/branch reference scheme