Meeting Purpose
Sync on community proposals, blockers, and the website redesign.
Key Takeaways
- Lance Blob Session API: A new low-level API is proposed to let advanced users register pre-written data files directly, bypassing the high-level builder and enabling custom physical layouts.
- RSC Scaling: The RSC proposal is blocked by three issues: branch merge/rebase functionality, a branch ID system, and a scalable storage model for a large number of branches.
- Manifest Scaling: A "tiered manifest" prototype was presented to scale beyond 1M fragments, but it needs refinement to better handle operations like addColumn and to match how existing code accesses fragments.
- Website Redesign: A redesign is underway to modernize lance.org. The community favors a minimalist, technical style and plans to add a blog for community-contributed content.
Topics
Lance Blob Session API
- Problem: Advanced users (e.g., Geneva) need a low-level API to register pre-written data files directly, bypassing the high-level builder.
- Goal: Enable custom physical layouts and direct use of Lance's transaction API.
- Proposal: A Lance.session API to create dedicated blobs and generate prepared arrays for direct registration.
- Status: PR is open for review.
RSC (Branching) Scaling
- Blockers: The RSC proposal is blocked by three dependencies:
  1. Branch Merge/Rebase: The ability to merge arbitrary transactions between branches, failing when a merge would be too costly or the changes are incompatible.
  2. Branch ID System: File paths must use a stable branch ID rather than the branch name, so that branches can be renamed without rewriting paths.
  3. Scalable Branch Storage: The current storage model does not scale to a large number of branches.
- Priority: The scalable branch storage model is the highest priority, as it is not currently being addressed. An RFC is needed.
- Dependency: The RSC proposal also requires the branch merging functionality currently being designed by Will and Drew.
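The branch ID requirement can be illustrated with a minimal sketch. It assumes a hypothetical layout in which each branch's data files live under tree/<branch_id>/data/ and a small catalog maps user-facing names to stable IDs; the BranchCatalog class and this path layout are illustrative only, not Lance's actual on-disk format. Because paths embed the ID, a rename only updates the catalog entry and no data file has to move.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class BranchCatalog:
    """Hypothetical name -> ID catalog; not Lance's actual on-disk format."""
    root: str
    name_to_id: dict[str, str] = field(default_factory=dict)

    def create(self, name: str) -> str:
        if name in self.name_to_id:
            raise ValueError(f"branch {name!r} already exists")
        # The ID is generated once and never changes for the branch's lifetime.
        branch_id = uuid.uuid4().hex
        self.name_to_id[name] = branch_id
        return branch_id

    def data_path(self, name: str, file_name: str) -> str:
        # Paths embed the stable ID, never the user-facing name.
        branch_id = self.name_to_id[name]
        return f"{self.root}/tree/{branch_id}/data/{file_name}"

    def rename(self, old: str, new: str) -> None:
        # A rename only rewrites the catalog entry; no data file moves.
        if new in self.name_to_id:
            raise ValueError(f"branch {new!r} already exists")
        self.name_to_id[new] = self.name_to_id.pop(old)
```

For example, creating a branch named "experiment", renaming it to "feature-x", and then calling data_path for the new name returns the same path as before the rename.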
Manifest Scaling
- Problem: The current single-file manifest model has scaling limits.
  - Fragment Count: A table with 1M fragments (roughly 1T rows) produces an 80MB manifest, which is still manageable.
  - Column Count: Operations like addColumn add a new data file to every fragment, effectively duplicating the per-fragment metadata; each added column grows the manifest by ~80MB and quickly makes it too large.
- Proposal: A "tiered manifest" prototype using a two-level tree structure.
  - A small, bounded root manifest references immutable "child" manifest files.
  - Children are sealed and pushed down the tree once a fragment-count threshold is reached.
- Feedback & Refinement:
  - Optimize for addColumn: Sealing should be triggered by data-file count, not just fragment count, so the design directly addresses the column-adding case.
  - Handle Fragment Changes: The design must specify how changes to existing fragments are handled (e.g., replace, overlay, or merge-on-read).
  - Survey Access Patterns: Analyze existing code to learn whether it reads all fragments or only a subset, and ensure the new design supports each pattern efficiently.
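The tiered manifest, with the suggested data-file-based sealing rule, can be sketched in memory as follows. For scale, the figures above work out to roughly 80 bytes of metadata per fragment (80MB / 1M fragments). The sketch assumes fragments are appended in increasing ID order, models each fragment as a tuple of data-file names, and keeps an open tail of recent fragments in the root; once the tail references seal_threshold data files, it is sealed into an immutable child whose ID range the root records. All class names and the threshold are hypothetical, and how sealed children absorb later fragment changes (replace, overlay, or merge-on-read) is deliberately left out, since the notes leave that open.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass(frozen=True)
class Fragment:
    fragment_id: int
    data_files: tuple[str, ...]  # e.g. one file per column group


@dataclass(frozen=True)
class ChildManifest:
    """Sealed, immutable child; would be its own file in a real design."""
    min_id: int
    max_id: int
    fragments: tuple[Fragment, ...]


@dataclass
class RootManifest:
    """Small root: ID ranges of sealed children plus an open tail."""
    seal_threshold: int  # hypothetical: data files allowed in the open tail
    children: list[ChildManifest] = field(default_factory=list)
    open_tail: list[Fragment] = field(default_factory=list)

    def append(self, fragment: Fragment) -> None:
        # Assumes fragment IDs arrive in increasing order.
        self.open_tail.append(fragment)
        open_files = sum(len(f.data_files) for f in self.open_tail)
        # Seal on data-file count rather than fragment count, so fragments
        # carrying many data files (e.g. after addColumn) seal sooner.
        if open_files >= self.seal_threshold:
            self._seal()

    def _seal(self) -> None:
        frags = tuple(self.open_tail)
        self.children.append(
            ChildManifest(frags[0].fragment_id, frags[-1].fragment_id, frags)
        )
        self.open_tail.clear()

    def all_fragments(self) -> Iterator[Fragment]:
        # "All fragments" access: every sealed child has to be read.
        for child in self.children:
            yield from child.fragments
        yield from self.open_tail

    def get(self, fragment_id: int) -> Fragment | None:
        # "Some fragments" access: only the child whose ID range covers
        # fragment_id is searched; the others are skipped entirely.
        for child in self.children:
            if child.min_id <= fragment_id <= child.max_id:
                return next(
                    (f for f in child.fragments if f.fragment_id == fragment_id),
                    None,
                )
        return next(
            (f for f in self.open_tail if f.fragment_id == fragment_id), None
        )
```

With two data files per fragment and seal_threshold=4, a child is sealed every two fragments. The ID ranges kept in the root show why the access-pattern survey matters: looking up one fragment touches a single child, while a full scan still reads all of them.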
Website Redesign
- Goal: Modernize lance.org to better communicate the project's vision and provide clear access to the docs and the spec.
- Community Feedback:
  - Style: A minimalist, technical, and "brutalist" aesthetic is preferred over a research-paper style.
  - Content: The site should feature metrics and benchmarks (e.g., Jack's Delta vs. Iceberg post) to let the data speak for itself.
  - New Feature: A blog section will be added for community-contributed content (e.g., Drew's Spark integration post) and release announcements.
Next Steps
- Xuanwo:
  - Link the Lance Blob Session API PR in the design doc.
  - Share the website design concepts in the Discord channel for feedback.
- Jack:
  - Sync with Nathan Ma on the branch ID PR status.
  - Create an RFC for the scalable branch storage model.
- Drew:
  - Refine the tiered manifest prototype based on community feedback.
  - Survey existing code to map fragment access patterns.
- Prashanth:
  - Post the website design link and a meeting summary to Discord.