Meeting Purpose
Sync on community updates, the v4.0 release, and key design topics.
Key Takeaways
- v4.0 Release Imminent: Release Candidate 3 (RC3) is out for testing. The release is expected in the next 1–2 days, pending final internal review.
- Design Process Refined: Use GitHub Discussions for high-level tracking and Google Docs for detailed design. All Docs must be linked from a Discussion to ensure discoverability.
- FTS Indexing Strategy Decided: The new multi-segment full-text search (FTS) index will be opt-in in v4.0, then become the default in v5.0. This aggressive cadence encourages users to stay current with major releases.
- Security Hardening: Add the CodeQL and zizmor scanners to GitHub Workflows to reduce the risk of supply-chain attacks, a critical risk for open-source projects.
Topics
Community & Project Updates
- DuckDB Extension: The LanceDB extension is now a core DuckDB extension, replacing the community version. Users should migrate their code.
- PrestoDB Connector: A community-led connector for PrestoDB has been merged. Jianjian Xie (Uber) is running a proof of concept with it for SQL-on-LanceDB use cases.
- Index File Listing: describe indices now lists all files in an index, improving cold search performance and enabling accurate on-disk size reporting.
v4.0 Release Status
- RC3 is available for testing.
- The release process was delayed to include Full-Text Search (FTS) performance improvements.
- RC3 is a snapshot of the main branch as of March 25, 2026.
- No known blockers exist, and internal testing is nearly complete.
Design Process & GitHub Hygiene
- Problem: GitHub Discussions are poor for inline comments, while Google Docs lack discoverability.
- Solution: Use both.
  - GitHub Discussions: For high-level tracking, linking to detailed designs.
  - Google Docs: For detailed design work with easy inline comments.
- Guidance:
  - Link all Google Docs from a GitHub Discussion.
  - Use separate threads for distinct topics within a Discussion.
  - Use the "Convert to Issue" feature for actionable items.
- Category Cleanup: The "Ideas" category will be kept for early-stage concepts, but the team will improve guidance on when to use it vs. "Design Proposals."
Partition Namespaces vs. Index Segments
- Problem: The need for partition namespaces is being re-evaluated due to the new index segment feature.
- Context:
  - Partition Namespaces: Proposed for multi-tenancy and write optimizations (e.g., partition replacement).
  - Index Segments: Allow a single table to scale by treating segments as independent, searchable units.
- Conflict: Index segments might make namespaces redundant for scaling. However, they don't solve write optimizations or guarantee tenant-aligned indexes.
- Decision: Jack Ye will raise a GitHub Discussion to get feedback from all stakeholders.
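
To make "segments as independent, searchable units" concrete, here is a minimal sketch of querying each segment on its own and merging the results into one top-k list. Everything in it is an assumption for illustration: `Segment`, `make_segment`, and `search_segments` are hypothetical names, the scoring is a toy term count, and none of it reflects the project's actual index code.

```python
import heapq
from typing import Callable, Iterable

# Hypothetical types: a hit is (score, row_id); a segment is any callable
# that answers a query with its own local top-k hits.
Hit = tuple[float, int]
Segment = Callable[[str, int], list[Hit]]


def make_segment(rows: dict[int, str]) -> Segment:
    """Build a toy in-memory segment that scores rows by term frequency."""

    def search(query: str, k: int) -> list[Hit]:
        hits = [
            (float(text.split().count(query)), row_id)
            for row_id, text in rows.items()
            if query in text.split()
        ]
        return heapq.nlargest(k, hits)

    return search


def search_segments(segments: Iterable[Segment], query: str, k: int) -> list[Hit]:
    """Query every segment independently, then keep the k best hits overall."""
    all_hits: list[Hit] = []
    for segment in segments:
        all_hits.extend(segment(query, k))
    return heapq.nlargest(k, all_hits)


seg_a = make_segment({1: "lance lance index", 2: "vector"})
seg_b = make_segment({3: "lance"})
top_hits = search_segments([seg_a, seg_b], "lance", k=2)
```

Because each segment only needs to return its own local top-k, segments can be built and searched separately. The sketch does not address the write-optimization and tenant-alignment concerns noted above.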
Index Metadata Details
- Problem: Index metadata lacks critical details (e.g., index type), blocking work like Xuanwo's distributed FTS merging.
- Solution: Add more details to the IndexDetails message.
- Proposed Structure (Will Jones):
  1. Specification: Defines the index's behavior (e.g., FTS positions).
  2. Build Strategy: Runtime parameters (e.g., parallelism, GPU usage).
  3. Outputs: Results from the build process (e.g., IVF centroids).
- Decision: The team will discuss offline to finalize the metadata structure.
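
As a rough illustration of the three-part split proposed above, the sketch below models it with plain Python dataclasses. The structure is not final, so every class and field name here (`Specification`, `BuildStrategy`, `Outputs`, `IndexDetailsSketch`, `index_type`, `with_positions`, and so on) is a hypothetical placeholder rather than the real IndexDetails schema.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Specification:
    """What the index is: settings that define its behavior (assumed fields)."""
    index_type: str
    with_positions: bool = False  # e.g., whether an FTS index stores positions


@dataclass
class BuildStrategy:
    """How the index was built: runtime parameters (assumed fields)."""
    parallelism: int = 1
    use_gpu: bool = False


@dataclass
class Outputs:
    """What the build produced (assumed fields)."""
    ivf_centroids: list[list[float]] = field(default_factory=list)


@dataclass
class IndexDetailsSketch:
    """Hypothetical container grouping the three parts."""
    specification: Specification
    build_strategy: BuildStrategy = field(default_factory=BuildStrategy)
    outputs: Outputs = field(default_factory=Outputs)


details = IndexDetailsSketch(Specification(index_type="fts", with_positions=True))
as_dict = asdict(details)
```

Keeping the specification separate from the build strategy would let a reader identify the index type without caring how the index was built, which is the kind of detail the notes say is currently missing.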
Distributed FTS Indexing
- Goal: Apply the index segment concept to FTS for distributed, scalable indexing.
- Approach:
  - Phase 1: Change the FTS build process to create one segment per partition, without changing the on-disk format.
  - Phase 2: Optimize performance and potentially introduce a new on-disk format.
- Compatibility: Old readers will only see the first segment of a multi-segment FTS index.
- Rollout Plan:
  - v4.0: New FTS is opt-in via an environment variable.
  - v5.0: New FTS becomes the default.
- Rationale: This aggressive cadence encourages users to stay current with major releases, preventing the project from being held back by legacy versions.
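
The rollout and compatibility rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the notes do not name the environment variable, so `EXAMPLE_MULTI_SEGMENT_FTS` is a made-up placeholder, and letting the variable also act as an opt-out in v5.0 is an assumption, not something the team decided.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; the notes do not say what the real one is.
OPT_IN_VAR = "EXAMPLE_MULTI_SEGMENT_FTS"


def multi_segment_fts_enabled(
    major_version: int, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Opt-in before v5.0, on by default from v5.0 onward."""
    env = os.environ if env is None else env
    raw = env.get(OPT_IN_VAR)
    if raw is None:
        # No explicit choice: fall back to the per-version default.
        return major_version >= 5
    # Assumption: an explicit value always wins, including as an opt-out.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def visible_segments(segments: list[str], reader_is_multi_segment: bool) -> list[str]:
    """Old readers see only the first segment; new readers see all of them."""
    return segments if reader_is_multi_segment else segments[:1]
```

For example, `multi_segment_fts_enabled(4, {})` is `False`, while the same call with the placeholder variable set to `"1"` is `True`.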
GitHub Workflow Security
- Risk: Automated bots are actively scanning open-source repos for GitHub Workflow vulnerabilities.
- Solution: Add security scanners to all repos.
  - CodeQL: GitHub's native scanner for workflow definitions.
  - zizmor: A linter that enforces security best practices.
- Benefit: Helps prevent supply-chain attacks and enforces safe practices like hash pinning.
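
To show what "hash pinning" means in practice, the sketch below scans workflow text for `uses:` references that are not pinned to a full 40-character commit SHA. It is a simplified illustration, not how CodeQL or zizmor work internally; the function name `unpinned_actions` and the sample workflow are hypothetical, and the line-based matching is a deliberate shortcut instead of real YAML parsing.

```python
import re

# Matches lines such as "- uses: owner/repo@ref" or "uses: owner/repo@ref".
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)")
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return action references that are not pinned to a full commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if not match:
            continue
        ref = match.group(1).strip("'\"")
        if ref.startswith("./") or ref.startswith("docker://"):
            # Local actions and Docker images are out of scope for this sketch.
            continue
        _, _, version = ref.partition("@")
        if not FULL_SHA_RE.match(version):
            unpinned.append(ref)
    return unpinned


SAMPLE_WORKFLOW = """\
jobs:
  build:
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@0123456789abcdef0123456789abcdef01234567
      - uses: ./local-action
"""

findings = unpinned_actions(SAMPLE_WORKFLOW)
```

A tag like `v4` can be moved to point at different code later, whereas a full commit SHA cannot, which is why pinning by hash limits exposure to a compromised upstream action.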
Next Steps
- Weston Pace: Finalize v4.0 internal testing and initiate the release.
- Jack Ye: Create a GitHub Discussion on Partition Namespaces vs. Index Segments.
- Will Jones: Draft guidance for using GitHub Discussions and Google Docs.
- Kevin Liu: Provide GitHub Workflow definitions for CodeQL and zizmor.
- All:
  - Test v4.0 RC3 and provide feedback.
  - Discuss the IndexDetails metadata structure offline.
  - Review the Distributed FTS design proposal.