Is this possible?

Drew Dwyer

Feb 19, 2025, 11:42:56 PM
to projectnessie
I'm experimenting with a Nessie + Trino + Iceberg implementation.
I'd like to bulk-ingest data into a table using Flink while performing deletes on the same table using Trino. I'd also like to run Nessie GC every 24 hours or so to remove the files from storage. Eventually, all of this should run without manual input and without corrupting the table.

Is this at all possible?

I was thinking something along the lines of:
- the ingestion job running on main
- the delete job creates a branch and merges on completion
- the gc job runs routinely, expires snapshots and removes orphan files

Am I barking up the wrong tree?

Dmitri Bourlatchkov

Feb 20, 2025, 12:23:57 AM
to Drew Dwyer, projectnessie
Hi Drew,

Nessie does not currently have content-aware merges [1], so merging the deletes from a branch is very likely to conflict with ingestion on main (and require manual intervention to resolve).
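
To illustrate why such a merge tends to conflict, here is a toy Python model. It is only a sketch, not Nessie's implementation or API: `Branch`, `merge`, `MergeConflict`, the table key `db.events` and the state IDs are all hypothetical, and it assumes conflicts are detected per table key without looking inside the table metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """Toy branch: maps a table key to the ID of its latest table state."""
    name: str
    tables: dict = field(default_factory=dict)

    def fork(self, name):
        # A new branch starts from a copy of the current table pointers.
        return Branch(name, dict(self.tables))

    def commit(self, key, state_id):
        # Point the table key at a new table state on this branch.
        self.tables[key] = state_id


class MergeConflict(Exception):
    pass


def merge(source, target, base):
    """Key-level merge; `base` is the table mapping at the fork point."""
    for key, state in source.tables.items():
        if state == base.get(key):
            continue  # the source branch did not touch this key
        if target.tables.get(key) != base.get(key):
            # Both sides moved the same key. A merge that does not look
            # inside the table metadata cannot combine the two changes.
            raise MergeConflict(key)
        target.tables[key] = state


main = Branch("main", {"db.events": "s1"})
base = dict(main.tables)                  # state at the fork point
deletes = main.fork("deletes")
deletes.commit("db.events", "s2-delete")  # delete job on its own branch
main.commit("db.events", "s2-append")     # ingestion keeps writing to main

try:
    merge(deletes, main, base)
    outcome = "merged"
except MergeConflict as exc:
    outcome = f"conflict on {exc}"
print(outcome)  # conflict on db.events
```

In this model the merge only succeeds if main did not change the table between the fork and the merge, which is unlikely while a continuous ingestion job is writing to it.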

Side note: please consider our Zulip Chat [2] for discussions :)

Cheers,
Dmitri.

