How does Delta Lake clean up data files from aborted transactions?


Jerome Yang

Nov 26, 2023, 11:30:23 PM
to delta...@googlegroups.com
Hi All,

While reading the Transaction Log Protocol (https://github.com/delta-io/delta/blob/master/PROTOCOL.md), I became curious about how the data files of an in-progress transaction are tracked.
Suppose a transaction generates several data files, but only some of them are uploaded to S3 successfully. The transaction should abort, so how are these leftover data files recognized during VACUUM?
And how are they distinguished from the data files of in-progress transactions, which are also not yet recorded in the Delta log or checkpoints?

I would appreciate it if someone could explain this!
Thanks,
Junfeng

Burak Yavuz

Nov 27, 2023, 10:46:15 AM
to Jerome Yang, delta...@googlegroups.com
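Vacuum deletes any file that is not tracked in the transaction log and is older than the retention period (7 days by default, and the default safety check rejects anything shorter). This assumes that no transaction takes longer than 7 days to commit, so any untracked file older than that cannot belong to an in-progress transaction. That's how it deletes the files from aborted transactions while leaving in-progress ones alone.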
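
As a rough illustration of the rule described above, here is a minimal Python sketch. It is not Delta Lake's actual implementation: the function name `vacuum_candidates`, the file names, and the timestamps are all hypothetical, and the real VACUUM also handles details this ignores (for example, files that were once tracked and later logically removed). It only shows why an age threshold is enough to separate aborted writes from in-progress ones.

```python
from datetime import datetime, timedelta


# Hypothetical helper for illustration only; not part of Delta Lake's API.
def vacuum_candidates(files_on_storage, tracked_paths, now,
                      retention=timedelta(days=7)):
    """Return paths that are untracked AND older than the retention period.

    files_on_storage: dict mapping path -> last-modified datetime
    tracked_paths:    set of paths referenced by the transaction log
    """
    cutoff = now - retention
    return sorted(
        path
        for path, modified in files_on_storage.items()
        if path not in tracked_paths and modified < cutoff
    )


# Made-up example data.
now = datetime(2023, 11, 27, 12, 0, 0)
files = {
    "part-0001.parquet": now - timedelta(days=30),  # committed, tracked in the log
    "part-0002.parquet": now - timedelta(days=10),  # left behind by an aborted write
    "part-0003.parquet": now - timedelta(hours=1),  # in-progress write, not yet committed
}
tracked = {"part-0001.parquet"}

print(vacuum_candidates(files, tracked, now))  # ['part-0002.parquet']
```

Only the 10-day-old untracked file is selected. The 1-hour-old untracked file is skipped because, from the log alone, it is indistinguishable from an aborted write until it has aged past the retention period. On a real table the equivalent operation is the `VACUUM` command, which accepts an optional `RETAIN <n> HOURS` clause.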

Best,

Burak Yavuz

Software Engineer

Databricks Inc.

bu...@databricks.com

databricks.com




Jerome Yang

Nov 27, 2023, 9:50:26 PM
to Delta Lake Users and Developers
Thank you for the clarification. It helps a lot!

Regards,
Junfeng