The symlink manifest has not been updated correctly.

rameshkumar....@gmail.com

Oct 9, 2023, 5:57:15 PM
to Delta Lake Users and Developers
Hi All,

Sometimes the OPTIMIZE operation fails to update the symlink manifest files.
For example, an OPTIMIZE operation created a new data file on September 30, 2023, but the symlink manifest file still points to the old files that the operation removed.

More details
Data File:
========
data_date=2023-09-27 (partition)
part-00000-ff0f4352-36be-466c-bfbb-ce95a5296058.c000.snappy.parquet | September 30, 2023, 13:15:31 (UTC-04:00)

_delta_log
==========
00000000000000000220.json | September 30, 2023, 13:15:31 (UTC-04:00)
operation:"OPTIMIZE"
added   = part-00000-ff0f4352-36be-466c-bfbb-ce95a5296058.c000.snappy.parquet
remove  = part-00000-ed9cd685-e7a5-4ecb-a2b7-22fde9bc5931.c000.snappy.parquet
remove  = part-00000-6e7e4fb7-6eb5-43a7-98aa-b459c1de9e35.c000.snappy.parquet

00000000000000000220.checkpoint.parquet | September 30, 2023, 13:15:33 (UTC-04:00)


_symlink_format_manifest/data_date=2023-09-27/
==========================================
s3://<bucket>/warehouse_delta/transaction_items/data_date=2023-09-27/part-00000-ed9cd685-e7a5-4ecb-a2b7-22fde9bc5931.c000.snappy.parquet
s3://<bucket>/warehouse_delta/transaction_items/data_date=2023-09-27/part-00000-6e7e4fb7-6eb5-43a7-98aa-b459c1de9e35.c000.snappy.parquet

Table Properties: 
==============
|delta.compatibility.symlinkFormatManifest.enabled|true           |
|delta.deletedFileRetentionDuration               |interval 7 days|
|delta.logRetentionDuration                       |interval 7 days|
|delta.minReaderVersion                           |1              |
|delta.minWriterVersion                           |2              |

Any suggestions?

Thanks,
Rameshkumar S
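
The mismatch reported above (the commit removes two files, yet the manifest still lists them) can be checked mechanically. The following is a minimal sketch, not part of the original post. It assumes the commit file is newline-delimited JSON whose `remove` actions carry a `path` relative to the table root, and that the manifest holds one absolute path per line. The function name `stale_manifest_entries` and the abbreviated JSON actions are illustrative only; the file names and the `<bucket>` placeholder are copied from the post.

```python
import json
from urllib.parse import unquote


def stale_manifest_entries(commit_json_lines, manifest_lines):
    """Return manifest entries that point at files removed by the commit."""
    removed = set()
    for line in commit_json_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        if "remove" in action:
            # Assumption: log paths are URL-encoded and relative to the table root.
            removed.add(unquote(action["remove"]["path"]))
    stale = []
    for entry in manifest_lines:
        entry = entry.strip()
        if entry and any(entry == p or entry.endswith("/" + p) for p in removed):
            stale.append(entry)
    return stale


# Abbreviated actions modelled on commit 00000000000000000220.json from the post.
partition = "data_date=2023-09-27"
commit = [
    json.dumps({"add": {"path": f"{partition}/part-00000-ff0f4352-36be-466c-bfbb-ce95a5296058.c000.snappy.parquet"}}),
    json.dumps({"remove": {"path": f"{partition}/part-00000-ed9cd685-e7a5-4ecb-a2b7-22fde9bc5931.c000.snappy.parquet"}}),
    json.dumps({"remove": {"path": f"{partition}/part-00000-6e7e4fb7-6eb5-43a7-98aa-b459c1de9e35.c000.snappy.parquet"}}),
]
root = "s3://<bucket>/warehouse_delta/transaction_items"
manifest = [
    f"{root}/{partition}/part-00000-ed9cd685-e7a5-4ecb-a2b7-22fde9bc5931.c000.snappy.parquet",
    f"{root}/{partition}/part-00000-6e7e4fb7-6eb5-43a7-98aa-b459c1de9e35.c000.snappy.parquet",
]
stale = stale_manifest_entries(commit, manifest)
print(f"{len(stale)} stale manifest entries")
```

Any non-empty result means the manifest for that partition was not rewritten after the commit.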

Sergey Ivanychev

Nov 8, 2023, 10:26:35 AM
to Delta Lake Users and Developers
I'm facing the same issue. It seems that auto-OPTIMIZE during writes fails to update the manifest. The manifest ends up pointing to files that no longer exist, which causes failures after VACUUM.

Rameshkumar, did you manage to resolve this?

On Monday, October 9, 2023 at 23:57:15 UTC+2, rameshkumar....@gmail.com wrote:

Shixiong(Ryan) Zhu

Nov 8, 2023, 12:04:05 PM
to Sergey Ivanychev, Delta Lake Users and Developers
There is a limitation to manifest generation on concurrent writes.
"""
For example, if automatic mode is enabled, concurrent write operations lead to concurrent overwrites to the manifest files. With such unordered writes, the manifest files are not guaranteed to point to the latest version of the table after the write operations complete. Hence, if concurrent writes are expected and you want to avoid stale manifests, you should consider explicitly updating the manifest after the expected write operations have completed.
"""

Best Regards,

Ryan
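
The advice quoted above, explicitly updating the manifest once the writes have finished, could be scripted roughly as follows. This is a sketch under assumptions rather than an official recipe: it assumes a path-based Delta table, a Delta Lake version whose Spark SQL supports both `OPTIMIZE` and `GENERATE symlink_format_manifest`, and an existing `SparkSession` passed in as `spark`. The helper name `optimize_then_regenerate` is hypothetical.

```python
def optimize_then_regenerate(spark, table_path):
    """Compact the table, then rebuild the symlink manifest in one place.

    `spark` is any object exposing `sql(str)`, normally a SparkSession
    configured with the Delta Lake extensions (assumed, not shown here).
    Returns the statements in the order they were issued.
    """
    if "`" in table_path:
        # Keep the backtick-quoted identifier well-formed.
        raise ValueError("table path must not contain backticks")
    statements = [
        f"OPTIMIZE delta.`{table_path}`",
        f"GENERATE symlink_format_manifest FOR TABLE delta.`{table_path}`",
    ]
    for statement in statements:
        # Run sequentially so the manifest reflects the post-OPTIMIZE snapshot.
        spark.sql(statement)
    return statements


# Usage, assuming `spark` already exists:
# optimize_then_regenerate(spark, "s3://<bucket>/warehouse_delta/transaction_items")
```

Running the two statements from a single job, after all other writers have finished, avoids the unordered manifest overwrites described in the quoted documentation.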


Shixiong(Ryan) Zhu

Nov 8, 2023, 1:02:14 PM
to Sergey Ivanychev, Delta Lake Users and Developers
Feel free to open a ticket at https://github.com/delta-io/delta for the new features you would like to see.

Yep, OPTIMIZE itself can make concurrent writes.

Best Regards,

Ryan



Sergey Ivanychev

Nov 8, 2023, 3:13:31 PM
to Shixiong(Ryan) Zhu, Delta Lake Users and Developers
Thanks for the clarification!

Is there a way to update only a portion of the manifests (for a set of partitions)? The current generation mechanism (https://docs.databricks.com/en/sql/language-manual/delta-generate.html) seems able to generate manifests only for the whole table, which is very expensive when the number of partitions is high.
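
To make the question above concrete, here is a hypothetical sketch of what a partition-scoped regeneration would have to compute. It is not a Delta Lake feature and nothing in this thread confirms it is safe: writing manifests by hand bypasses Delta's own generation and can race with it. The sketch assumes the caller already has the list of files active in the current snapshot (relative to the table root), and that each partition's manifest is a file named `manifest` under `_symlink_format_manifest/<partition>/` containing one absolute path per line. The function name `manifests_for_partitions` is illustrative.

```python
from collections import defaultdict
import posixpath


def manifests_for_partitions(table_root, active_files, partitions):
    """Map manifest path -> manifest contents for the selected partitions only.

    active_files: snapshot file paths relative to table_root (assumed input).
    partitions: partition directories, e.g. "data_date=2023-09-27".
    """
    root = table_root.rstrip("/")
    wanted = set(partitions)
    grouped = defaultdict(list)
    for rel_path in active_files:
        partition_dir = posixpath.dirname(rel_path)
        if partition_dir in wanted:
            grouped[partition_dir].append(f"{root}/{rel_path}")
    return {
        # Assumption: one file named "manifest" per partition directory.
        f"{root}/_symlink_format_manifest/{part}/manifest": "\n".join(sorted(paths)) + "\n"
        for part, paths in grouped.items()
    }
```

A selected partition that no longer has any active files yields no entry here, so its old manifest would still need to be deleted separately.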

Sergey Ivanychev

Nov 8, 2023, 3:13:36 PM
to Shixiong(Ryan) Zhu, Delta Lake Users and Developers
I think the manifest is not generated at all during the write. I have only one write to my table at a time, but the autoCompact and optimizeWrite options are set to TRUE. Judging from the Delta log, the manifests are broken whenever OPTIMIZE kicks in after the write.

I suppose OPTIMIZE (or auto-optimize during writes) doesn’t play well with manifest generation.

rameshkumar subramanian

Nov 9, 2023, 5:05:57 PM
to Shixiong(Ryan) Zhu, Sergey Ivanychev, Delta Lake Users and Developers
Thank you for the clarification, Ryan.
Sergey, as of now I am explicitly updating the manifest after optimizing the table.

Rameshkumar S
