Structured Streaming Failing with transactional log file not found

Yasmine Mohammed

Aug 11, 2022, 6:57:42 PM8/11/22
to Delta Lake Users and Developers
Hi everyone,

I have a production issue (Spark 3.1.0, Databricks Runtime 9.1) with multiple structured streaming queries (notebooks written in Python).
From time to time I get:

 java.io.FileNotFoundException: No file found in the directory: abfss://contai...@storageaccount.dfs.core.windows.net/silver/tablename/_delta_log. If you never deleted it, it's likely your query is lagging behind. Please delete its checkpoint to restart from scratch. To avoid this happening again, you can update your retention policy of your Delta table

My structured streaming queries are simply replicating data from tables in the delta lake to identical tables in another delta lake.

I have nightly maintenance jobs (Vacuum and Optimize).
I have narrowed the problem down to delta tables that see little to no data updates. The structured streaming query reading from such a table always references an old table version and never advances. However, every night when the vacuum job runs, it creates entries in the delta transaction log. Eventually, delta runs its course and deletes transaction log files once the log retention period expires (i.e. deleting the transaction log file referenced by the structured streaming query's checkpoint).
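The timing described above can be sketched in plain Python. This is only an illustration of the failure mode, not Delta code: `delta.logRetentionDuration` defaults to 30 days, and the dates below are made up for the example.

```python
from datetime import datetime, timedelta

# Default delta.logRetentionDuration is "interval 30 days".
LOG_RETENTION = timedelta(days=30)

def log_file_still_available(commit_time: datetime, now: datetime) -> bool:
    """A _delta_log JSON commit file becomes eligible for cleanup once it is
    older than the log retention window (cleanup happens as part of normal
    table maintenance, regardless of whether a stream still references it)."""
    return now - commit_time <= LOG_RETENTION

# A stream on a quiet table: its checkpoint still points at the last commit
# it read, e.g. 49 days ago (hypothetical dates).
last_read_commit = datetime(2022, 6, 1)
now = datetime(2022, 7, 20)

print(log_file_still_available(last_read_commit, now))  # False: the stream fails
```

Once the stream's checkpoint lags behind the table by more than the log retention window, the commit file it needs is gone and the `FileNotFoundException` above is exactly what you would expect.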


My way to mitigate this is to reset that particular query's checkpoint and restart it from the right table version.
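The error message's own suggestion ("update your retention policy of your Delta table") can be applied via table properties, which might avoid the resets entirely. The table name and the 60-day value below are placeholders; the idea is to keep the log longer than the longest lag you expect from any downstream stream:

```sql
-- Illustrative values, not a recommendation: extend log retention so that
-- a lagging stream's checkpointed commit is still present in _delta_log.
ALTER TABLE my_silver_table SET TBLPROPERTIES (
  'delta.logRetentionDuration' = 'interval 60 days'
);
```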

My question is: is this typical, or am I missing something and causing my own agony?


Thanks,
Yasmine