Streaming: how to manage the delta table meta cache.

rameshkumar....@gmail.com

Nov 9, 2023, 5:19:39 PM
to Delta Lake Users and Developers
Hi All,
  
I am running a Spark streaming job that performs a Delta merge inside each batch method. The issue is that Delta Lake holds the table metadata in the cache, so Spark containers are not released when the application is idle. This is increasing our EMR cost, specifically on the STG cluster.

Can the cache be cleared after every batch completion?
for (id, rdd) in spark.sparkContext._jsc.getPersistentRDDs().items():
    rdd.unpersist()
Is there a recommended approach to deal with this situation or any other effective method to handle it? 

Thanks,
Rameshkumar S

Adam Binford

Nov 9, 2023, 5:29:51 PM
to rameshkumar....@gmail.com, Delta Lake Users and Developers
You can disable caching of the snapshots via the "sparksnapshotCache.storageLevel" config. I added it for this exact reason: https://github.com/delta-io/delta/pull/1000. You can set it to "NONE" to disable caching. Or you can try disk-only caching, with the shuffle service serving the RDDs, if that's an option for you. For some streaming jobs, caching the table snapshots isn't very helpful.

Adam


--
Adam Binford

Adam Binford

Nov 9, 2023, 5:30:21 PM
to rameshkumar....@gmail.com, Delta Lake Users and Developers
*snapshotCache.storageLevel
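
For anyone finding this later, here is a rough PySpark sketch of the two options mentioned above (no caching, or disk-only caching served by the shuffle service). The full key with the "spark.databricks.delta." prefix is my assumption about how the setting is spelled out, and the helper names snapshot_cache_conf and build_session are made up for illustration, so please verify against your Delta and Spark versions.

```python
# Sketch only: the full key below is an assumption based on Delta's usual
# "spark.databricks.delta." prefix; check it against your Delta version.
SNAPSHOT_CACHE_KEY = "spark.databricks.delta.snapshotCache.storageLevel"


def snapshot_cache_conf(storage_level="NONE"):
    """Return Spark conf entries for the chosen snapshot cache storage level."""
    level = storage_level.strip().upper()
    if level not in ("NONE", "DISK_ONLY"):
        raise ValueError(
            f"this sketch only covers NONE and DISK_ONLY, got {storage_level!r}"
        )
    conf = {SNAPSHOT_CACHE_KEY: level}
    if level == "DISK_ONLY":
        # Let the external shuffle service serve disk-persisted RDD blocks
        # (Spark 3.0+), so executors holding only cached blocks can be released.
        conf["spark.shuffle.service.enabled"] = "true"
        conf["spark.shuffle.service.fetch.rdd.enabled"] = "true"
    return conf


def build_session(app_name, storage_level="NONE"):
    """Create a SparkSession with the snapshot cache settings applied."""
    from pyspark.sql import SparkSession  # imported lazily; requires pyspark

    builder = SparkSession.builder.appName(app_name)
    for key, value in snapshot_cache_conf(storage_level).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling build_session("merge-stream") would start the job with snapshot caching turned off. The same key/value pairs could instead be passed as --conf options to spark-submit.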
--
Adam Binford

rameshkumar subramanian

Nov 15, 2023, 12:45:55 PM
to Adam Binford, Delta Lake Users and Developers
Thank you, Adam Binford. 