Hi,
I've been trying for a few weeks now to pinpoint why flatMapGroupsWithState's stateUpdateFunc is executed multiple times when used with DeltaTable.merge, and I still have no clue why it works this way. Has anyone noticed or heard of a similar case before (and perhaps fixed it)?
I use the latest and greatest, Spark 3.0.1 and Delta Lake 0.7.0, but I've confirmed the same behavior with Spark 2.4.7 and Delta Lake 0.6.1.
I initially thought that it only happened with streaming queries, but it turns out it shows up with batch queries too.
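A minimal batch sketch of the kind of query involved may help anyone trying to reproduce this. It is an illustrative reconstruction, not the exact code that produced the output below: the object name, the column names (key, value, count), the merge condition, and the session settings are all assumptions, and it expects Spark 3.0.1 and Delta Lake 0.7.0 on the classpath.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical reproduction sketch; names and schema are assumptions.
object DoubleExecutionSketch {

  // Prints every invocation so that repeated executions show up on stdout.
  def stateUpdateFunc(
      key: Long,
      values: Iterator[Long],
      state: GroupState[Long]): Iterator[(Long, Long)] = {
    val vs = values.toArray.toSeq
    println(s">>> >>> stateUpdateFunc executed -> key=$key, values: $vs")
    if (!state.exists) println(">>> >>> >>> 3. no earlier state")
    Iterator((key, vs.size.toLong))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("delta-merge-double-execution")
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()
    import spark.implicits._

    // Create an empty Delta table to merge into.
    val path = "/tmp/yurii-delta-double-metrics"
    Seq.empty[(Long, Long)].toDF("key", "count")
      .write.format("delta").mode("overwrite").save(path)

    // Batch source: two values for key 1, grouped and passed through
    // flatMapGroupsWithState (in a batch query there is no earlier state).
    val updates = Seq((1L, 1L), (1L, 2L)).toDS()
      .groupByKey(_._1)
      .mapValues(_._2)
      .flatMapGroupsWithState[Long, (Long, Long)](
        OutputMode.Update, GroupStateTimeout.NoTimeout)(stateUpdateFunc)
      .toDF("key", "count")

    // Upsert the grouped result into the Delta table.
    DeltaTable.forPath(spark, path).as("t")
      .merge(updates.as("s"), "t.key = s.key")
      .whenMatched().updateAll()
      .whenNotMatched().insertAll()
      .execute()

    spark.stop()
  }
}
```

Counting the println lines per key in a run of something like this is one way to check whether the function body runs once or more than once for a single merge.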
>>> Using Spark 3.0.1
Create an empty Delta table at /tmp/yurii-delta-double-metrics
>>> >>> stateUpdateFunc executed -> key=1, values: WrappedArray(1, 2)
>>> >>> >>> 3. no earlier state
>>> >>> stateUpdateFunc executed -> key=1, values: WrappedArray(1, 2)
>>> >>> >>> 3. no earlier state
Why is "3. no earlier state" printed twice for key 1? How can I narrow it down? Is this a known issue in Delta? Please help.
Best regards,
Jacek Laskowski
----