How to retrieve the consolidated count of each version processed in Delta tables


ROHIT BANSAL

Jul 8, 2021, 10:19:30 AM
to Delta Lake Users and Developers
Hi All,

We would like to generate dashboards/reports on top of the records processed by a Databricks batch job, using the per-batch row counts from Delta table operations:

We want to create a view on top of the Delta table with the following metadata columns:
  1. jobId
  2. jobName
  3. runId
  4. timestamp
  5. operationMetrics -> numOutputRows
Can you suggest how to write a query based on the Delta table history and load these metrics into a table for reporting purposes?

Regards,
Rohit
Attachments: count-delta1.JPG, count-delta.JPG

Jacek Laskowski

Jul 13, 2021, 5:59:51 AM
to ROHIT BANSAL, Delta Lake Users and Developers
Hi Rohit, 

Seems like an easy task, and I'm not sure where the catch is.

It seems like you need a select over the history Dataset. You could also read the transaction log files directly; they are a mixture of Parquet and JSON files, and Spark supports both formats easily.
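
For example, here is a minimal sketch of such a select using the Scala API (the table path and the reporting table name are placeholders; the job struct is only populated for commits made from a Databricks job, and numOutputRows is only reported for operations that write data):

import io.delta.tables.DeltaTable
import org.apache.spark.sql.functions.col

// Load the commit history of the Delta table as a DataFrame.
val history = DeltaTable.forPath(spark, "/mnt/data/my_delta_table").history()

// operationMetrics is a map<string,string>, so numOutputRows is extracted
// by key and cast to a numeric type.
val batchCounts = history
  .filter(col("operationMetrics").getItem("numOutputRows").isNotNull)
  .select(
    col("version"),
    col("timestamp"),
    col("job.jobId").as("jobId"),
    col("job.jobName").as("jobName"),
    col("job.runId").as("runId"),
    col("operation"),
    col("operationMetrics").getItem("numOutputRows").cast("long").as("numOutputRows"))

// Persist the metrics for reporting (target table name is a placeholder).
batchCounts.write.format("delta").mode("append").saveAsTable("reporting.delta_batch_counts")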

Have you tried anything?

Jacek 


ROHIT BANSAL

Jul 16, 2021, 2:07:38 AM
to Delta Lake Users and Developers
Thanks, Jacek, for pointing me to the history dataset. I had missed the history schema concept :)
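
For reference, the history schema (including the job struct and the operationMetrics map) can be inspected directly; a small sketch, assuming the same placeholder path as above:

val history = io.delta.tables.DeltaTable.forPath(spark, "/mnt/data/my_delta_table").history()
history.printSchema()

// Equivalent via SQL:
spark.sql("DESCRIBE HISTORY delta.`/mnt/data/my_delta_table`").printSchema()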