How to avoid generating _SUCCESS, _committed, and _started files when writing CSV

sriram.v...@gmail.com

Feb 4, 2021, 11:06:42 PM
to Delta Lake Users and Developers
I am trying to generate CSV files by reading a Delta table in Databricks. The CSV files will be consumed by downstream systems, which read every file in an ADLS directory. Spark writes _SUCCESS, _committed, and _started files alongside the partitioned CSV files, and my downstream ETL process refuses to process the CSVs because of the presence of these three marker files. How can I stop them from being generated?
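
For context, here is a minimal sketch of the kind of write that produces these files (the paths and options are hypothetical, for illustration only):

# Read the Delta table and export it as CSV (hypothetical paths).
df = spark.read.format("delta").load("abfss://data@myaccount.dfs.core.windows.net/delta/events")

# Each partition becomes a part-*.csv file; alongside them, the committer
# also writes _SUCCESS, _started_<id>, and _committed_<id> marker files.
df.write.mode("overwrite").option("header", "true").csv(
    "abfss://data@myaccount.dfs.core.windows.net/export/events_csv"
)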

I have tried setting these three flags on the Databricks cluster:

mapreduce.fileoutputcommitter.marksuccessfuljobs=false
parquet.enable.summary-metadata=false
spark.hadoop.parquet.enable.summary-metadata=false
None of these three flags made a difference.

The flag below works, but it overrides the DBIO output committer:
spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
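
To avoid overriding DBIO for the whole cluster, one possibility (a sketch, assuming this conf is honored when set at session scope on your runtime version) is to set it only in the notebook or job that produces the CSV export:

# Switch to the vanilla Hadoop commit protocol for this session only
# (assumption: runtime-scoped setting takes effect on your Databricks runtime).
spark.conf.set(
    "spark.sql.sources.commitProtocolClass",
    "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol",
)
# With the Hadoop committer active, this also suppresses the _SUCCESS marker.
spark.conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")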

Is there a way to achieve this with Delta and Databricks?

rohit haseja

Feb 5, 2021, 1:06:14 AM
to sriram.v...@gmail.com, Delta Lake Users and Developers
Hey Sriram,

I haven't seen a direct way of doing this, but you could use the following steps (sketched below):
1. Copy the CSV files to the target location using dbutils.fs.cp().
2. Once the data is copied and validated, use dbutils.fs.rm() to remove the staging folder where the _SUCCESS, _started, and _committed files are present.
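
A minimal sketch of those two steps (hypothetical paths; dbutils is available in Databricks notebooks):

# Hypothetical staging and target locations.
staging_dir = "abfss://data@myaccount.dfs.core.windows.net/tmp/events_csv"
target_dir = "abfss://data@myaccount.dfs.core.windows.net/export/events_csv"

# Step 1: copy only the part-*.csv data files to the target location.
for f in dbutils.fs.ls(staging_dir):
    if f.name.startswith("part-") and f.name.endswith(".csv"):
        dbutils.fs.cp(f.path, target_dir + "/" + f.name)

# Step 2: once the copied data is validated, remove the staging folder,
# which still contains the _SUCCESS, _started, and _committed markers.
dbutils.fs.rm(staging_dir, recurse=True)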

I am currently using this approach.
Thanks,
Rohit Haseja.
