I am trying to generate CSV files by reading a Delta table in Databricks. The CSV files will be consumed by downstream systems, which read every CSV file in an ADLS directory. Along with the partitioned CSV files, Spark generates _SUCCESS, _committed and _started files, and my downstream ETL process fails to process the CSV files because these 3 extra files are present. How can I avoid generating these files?
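My write is roughly the following (the Delta table path and ADLS output path are placeholders, and this runs in a Databricks notebook where spark is already defined):

# Sketch of the current job; paths are placeholders.
df = spark.read.format("delta").load("/mnt/delta/my_table")

(df.write
   .mode("overwrite")
   .option("header", "true")
   .csv("abfss://container@myaccount.dfs.core.windows.net/exports/my_table_csv"))

# The output directory then contains part-*.csv plus the _SUCCESS, _committed and _started files.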
I have tried setting these 3 flags on the Databricks cluster:
mapreduce.fileoutputcommitter.marksuccessfuljobs=false
parquet.enable.summary-metadata=false
spark.hadoop.parquet.enable.summary-metadata=false
The above 3 flags didn't work.
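(I set them in the cluster's Spark config; my understanding is that the session-level equivalent would be something like the following, with the same keys as above.)

# Session-level equivalent of the cluster Spark config entries above (sketch).
spark.conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")
spark.conf.set("parquet.enable.summary-metadata", "false")
spark.conf.set("spark.hadoop.parquet.enable.summary-metadata", "false")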
The flag below works, but it overrides the DBIO output committer:
spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
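Applied at session level it would look something like this (again just a sketch); with it, the extra files are no longer written, but writes go through Spark's stock Hadoop commit protocol instead of DBIO:

# Suppresses the marker files, but replaces the DBIO commit protocol with Spark's default one.
spark.conf.set(
    "spark.sql.sources.commitProtocolClass",
    "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol"
)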
Is there a way to suppress these files with Delta on Databricks without overriding the DBIO committer?