Hi,
I have been using Delta tables for two months, and I have a Delta table on Databricks.
In my pipeline, I append data from Parquet files to this Delta table. New columns may be added to the Parquet files from day to day, and I would like these new columns to be added to my Delta table as well. The related code block is below; my Spark version is 3.1.2.
It completes without error and inserts the data into my Delta table, but the new columns are missing.
val confMergeSchema = "true"
val confDatabricksAutoMergeSchema = "true"
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", confDatabricksAutoMergeSchema)

spark.read
  .format("parquet")
  .load(pathRead)
  .repartition(numFiles)
  .write
  .format("delta")
  .option("mergeSchema", confMergeSchema)
  .mode("append")
  .option("path", pathWrite)
  .partitionBy("a", "b", "c")
  .saveAsTable(s"$dbName.$tableName")
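For reference, the missing columns can be confirmed by comparing the incoming Parquet schema with the table schema after the append. A minimal check, using the same pathRead, dbName, and tableName as above:

// Columns present in the Parquet files but absent from the Delta table
val incomingCols = spark.read.format("parquet").load(pathRead).columns.toSet
val tableCols = spark.table(s"$dbName.$tableName").columns.toSet
println(incomingCols.diff(tableCols))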
When I change the code as below and add the mergeSchema option to the read, it throws the exception: "Found duplicate column(s) in the data schema".
spark.read
  .option("mergeSchema", confMergeSchema)
  .format("parquet")
  .load(pathRead)
  .repartition(numFiles)
  .write
  .format("delta")
  .mode("append")
  .option("path", pathWrite)
  .partitionBy("a", "b", "c")
  .saveAsTable(s"$dbName.$tableName")
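As far as I understand, mergeSchema on the read side merges the schemas of the individual Parquet files, so this error could mean that some files contain column names that collide (for example, names differing only in case). A sketch of how the per-file schemas could be inspected, assuming the files under pathRead are directly listable via Hadoop's FileSystem API:

import org.apache.hadoop.fs.Path

// Print each Parquet file's own schema to spot colliding column names
val fs = new Path(pathRead).getFileSystem(spark.sparkContext.hadoopConfiguration)
fs.listStatus(new Path(pathRead))
  .map(_.getPath.toString)
  .filter(_.endsWith(".parquet"))
  .foreach { file =>
    println(s"$file:\n${spark.read.parquet(file).schema.treeString}")
  }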
What am I missing?
Regards.
Canan