I am upgrading my PySpark code to Delta Lake 1.0.1. I am getting the error below when running on Spark 3.1.
Code causing the error:
dfTable.repartition(1).write.mode('overwrite').option('overwriteSchema', 'true').format('delta').save(table_location)
error:
Caused by: org.apache.spark.SparkUpgradeException:
You may get a different result due to the upgrading of Spark 3.0: reading dates before 1582-10-15 or timestamps
before 1900-01-01T00:00:00Z from Parquet files can be ambiguous, as the files may be written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar.
See more details in SPARK-31404. You can set spark.sql.legacy.parquet.datetimeRebaseModeInRead to 'LEGACY' to rebase the datetime
I can fix this by adding the code below:
spark.sql("set spark.sql.legacy.parquet.int96RebaseModeInRead=LEGACY")
spark.sql("set spark.sql.legacy.parquet.int96RebaseModeInWrite=LEGACY")
I wanted to know if there is a better option where I don't have to set these configurations to legacy mode.
Thanks,
Charitha