Delta Lake 1.0.1 save throwing error


charita m

Mar 21, 2022, 4:41:30 PM
to Delta Lake Users and Developers
I am upgrading my PySpark code to Delta Lake 1.0.1. I am getting the below error when using Spark 3.1:

Code causing the error:

dfTable.repartition(1).write.mode('overwrite').option('overwriteSchema', 'true').format('delta').save(table_location)

error:

Caused by: org.apache.spark.SparkUpgradeException:
You may get a different result due to the upgrading of Spark 3.0: reading dates before 1582-10-15 or timestamps
before 1900-01-01T00:00:00Z from Parquet files can be ambiguous, as the files may be written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar.
See more details in SPARK-31404. You can set spark.sql.legacy.parquet.datetimeRebaseModeInRead to 'LEGACY' to rebase the datetime


I can fix this by adding the below code:

spark.sql("set spark.sql.legacy.parquet.int96RebaseModeInRead=LEGACY")
spark.sql("set spark.sql.legacy.parquet.int96RebaseModeInWrite=LEGACY")

I wanted to know if there is a better option where I don't have to set the config to legacy mode.

Thanks,
Charita

Michael Nacey

May 24, 2022, 1:00:55 PM
to Delta Lake Users and Developers
You can set it to CORRECTED to use the Spark 3.x+ (Proleptic Gregorian calendar) behavior instead of rebasing to the legacy calendar.
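For example, a minimal sketch assuming the Parquet files were actually written by Spark 3.x, i.e. there are no legacy Spark 2.x or Hive dates that need rebasing:

# CORRECTED reads/writes the datetime values as-is, without rebasing between
# the legacy hybrid calendar and the Proleptic Gregorian calendar.
# Only safe if the files were not written by Spark 2.x or legacy Hive.
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "CORRECTED")
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "CORRECTED")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")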