Hi all,
I am working on a requirement where I need to check whether a folder for the current date exists in Processed_Path, and delete it if it does. I am able to do this:
# Function to check whether a path exists
def path_exists(path):
    try:
        dbutils.fs.ls(path)
        return True
    except Exception as e:
        if 'java.io.FileNotFoundException' in str(e):
            return False
        else:
            raise

# if path_exists(process_date):
#     dbutils.fs.rm(process_date, True)
#     print("Path exists and got removed")
# else:
#     print("Path does not exist")
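The helper above treats any exception whose message mentions java.io.FileNotFoundException as "path missing" and re-raises everything else. A minimal pure-Python sketch of that pattern, with a hypothetical lister standing in for dbutils.fs.ls (which only exists on Databricks):

```python
# Sketch of the path_exists pattern without dbutils: classify an
# exception as "not found" by inspecting its message, re-raise otherwise.
def path_exists_with(ls_func, path):
    try:
        ls_func(path)          # stand-in for dbutils.fs.ls(path)
        return True
    except Exception as e:
        if 'java.io.FileNotFoundException' in str(e):
            return False       # path genuinely missing
        raise                  # permissions, network, etc. still propagate

# Hypothetical lister that mimics the JVM-wrapped error message
def fake_ls(path):
    if path != "/mnt/root/USA":
        raise Exception("java.io.FileNotFoundException: " + path)
    return ["part-00000"]

print(path_exists_with(fake_ls, "/mnt/root/USA"))   # True
print(path_exists_with(fake_ls, "/mnt/root/EU"))    # False
```

Only the message check is the point here; any other exception type still bubbles up so real failures are not silently treated as "missing".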
I am able to check for the current date folder and delete it, but the next step fails when trying to read the process_path, with the error below:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 89.0 failed 4 times, most recent failure: Lost task 1.3 in stage 89.0 (TID 145, 172.21.128.133, executor 0): com.databricks.sql.io.FileReadException: Error while reading file abfss:mnt/root/USA/process_date=2021-12-14/part-00000-1e134b56-b886-96ac-9faf-bc34cfda44df.c000.snappy.parquet. A file referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table `DELETE` statement. For more information, see
https://docs.microsoft.com/azure/databricks/delta/delta-intro#frequently-asked-questions
To write the data to processed_path I am using the command below:
df.write.mode("overwrite").format("delta").option("mergeSchema", "true").partitionBy("process_date").save(process_delta_path)
but overwrite is not working; it keeps writing new files into the same folder.
I want to overwrite the process_date=2021-12-14 folder and write the new file into it.
Can you please let me know what I am missing so I can achieve this requirement?
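Since the target is a Delta table, I understand deleting the partition folder manually leaves stale entries in the transaction log, which would explain the FileReadException. One approach I have been considering (an assumption, not yet verified in my environment) is Delta's replaceWhere option, which overwrites only the rows matching a predicate through the log itself. A sketch of building that predicate:

```python
# Sketch: build a replaceWhere predicate for a single partition value.
# The column name "process_date" comes from my table; the write call
# is Databricks/Delta API and is shown commented out, not executed here.
def partition_predicate(column, value):
    return "{} = '{}'".format(column, value)

predicate = partition_predicate("process_date", "2021-12-14")
print(predicate)  # process_date = '2021-12-14'

# The write would then look like (on Databricks):
# df.write.mode("overwrite").format("delta") \
#     .option("replaceWhere", predicate) \
#     .save(process_delta_path)
```

With this, no dbutils.fs.rm call would be needed at all, since Delta handles removing the old partition data transactionally.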