Facing issues while trying to write a streaming query to S3


Aryan Poddar

Jun 15, 2021, 9:27:30 PM
to Delta Lake Users and Developers
Hi all,
I'm facing an issue while writing a stream from Kafka to S3. I am currently pointing the checkpoint location to a folder in S3. Could it be related to S3 consistency issues when storing checkpoint files?

Extract from my code:

val query: StreamingQuery = flattenedDf.writeStream
  .outputMode("append")
  .format("delta")
  .trigger(Trigger.ProcessingTime("30 seconds"))
  .option("checkpointLocation", "s3a://<bucket-name>/_checkpoints")
  .start("s3a://<bucket-name>/delta-folder")


However, I am able to write a simple DataFrame and then read it back from the same S3 bucket.
The code below works fine:
spark.range(6,10).write.format("delta").mode(SaveMode.Overwrite).save("s3a://<bucket-name>/delta-table-test")
spark.read.format("delta").load("s3a://<bucket-name>/delta-table-test").show()


I have set up my Spark session in the following way:

val spark: SparkSession = SparkSession.builder
  .master("local[*]")
  .appName(getClass.getSimpleName)
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .config("spark.delta.logStore.class", "org.apache.spark.sql.delta.storage.S3SingleDriverLogStore")
  .getOrCreate()

val sc = spark.sparkContext
val accessKey = sys.env("AWS_ACCESS_KEY_ID")
val secretKey = sys.env("AWS_SECRET_ACCESS_KEY")
// When setting keys directly on the Hadoop configuration, the "spark.hadoop." prefix is dropped.
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
sc.hadoopConfiguration.set("fs.s3a.access.key", accessKey)
sc.hadoopConfiguration.set("fs.s3a.secret.key", secretKey)
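(For what it's worth, I believe the same S3A credentials can also be passed through the session builder itself, where the keys take the spark.hadoop. prefix. A sketch, not what I'm currently running:)

// Alternative sketch: hand the S3A credentials to the builder; Spark copies
// spark.hadoop.* properties into the Hadoop configuration it creates.
val sparkAlt: SparkSession = SparkSession.builder
  .master("local[*]")
  .appName("s3a-config-sketch")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .config("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
  .config("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
  .getOrCreate()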

Aryan Poddar

Jun 15, 2021, 9:29:22 PM
to Delta Lake Users and Developers
Getting the following error:

 ERROR MicroBatchExecution: Query [id = ddb2bdd1-59ce-43ea-8ea4-6d45c35674b1, runId = 62b5fdb3-44fa-4c95-a59b-d469a529cebb] terminated with error
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: 8GWGH1SD738FK12H, AWS Error Code: AccessDenied, AWS Error Message: Access Denied, S3 Extended Request ID: 9TsGyx6EaoENaWZYs0FYFQ+dzNlKprexJg09zK4nau8yYbxr+kI3a5W2bvbCiOchBW+g5iQJGf8=

Mich Talebzadeh

Jun 16, 2021, 4:07:27 AM
to Aryan Poddar, Delta Lake Users and Developers
Your error indicates an access issue, possibly with your storage bucket:

AWS Service: Amazon S3, AWS Request ID: 8GWGH1SD738FK12H, AWS Error Code: AccessDenied, AWS Error Message: Access Denied,

Can you check whether you can access

option("checkpointLocation", "s3a://<bucket-name>/_checkpoints")

with the credentials used in Spark, and whether there is anything there?

Normally these are the directories created

commits  metadata  offsets  sources
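One way to test that (a sketch; it assumes the same sc.hadoopConfiguration and placeholder bucket name from your post) is to list the checkpoint path through the Hadoop FileSystem API that Spark itself uses:

// Sketch: list the checkpoint prefix with the same S3A configuration Spark uses.
// An AccessDenied here points at the credentials or bucket policy rather than Delta.
import org.apache.hadoop.fs.{FileSystem, Path}
import java.net.URI

val checkpointPath = new Path("s3a://<bucket-name>/_checkpoints")
val fs = FileSystem.get(new URI("s3a://<bucket-name>"), sc.hadoopConfiguration)

if (fs.exists(checkpointPath))
  fs.listStatus(checkpointPath).foreach(status => println(status.getPath))
else
  println(s"$checkpointPath does not exist yet")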

HTH




Aryan Poddar

Jun 28, 2021, 3:36:52 AM
to Delta Lake Users and Developers
Thanks for the clarification.
Apparently I didn't have delete permissions on the storage bucket.
The issue is resolved now.
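For anyone hitting the same thing: as far as I can tell, the streaming checkpoint commits by writing temporary files and renaming them, and a rename on S3A is a copy followed by a delete, so the credentials need delete rights on the bucket in addition to read and write. A quick sanity check (a sketch; the bucket name is a placeholder) is to create and then delete a small test object through the same Hadoop configuration Spark uses:

// Sketch: verify the Spark credentials can both create and delete objects.
// A missing s3:DeleteObject permission shows up here as AccessDenied.
import org.apache.hadoop.fs.{FileSystem, Path}
import java.net.URI

val testPath = new Path("s3a://<bucket-name>/_permission-check/test.txt")
val fs = FileSystem.get(new URI("s3a://<bucket-name>"), sc.hadoopConfiguration)

val out = fs.create(testPath, true)   // needs put permission
out.write("ok".getBytes("UTF-8"))
out.close()

val deleted = fs.delete(testPath, false)   // needs delete permission
println(s"delete succeeded: $deleted")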