Hi, I am trying to read a Parquet file from IBM COS (Cloud Object Storage) and write it back as a Delta table, but I am running into problems.
Code:
spark = pyspark.sql.SparkSession.builder.appName("MyApp").master("local[*]") \
    .config("spark.delta.logStore.class", "org.apache.spark.sql.delta.storage.COSLogStore") \
    .config("spark.jars.packages", "io.delta:delta-core_2.12:0.8.0") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .getOrCreate()
hconf = spark._jsc.hadoopConfiguration()
hconf.set("fs.stocator.scheme.list", "cos")
hconf.set("fs.cos.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")
hconf.set("fs.stocator.cos.impl", "com.ibm.stocator.fs.cos.COSAPIClient")
hconf.set("fs.stocator.cos.scheme", "cos")
hconf.set("fs.cos.atomic.write", "true")
hconf.set("fs.cos.service.endpoint", COS_ENDPOINT)
hconf.set("fs.cos.service.iam.api.key", COS_API_KEY_ID)
from delta.tables import *
# note: header/inferSchema are CSV reader options; they have no effect on Parquet
df_COS = spark.read \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .parquet("cos://bucket.service/")
df_COS.write.format("delta").save("cos://bucket.service/")
Error:
Py4JJavaError: An error occurred while calling o59.save.
: java.util.concurrent.ExecutionException: java.lang.ClassNotFoundException: org.apache.spark.sql.delta.storage.COSLogStore
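For context on the error: a ClassNotFoundException means the class named in spark.delta.logStore.class is not on the classpath. As far as I can tell, delta-core does not ship a class called COSLogStore; the Delta storage-configuration docs for IBM COS name the implementation org.apache.spark.sql.delta.storage.IBMCOSLogStore (note the IBM prefix), and it also requires the Stocator jar on the classpath. A minimal sketch of the session setup with that class name (the Stocator coordinate and version here are an assumption; pick whichever matches your Spark build):

```python
import pyspark

# Sketch, not a verified fix: swaps in the documented IBMCOSLogStore class
# name and adds the Stocator package (version 1.1.3 is an assumption) so the
# LogStore and the cos:// filesystem are both resolvable at runtime.
spark = (
    pyspark.sql.SparkSession.builder
    .appName("MyApp")
    .master("local[*]")
    .config("spark.delta.logStore.class",
            "org.apache.spark.sql.delta.storage.IBMCOSLogStore")
    .config("spark.jars.packages",
            "io.delta:delta-core_2.12:0.8.0,com.ibm.stocator:stocator:1.1.3")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)
```

Separately, writing the Delta table to the same cos://bucket.service/ root that the Parquet files were read from will mix the _delta_log directory in with the source files; saving to a distinct prefix (e.g. a sub-path of the bucket) is safer.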