Creation of Delta table with partitioned DataFrame

Dwija D

Jun 9, 2021, 11:59:01 AM
to Delta Lake Users and Developers
Hi 
I am able to create a Delta table after writing messages to the Delta location:

scala> batchDF.selectExpr("Col_A", "Col_B").write.format("delta").mode("overwrite").save("hdfs://hdfs-host:9000/tmp/kafka-message-without-partition-by")

scala> sql("CREATE TABLE kafka_topic USING DELTA LOCATION 'hdfs://hdfs-host:9000/tmp/kafka-message-without-partition-by'")

However, if I partition the DataFrame by Col_A and then try to create the table using Spark SQL, I get the following error:

scala> batchDF.selectExpr("Col_A", "Col_B").write.format("delta").partitionBy("Col_A").mode("overwrite").save("hdfs://hdfs-host:9000/tmp/kafka-message-with-partition-by")

scala> sql("CREATE TABLE kafka_topic_with_partition_by USING DELTA LOCATION 'hdfs://hdfs-host:9000/tmp/kafka-message-with-partition-by'")

org.apache.spark.sql.AnalysisException:
You are trying to create an external table `default`.`kafka_topic`
from `hdfs://hdfs-host:9000/tmp/kafka-message-with-partition-by` using Delta Lake, but there is no transaction log present at
`hdfs://hdfs-host:9000/tmp/kafka-message-with-partition-by/_delta_log`. Check the upstream job to make sure that it is writing using
format("delta") and that the path is the root of the table.

I found that the _delta_log folder is not created in the designated path. I tried with freshly created Delta paths (folders), but the result is the same.
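
One way to confirm the missing log from the same shell is a quick check with the Hadoop FileSystem API (a sketch; the path is the partitioned one used above):

scala> import org.apache.hadoop.fs.Path
scala> val logPath = new Path("hdfs://hdfs-host:9000/tmp/kafka-message-with-partition-by/_delta_log")
scala> logPath.getFileSystem(spark.sparkContext.hadoopConfiguration).exists(logPath)  // returns false here, so the log was never written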

Spark: 3.0.2
Delta: 0.7.0

I would like to know the root cause of this issue.

Regards

Tathagata Das

Jun 10, 2021, 12:07:06 AM
to Dwija D, Delta Lake Users and Developers
Hello,

Have you set the Spark configurations? See https://docs.delta.io/0.7.0/delta-batch.html#configure-sparksession

TD


Dwija D

Jun 10, 2021, 1:56:56 AM
to Delta Lake Users and Developers
Hi

I have included the said configuration while creating the SparkSession in the Spark application:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local[*]")
  ...
  ...
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

Also, while invoking spark-shell, I am passing the Delta configuration via the --conf parameters:

# spark-shell --master local --deploy-mode "client" --packages io.delta:delta-core_2.12:0.8.0 --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
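
As a sanity check, both values can be read back inside the shell (a minimal sketch using the standard spark.conf API; spark.conf.get throws NoSuchElementException if a key was never set):

scala> spark.conf.get("spark.sql.extensions")
scala> spark.conf.get("spark.sql.catalog.spark_catalog")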

Thanks
Dwija

Tathagata Das

Jun 10, 2021, 2:52:39 AM
to Dwija D, Delta Lake Users and Developers
scala> sql("CREATE TABLE kafka_topic_with_partition_by USING DELTA LOCATION 'hdfs://hdfs-host:9000/tmp/kafka-message-with-partition-by'")
 
org.apache.spark.sql.AnalysisException:
You are trying to create an external table `default`.`kafka_topic`
from `hdfs://hdfs-host:9000/tmp/kafka-message-with-partition-by` using Delta Lake, but there is no transaction log present at
`hdfs://hdfs-host:9000/tmp/kafka-message-with-partition-by/_delta_log`. Check the upstream job to make sure that it is writing using
format("delta") and that the path is the root of the table.

Follow-up questions:
1. Are you sure this is the error message? In the SQL command you are creating "kafka_topic_with_partition_by", but the error message says "You are trying to create an external table `default`.`kafka_topic`".

2. Can you run Delta 1.0 on Spark 3.1 to make sure that this is not a bug only in the older version?

Also, if you have to use Spark 3.0, then try using Delta 0.8.0.
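
For example, something like this, assuming a Spark 3.1 installation (the coordinates below are the published Delta 1.0.0 artifact; the two --conf settings are unchanged):

# spark-shell --master local --deploy-mode "client" --packages io.delta:delta-core_2.12:1.0.0 --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog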

TD

Dwija D

Jun 10, 2021, 5:24:04 AM
to Delta Lake Users and Developers
> 1. Are you sure this is the error message? In the SQL command you are creating "kafka_topic_with_partition_by", but the error message says "You are trying to create an external table `default`.`kafka_topic`".
That was a typo.

> 2. Can you run Delta 1.0 on Spark 3.1 to make sure that this is not a bug only in the older version?
Due to some other library dependencies, it is not possible to try Spark 3.1 immediately.

I tried with Spark 3.0.2 and Delta 0.8.0, and the result is the same.

Thanks