Generated Column bug


ashok kumar

unread,
Aug 18, 2021, 12:56:43 AM
to Delta Lake Users and Developers
Dear Delta team,
I was experimenting with generated columns, but they are not working as expected. Apologies for the long message; below is the code snippet I tried.

Here:
  - the timestamp column is in epoch milliseconds
  - ptimestamp is the partition column; it is also in epoch milliseconds, and its value is derived from the timestamp column by a formula (truncation to the hour, as the snippet shows)

import io.delta.tables.DeltaTable
import org.apache.spark.sql.types.{LongType, StringType}

DeltaTable.create(spark)
  .tableName("part")
  .addColumn("timestamp", LongType)
  .addColumn("name", StringType)
  // generated column: timestamp truncated to the hour (3600000 ms)
  .addColumn(DeltaTable.columnBuilder(spark, "ptimestamp")
    .dataType(LongType)
    .generatedAlwaysAs("CAST((timestamp - timestamp % 3600000) AS LONG)")
    .build)
  .partitionedBy("ptimestamp")
  .execute()

I was able to create the table, but I am facing two issues so far.
  1. I cannot insert data without filling in the generated column, i.e. the command below gives an error, although as I understand it the generated column should be filled in automatically (a sketch of the explicit-value workaround I used follows after the plans below):
         spark.sql("INSERT INTO part (timestamp, name) VALUES (1629010822000, 'ashok1')").show
       Executing the above gives:
       org.apache.spark.sql.AnalysisException: Cannot write to 'test.part', not enough data columns; target table has 3 column(s) but the inserted data has 2 column(s)
 
2. I added a few records (supplying a value for the generated column explicitly). When I query the table by the timestamp column, I would expect hidden partitioning to put the partition column into the plan as a filter, but I don't see it there. My SQL query is:
      explain select * from part where timestamp = 1629165660000;
== Physical Plan ==
*(1) Filter (isnotnull(timestamp#5283L) AND (timestamp#5283L = 1629165660000))
+- *(1) ColumnarToRow
   +- FileScan parquet test.part[timestamp#5283L,name#5284,ptimestamp#5285L] Batched: true, DataFilters: [isnotnull(timestamp#5283L), (timestamp#5283L = 1629165660000)], Format: Parquet, Location: TahoeLogFileIndex[**/test.db/part], PartitionFilters: [], PushedFilters: [IsNotNull(timestamp), EqualTo(timestamp,1629165660000)], ReadSchema: struct<timestamp:bigint,name:string>

However, as expected, if I query by the partition column, the partition filter does appear in the plan:
explain select * from part where ptimestamp = 1629165600000;
== Physical Plan ==
*(1) ColumnarToRow
+- FileScan parquet test.part[timestamp#5293L,name#5294,ptimestamp#5295L] Batched: true, DataFilters: [], Format: Parquet, Location: TahoeLogFileIndex[****/test.db/part], PartitionFilters: [isnotnull(ptimestamp#5295L), (ptimestamp#5295L = 1629165600000)], PushedFilters: [], ReadSchema: struct<timestamp:bigint,name:string>
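
For completeness, here is roughly how I added those records: I computed the generated value by hand from the same expression (1629010822000 - 1629010822000 % 3600000 = 1629010800000) and supplied it explicitly. As I understand it, an explicitly supplied value has to match the generation expression:

// supplying all three columns, including the generated one, avoids the error above;
// the explicit value (1629010800000) matches the generation expression by construction
spark.sql("INSERT INTO part VALUES (1629010822000, 'ashok1', 1629010800000)")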

Also, I don't see a generated-column clause in the CREATE TABLE syntax. Is there any plan to introduce one in the future?
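
To illustrate what I mean, something along these lines (hypothetical syntax, just to show the shape of what I'm asking for; I don't expect this exact form to parse today):

CREATE TABLE part (
  timestamp BIGINT,
  name STRING,
  ptimestamp BIGINT GENERATED ALWAYS AS (CAST(timestamp - timestamp % 3600000 AS LONG))
) USING DELTA
PARTITIONED BY (ptimestamp)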

Shixiong(Ryan) Zhu

unread,
Aug 18, 2021, 12:37:49 PM
to ashok kumar, Delta Lake Users and Developers
Hey Ashok,

1. The INSERT INTO issue and CREATE TABLE syntax for generated columns are not available in OSS Delta Lake right now because the Apache Spark SQL parser doesn't support them. We will work with the Spark community to try to add these syntax changes to Apache Spark.
2. Generating partition filters automatically is not supported yet; a manual workaround is sketched below.
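
In the meantime, you can fold the generation expression into the filter yourself, so the planner sees a predicate on the partition column directly. A sketch based on your table's expression (1629165660000 - 1629165660000 % 3600000 = 1629165600000):

explain select * from part
where timestamp = 1629165660000
  and ptimestamp = 1629165660000 - 1629165660000 % 3600000;

With the ptimestamp predicate present, it should show up under PartitionFilters, as in your second plan.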

Best Regards,

Ryan



Shixiong(Ryan) Zhu

unread,
Aug 18, 2021, 12:42:54 PM
to ashok kumar, Delta Lake Users and Developers
By the way, as a workaround for the INSERT INTO issue, you can write the data with the following DataFrame APIs instead.

import spark.implicits._ // for toDF

// note the L suffix: 1629010822000 does not fit in an Int literal
Seq(1629010822000L -> "ashok1")
  .toDF("timestamp", "name")
  .write
  .format("delta")
  .mode("append")
  .saveAsTable("part")

Best Regards,
Ryan

ashok kumar

unread,
Aug 18, 2021, 11:55:08 PM
to Delta Lake Users and Developers
Thanks Ryan
It works this way.
