How to make delta table column case-insensitive in string comparison?


Prasad Vaze

Oct 20, 2021, 3:23:43 PM
to Delta Lake Users and Developers
Is there a way to make column values case-insensitive? We have many Delta tables with string columns as the unique key (the PK in a traditional relational DB), and we don't want to insert a new row when the key value differs only in case.
Using the upper/lower functions on every column comparison would be a lot of code change, so I am looking for an alternative.

I see that a CHECK constraint on a Delta table column can enforce consistently cased values, but it's too late: I already have mixed-case data in the tables.
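For the mixed-case data that is already in the tables, a first step could be to list the keys that collide once case is ignored. The sketch below is only an illustration: the table name `customers` and the column `customer_id` are hypothetical, the SQL string is meant to be run through `spark.sql(...)`, and the small Python function applies the same grouping logic to a plain list so the idea can be checked without a cluster.

```python
from collections import defaultdict

# Hypothetical names: table `customers`, key column `customer_id`.
# Intended to be run as spark.sql(FIND_COLLISIONS_SQL) in a Spark session.
FIND_COLLISIONS_SQL = """
SELECT lower(customer_id)       AS folded_key,
       collect_set(customer_id) AS variants
FROM customers
GROUP BY lower(customer_id)
HAVING count(DISTINCT customer_id) > 1
"""


def find_case_collisions(keys):
    """Group string keys by their lower-cased form and keep only the
    groups that contain more than one distinct spelling."""
    groups = defaultdict(set)
    for key in keys:
        groups[key.lower()].add(key)
    return {
        folded: sorted(variants)
        for folded, variants in groups.items()
        if len(variants) > 1
    }


collisions = find_case_collisions(["ABC-1", "abc-1", "Xyz-2", "QRS-3"])
print(collisions)
```

Once the colliding rows are resolved, a CHECK constraint on consistent case could be added; as far as I know, adding it would be rejected while violating rows are still present.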

Is there anything similar to the SQL Server collation feature?

spark.conf.set('spark.sql.caseSensitive', False) does not work as expected (meaning a string comparison between mixed-case values still shows that I have 2 different strings).

I also looked at spark.conf.set('spark.databricks.analyzer.batchResolveRelations', False), in vain.
I have tried Databricks 7.3 LTS and 9.1 LTS on Azure.
  

Yuri Oleinikov

Oct 20, 2021, 6:44:30 PM
to Prasad Vaze, Delta Lake Users and Developers
Hi Prasad,
AFAIK spark.sql.caseSensitive applies to column names, not to values.
I'm no big expert in Spark, but I think applying the 'lower' function to the column might help.
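To keep the code change small, the `lower` comparison could live in one helper instead of being repeated at every call site. This is a minimal sketch, not a tested recipe: the helper name and the names `customers`, `customer_updates` and `customer_id` are made up for illustration, and the generated statement is meant to be passed to `spark.sql(...)` on a session with Delta Lake.

```python
def case_insensitive_merge_sql(target, source, key):
    """Build a Delta MERGE statement that inserts only those source rows
    whose key is not already in the target, comparing keys with lower().

    The identifiers are interpolated as-is, so pass trusted names only.
    """
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON lower(t.{key}) = lower(s.{key})\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Hypothetical table and column names.
stmt = case_insensitive_merge_sql("customers", "customer_updates", "customer_id")
print(stmt)
# In a Spark session with Delta Lake: spark.sql(stmt)
```

Two caveats: if the source itself contains keys that differ only in case, all of them are unmatched and would be inserted, so the source likely needs de-duplicating on `lower(key)` first; and wrapping the key in a function may reduce how well files can be skipped during the merge.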

Best regards,



Ruslan Dautkhanov

Oct 20, 2021, 8:04:42 PM
to Delta Lake Users and Developers
Spark 3.3 will support case-insensitive value comparison through ILIKE.
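One thing to watch with ILIKE is that it is a pattern match, not an equality test, so a `%` or `_` inside a key value acts as a wildcard. The sketch below escapes those characters before the value is used as a pattern. It assumes Spark's default LIKE escape character (the backslash); the helper name and the column `customer_id` in the comment are hypothetical.

```python
def escape_like_pattern(value: str) -> str:
    """Escape LIKE/ILIKE metacharacters so the pattern matches only the
    literal value (apart from case). Assumes backslash is the escape
    character, so existing backslashes are doubled first."""
    return (
        value.replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )


pattern = escape_like_pattern("AB_100%")
print(pattern)
# On PySpark 3.3 or later, something like:
#   df.filter(df.customer_id.ilike(pattern))
```

Without the escaping, a key such as `AB_1` would also match `ABX1` or `ab-1`, which could hide a missing row instead of preventing a duplicate.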

Prasad Vaze

Oct 20, 2021, 9:58:07 PM
to Delta Lake Users and Developers
Thanks, Ruslan. The actual implementation details are in PR https://github.com/apache/spark/pull/33919
I looked for the 3.3.0 release date but can't find it. Do you happen to know?

@michael, I had thought about using the upper/lower functions for string comparison, but that involves code change. Using ILIKE also involves code change, though.