How to make delta table column case-insensitive in string comparison?

Prasad Vaze

Oct 20, 2021, 3:23:43 PM

to Delta Lake Users and Developers

Is there a way to make column values case-insensitive? We have many Delta tables with string columns as the unique key (the PK in a traditional relational DB), and we don't want a new row inserted when a key value differs only in case.

It's a lot of code change to wrap every column comparison in upper/lower, so I'm looking for an alternative.

I see that a CHECK constraint on a Delta table column can enforce consistent case, but it's too late: I already have mixed-case data in the tables.

Is there anything similar to the SQL Server collation feature?

spark.conf.set('spark.sql.caseSensitive', False) does not work as expected (a string comparison between mixed-case values still shows two different strings).

I also looked at spark.conf.set('spark.databricks.analyzer.batchResolveRelations', False), in vain.

I have tried Databricks 7.3 LTS and 9.1 LTS on Azure.

Yuri Oleinikov

Oct 20, 2021, 6:44:30 PM

to Prasad Vaze, Delta Lake Users and Developers

Hi Prasad

AFAIK spark.sql.caseSensitive is used for column names, not values.

I'm not a big expert in Spark, but I think applying the 'lower' function to the column might help.
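
One way to keep that code change small might be to build the comparison in a single helper instead of editing every join or merge by hand. The following is only a sketch in plain Python: the helper name `ci_merge_condition`, the aliases `t` and `s`, and the example column names are hypothetical, and the helper does nothing more than produce a condition string that uses Spark SQL's `lower` function.

```python
from typing import Sequence


def ci_merge_condition(key_columns: Sequence[str],
                       target_alias: str = "t",
                       source_alias: str = "s") -> str:
    """Build a SQL condition that compares string key columns ignoring case.

    Hypothetical helper: the aliases and column names are assumptions
    made for illustration, not something taken from the original tables.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")
    # Apply lower() to both sides so 'ABC' and 'abc' count as the same key.
    parts = [
        f"lower({target_alias}.{col}) = lower({source_alias}.{col})"
        for col in key_columns
    ]
    return " AND ".join(parts)


condition = ci_merge_condition(["customer_id", "region"])
print(condition)
```

The resulting string could then be used as the match condition of a Delta MERGE, assuming the target and source are aliased as `t` and `s`. This does not clean up mixed-case keys that are already in the table; it only stops new rows from being added when the key differs only in case.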

Best regards,

Ruslan Dautkhanov

Oct 20, 2021, 8:04:42 PM

to Delta Lake Users and Developers

Spark 3.3 will have case-insensitive value comparison through ILIKE (a case-insensitive LIKE):

https://issues.apache.org/jira/browse/SPARK-36674

https://issues.apache.org/jira/browse/SPARK-36778
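
One thing to keep in mind if ILIKE is used for key comparison: it is a pattern match, not an equality test, so `%` and `_` inside a key value act as wildcards. The sketch below is a plain-Python emulation of that behaviour, written only to illustrate the point. It is not Spark's implementation, the function name `ilike` is hypothetical, and escape characters are deliberately not handled.

```python
import re


def ilike(value: str, pattern: str) -> bool:
    """Emulate SQL ILIKE semantics in plain Python (no escape support)."""
    regex_parts = []
    for ch in pattern:
        if ch == "%":
            # % matches any sequence of characters, including none.
            regex_parts.append(".*")
        elif ch == "_":
            # _ matches exactly one character.
            regex_parts.append(".")
        else:
            # Every other character must match literally.
            regex_parts.append(re.escape(ch))
    regex = "".join(regex_parts)
    flags = re.IGNORECASE | re.DOTALL
    return re.fullmatch(regex, value, flags=flags) is not None


print(ilike("ABC-1", "abc-1"))   # same key, different case
print(ilike("abcX1", "abc_1"))   # '_' in the pattern acts as a wildcard
print(ilike("abc-1", "abc-2"))   # genuinely different keys
```

The second call matches even though the two strings are different keys, which is why keys containing `_` or `%` would probably need escaping, or a `lower`-based equality check instead.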

Prasad Vaze

Oct 20, 2021, 9:58:07 PM

to Delta Lake Users and Developers

Thanks, Ruslan. The actual implementation details are in PR https://github.com/apache/spark/pull/33919

I looked for the 3.3.0 release date but couldn't find it. Do you happen to know?

@michael, I had thought about using the upper/lower functions for string comparison, but that involves code changes. Using ILIKE also involves code changes, though.
