VARCHAR comparison and trailing spaces

89 views
Skip to first unread message

Fawzib Rojas

unread,
Oct 9, 2019, 1:26:24 PM10/9/19
to H2 Database
I have been testing H2 and for some reason same strings with trailing spaces are considered different. For Example:

select CASE WHEN 'testing'='testing     ' THEN 'SAME' ELSE 'DIFFERENT' END

H2 returns 'DIFFERENT' on that query, MSSQL does not, which it seems to be the right way. Looking for info on how this should work I found this from the MSSQL Server:

SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2, <Comparison Predicate>, General rules #3) on how to compare strings with spaces. The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them. The padding directly affects the semantics of WHERE and HAVING clause predicates and other Transact-SQL string comparisons. For example, Transact-SQL considers the strings 'abc' and 'abc ' to be equivalent for most comparison operations.

The only exception to this rule is the LIKE predicate. When the right side of a LIKE predicate expression features a value with a trailing space, SQL Server does not pad the two values to the same length before the comparison occurs. Because the purpose of the LIKE predicate, by definition, is to facilitate pattern searches rather than simple string equality tests, this does not violate the section of the ANSI SQL-92 specification mentioned earlier.

So maybe there is a setting I'm missing?

JDBC URL: jdbc:h2:tcp://localhost/.\mydatabase

Fawzib Rojas

unread,
Oct 9, 2019, 1:44:22 PM10/9/19
to H2 Database
Ok, found a post (ironically it was MY post), asking the same but about CHAR (not VARCHAR).
The solution was using "SET COLLATION EN STRENGTH PRIMARY"

After doing that before the creation of the table now the comparison between 'testing' and 'testing   ' reports they are the same.
My question is, shouldn't this be the default when you create a new database?


Evgenij Ryazanov

unread,
Oct 9, 2019, 9:06:28 PM10/9/19
to H2 Database
Hello.

The current version of the SQL Standard is SQL:2016 plus corrigenda and additional parts from 2019. Please, don't talk about SQL-92, it is completely outdated, there are six (!) versions that were published after it.

I don't have the very recent version of ISO/IEC 9075-2, but in previous versions in section “Comparison of character strings”, comparison operations are delegated to the collation, and definition of comparison predicate talks about the same. Technically the standard still requires padding, but it should be a character that is not defined in the character set and that is smaller than any character from the character set if collation has NO PAD characteristic. Supported collations are implementation-defined in the Standard, so you can't assume that all databases will work in the same way.

From my point of view PRIMARY collation shouldn't be the default, it is too relaxed for normal use cases, it will break many applications. I already saw such problems when a custom collation were used (for non-English languages) and this strength was accidentally chosen.

Evgenij Ryazanov

unread,
Oct 9, 2019, 9:35:33 PM10/9/19
to H2 Database
In some other use cases such strength is required for normal operation and it is configurable in H2, but only in a new empty database, to change it later database needs to be exported to SQL, edited (with some stream editor) and re-created.
Reply all
Reply to author
Forward
0 new messages