CHAR type's trimming of trailing spaces in H2 vs other RDBMS

1,417 views
Skip to first unread message

Lukas Eder

unread,
Jun 23, 2011, 2:31:40 PM6/23/11
to H2 Database
Hello,

I was wondering about H2's behaviour for CHAR types, as documented
here:
http://www.h2database.com/html/datatypes.html#char_type

The documentation says, this type is supported for compatibility with
other databases. To my knowledge, any of these RDBMS show the same
truncation behaviour as H2:

- MySQL (if PAD_CHAR_TO_FULL_LENGTH is not set to true)
- SQLite
- Sybase

But none of these do. For these, a CHAR(3) value seems to always have
3 characters:

- DB2
- Derby
- HSQLDB
- Ingres
- Oracle
- Postgres
- SQL Server

Is there a reason H2 prefers the less popular behaviour of CHAR(n)?
Note, I found this on the roadmap: Compatibility for data type CHAR
(Derby, HSQLDB). Issue 212. Maybe this is the same issue?

Note also, that according to the SQL standard, the correct behaviour
also depends on the collation, which in turn has an effect on ordering
and comparison:
http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt

Chapter 4.2.1 states:
A collation descriptor includes [...] whether the
collation has the NO PAD or the PAD SPACE
attribute

And then chapter 11.30
<collation definition> ::=
CREATE COLLATION <collation name>
FOR <character set specification>
FROM <collation source> [ <pad attribute> ]

<pad attribute> ::= NO PAD | PAD SPACE

Hope this helps!

Cheers
Lukas

Rami Ojares

unread,
Jun 23, 2011, 3:01:50 PM6/23/11
to h2-da...@googlegroups.com
Clearly the difference between char and varchar in SQL is that char has
always the same length whereas the length of varchar varies.
To me CHAR(N) is just a shorthand to VARCHAR(N) CHECK(LENGTH(COL) = N)
with the added feature in CHAR that if the length is less than N
it is right padded with spaces.

I would like to see a mention in the documentation what is the default
for N if it is not specified.
If I remember correctly for CHAR it is usually 1.
For varchar the default varies between DBMS products often being the
highest allowable N for VARCHAR
but it can be also 1 (I think this is the case in MSSQL) or it could be
even disallowed to use varchar without precision.

With the present day computing systems I think the most logical way
would be to define type TEXT (MySql, Postgres) or STRING (Java)
that only has a system defined upperbound on the length.

Then varchar(n) could be seen as a shorthand for TEXT CHECK(LENGTH(COL)
<= N)
and char as a shorthand for TEXT CHECK(LENGTH(COL) = N) (although this
does not perform the automatic right-padding).

- rami

Thomas Mueller

unread,
Jun 27, 2011, 3:06:01 AM6/27/11
to h2-database
Hi,

For me, anybody that still uses CHAR(..) is doing so for backward
compatibility. I don't see any reason why you should use it in a new
application.

Therefore, I don't plan to spend too much time improving the
compatibility, unless there is an important "real world use case". As
far as I understand, you had a look at the feature more from a
theoretical viewpoint, and not so much as a practical problem in a
real existing application, right?

Regards,
Thomas

Lukas Eder

unread,
Jun 28, 2011, 1:33:10 PM6/28/11
to H2 Database
Hi Thomas,

> Therefore, I don't plan to spend too much time improving the
> compatibility, unless there is an important "real world use case". As
> far as I understand, you had a look at the feature more from a
> theoretical viewpoint, and not so much as a practical problem in a
> real existing application, right?

Absolutely. I can live with either implementation as soon as I
understand the rationale behind the implementation.

Thanks!
Lukas

Thomas Mueller

unread,
Jun 30, 2011, 2:26:05 PM6/30/11
to h2-database
Hi,

The main reason was to save storage space while keeping the
implementation simple. If the spaces are added (padding), then either
it needs to be done when inserting (which means the spaces are stored
in the database file, which takes a lot of space), or the number of
spaces needs to be kept in memory somewhere, in which case the
implementation would be more complex (spaces are returned but not
stored). That's why I choose not to pad the spaces.

Regards,
Thomas

Lukas Eder

unread,
Jun 30, 2011, 5:59:24 PM6/30/11
to H2 Database
Hi Thomas,

I understand this rationale. Surely, it's easier to strip spaces when
inserting them, right away. Since the SQL standard leaves this
question somewhat open, I guess that's a viable implementation.

Thanks for the explanation!
Cheers
Lukas
Reply all
Reply to author
Forward
0 new messages