Formating of SQL value for BLOB types (as hex vs ascii-bytes)

614 views
Skip to first unread message

Ken Johanson

unread,
Feb 23, 2010, 1:59:47 AM2/23/10
to h2-da...@googlegroups.com
Hi,

When supplying binary/literal values in raw SQL (not using PreparedStatement), the H2 driver or database attempts parse them as a hex value; 'abcd' is parsed as 0xAB, 0xCD instead of 0x61 0x62 0x63 0x64.

However as I recall the SQL spec saya literal binary data should be one-to-one ascii/iso8859-1 (0-255) char-equivs (including \0); the only exception being 0x27, which must be doubled of course..

My question is, is there any way now (URL argument etc) to switch into the ascii-encoded binary mode? Which would also behave like to PG and Mysql...

(please pardon that this topic has been discussed perhaps 1 year ago however I am unable to find it in the list (do topics expire?)... I am also hoping a solution is now available in H2)

Thanks,
Ken

Thomas Mueller

unread,
Feb 26, 2010, 9:50:36 AM2/26/10
to h2-da...@googlegroups.com
Hi,

> When supplying binary/literal values in raw SQL (not using
> PreparedStatement), the H2 driver or database attempts parse them as a hex
> value; 'abcd' is parsed as 0xAB, 0xCD instead of 0x61 0x62 0x63 0x64.

Yes. Why don't you use a PreparedStatement?

> However as I recall the SQL spec saya literal binary data should be
> one-to-one ascii/iso8859-1 (0-255) char-equivs (including \0); the only
> exception being 0x27, which must be doubled of course..

That's interesting. Could you provide a link? I didn't find that in
the standard.

Unicode doesn't map all byte sequences to characters. Therefore you
couldn't generate some binary data when using UTF-8. Mapping one
character to one byte sounds wrong.

> My question is, is there any way now (URL argument etc) to switch into the
> ascii-encoded binary mode? Which would also behave like to PG and Mysql...

No. I suggest to use a PreparedStatement.

By the way, what almost works is: X'31393836'. At least it seems to be
supported by MySQL, and partially PostgreSQL (but it's a bit array
there, and I didn't find a way to map a bit array to byte array).

Regards,
Thomas

kensystem

unread,
Mar 1, 2010, 1:42:26 PM3/1/10
to H2 Database
Thomas,

I think the issue here is that H2 is treating literals as hex *even
when* there is not an explicit hex encoding indicator (the 'x').

Hex encoding is needed if the data is stored in raw txt files (or
other form which will mangle control chars etc), but not needed at the
API layer. So if there is not encoding hint 'x' then I believe the API
layer should treat literals as (char)byte (ignoring string encoded),
*if* the value applies to a blob type column (matching PG and Mysql).

Answers to your questions:
I don't use a prepared statement because the middleware depends on a
internal conversions for dynamic queries (which would almost never re-
use a prepared statement so in theory would be much, much slower),
and, because the middleware outputs literal values that are supposed
to already be sql compliant.

Also I was not suggesting this applied to UTF-8 -- the *user
application* is responsible for generating byte[] form from strings if
applicable, e.g: insert into tbl (foo) values ('"+escapeSql(new
String(bytea,"iso-8859-1"))+"') where iso-8859-1 is equiv to ascii,
e.g (char)byte.


Again, both Mysql and PG *do* accept (char)byte form (when you don't
hint hex encoding):
-Mysql if you use SET SESSION sql_mode = 'NO_BACKSLASH_ESCAPES,ANSI';
-PG does by default in newer version which no longer treat '\' as
escape char).

Since I no longer have access to sql specs, here are some links that
hopefully describe the two different modes, explicit hex (hinted with
'x') encoding, vs (char)byte:

This shows PG's old encoding ( it is (char)byte, but adds-in the
extraneous 92 which I don think is needed in newer PG, and also 0
which is needed just for the wire protocol as I understand):
http://www.sql.org/sql-database/postgresql/manual/datatype-binary.html

http://www.herongyang.com/jdbc/Oracle-BLOB-SQL-INSERT.html (it does
not say what would happen for each server if using the (char)byte
method (escaping only 0x39)
Oracle, is different and unique becasue it defaults to hex mode
(Oracle is)... H2 is mimicking this but I don't believe it should.

Notably Sun's docs don't seem to mention "binary" and "literal" in the
same (raw sql) context:
http://java.sun.com/j2se/1.3/docs/guide/jdbc/getstart/mapping.html

Again I appreciate your suggestion that converting to hex and hint
with 'x' would work, but this is also very inefficient (double hex enc/
dec and memcopy).

So, can the server be changed to accept literals which are not hinted
with 'x', as (char)byte, to behave like PG and Mysql?

Thank you,
Ken

Thomas Mueller

unread,
Mar 2, 2010, 3:16:30 PM3/2/10
to h2-da...@googlegroups.com
Hi,

It doesn't work for me. My test case is:
http://h2database.com/p.html#46cdec5480713988b4daa0f3347499aa

PostgreSQL treats the byte 0 as the end of statement. Also, bytes 92
and 128 - 255 don't work. In MySQL the bytes 92 as well as 128 - 159
don't work. HSQLDB and Apache Derby always throws exceptions.

Could you provide some example code that works for bytes 0 - 255 for
at least for PostgreSQL and MySQL?

Regards,
Thomas

Reply all
Reply to author
Forward
0 new messages