VARBINARY / LOB limitations

593 views
Skip to first unread message

wburzyns

unread,
Aug 31, 2020, 11:53:54 AM8/31/20
to H2 Database
Hello,

After a commit titled "Issue #2854: Define limits for identifiers, number of columns, etc.", is there any possibility to store objects larger than 1 MB?

While I understand the reasons for the default limit of 1 MB, I'd like to be able to set my own limit. It seems that currently the only way is to change Constants.MAX_STRING_LENGTH in the source code. Would it be possible for the H2 to have an URL option for this?


Regards,
wburzyns

Evgenij Ryazanov

unread,
Aug 31, 2020, 12:02:08 PM8/31/20
to H2 Database
Hello.

For larger objects you normally should use BLOB or CLOB data type in any version of H2 and the most of other DBMS. These data types aren't affected by the new limitations.

wburzyns

unread,
Aug 31, 2020, 1:19:50 PM8/31/20
to H2 Database
As for current master, both 'PreparedStatement.setBytes(..., ...)' and 'PreparedStatement.setBinaryStream(..., ...)' fail with 'Value too long for column "BINARY VARYING"' exception. This happens regardless of whether the column is declared as VARBINARY or BLOB.

Evgenij Ryazanov

unread,
Aug 31, 2020, 10:44:15 PM8/31/20
to H2 Database
PreparedStatement.setBytes() and PreparedStatement.setObject() with byte[] argument pass a BINARY VARYING value to H2 (in some other drivers they may pass a BINARY value, the JDBC specification allows such behavior too). It obliviously can't be used for very long arrays.

PreparedStatement.setBinaryStream() passes a BINARY LARGE OBJECT value to H2 and accepts long streams. Did you really test it? I can't reproduce such exception with this method, if you can, please, provide a complete test case without third-party dependencies.
JDBC also has a Blob interface (Connection.createBlob()) with Blob.setBytes() method where you can pass longer array too.

JDBC specification doesn't specify any adjustments for behavior of its methods depending on the type of assigned column. JDBC provides only possibility to retrieve the expected type of parameter from an application, but it isn't always known, because prepared statement or callable statement may use parameters inside expressions where different types of values are supported. Application must choose the proper method to pass a value by itself. If data type of a passed value doesn't exactly match with type of assigned column, DBMS perform an usual implicit cast.

wburzyns

unread,
Sep 1, 2020, 6:59:59 PM9/1/20
to H2 Database
You're right, I didn't notice that the exception was coming from PreparedStatement.getBytest() later in my code, which I overlooked to adapt to recent changes in H2. Sorry for bothering you.

BTW Are BLOBs stored differently in H2 (as compared to VARBINARY)? Is there a performance difference between them?

Evgenij Ryazanov

unread,
Sep 1, 2020, 10:34:41 PM9/1/20
to H2 Database
Yes, LOB values are stored separately in H2 and in many other DBMS, unlike other values.

When you read a row, all non-LOB values are read, presence of some large value may slow down the whole operation even when it doesn't need to read them. Presence of large LOB value is cheap here, because only its descriptor will be read, but not the actual value. On the other hand, when you need to read a LOB value, a separate read operation is required.

Small LOB values (up to 256 bytes by default, the documentation is outdated) in H2 are stored directly in a row like all other values. You can set MAX_LENGTH_INPLACE_LOB to a larger value if you wish.

wburzyns

unread,
Sep 2, 2020, 3:54:47 AM9/2/20
to H2 Database
Do all transactional protections (including those provided by two-phase commit protocol) apply to BLOBs or are they second-class citizens?

Evgenij Ryazanov

unread,
Sep 2, 2020, 4:15:17 AM9/2/20
to H2 Database
H2 ensures consistence of transactions for LOB values too, they aren't special.

But H2 has a long-standing issue with LOBs:
In-place LOBs aren't affected, however.

wburzyns

unread,
Sep 2, 2020, 5:46:53 AM9/2/20
to H2 Database

I'm on 1.4.199 and I have tens of millions opaque data pieces in a DB instance. Currently they're stored as BINARY and I'm satisfied with both performance and stability (I'm using H2 in embedded mode). Most of these opaque data pieces are below the new limit of 1 MB but a few of them goes well above tens of megabytes. They're also handled well by 199 in my usage scenario.


1) Is an in-place LOB different from VARBINARY? Does it - like a non-in-place LOB - require additional read operation?

2) The LOB issue you mention looks like a stopper to me. Since large BINARY / VARBINARY works fine and do not cause OOM  problems (at least in specific scenarios) would it be possible to add a new URL option that allows customization of the size limit for VARBINARY columns?

Evgenij Ryazanov

unread,
Sep 2, 2020, 6:09:04 AM9/2/20
to H2 Database
1) Is an in-place LOB different from VARBINARY? Does it - like a non-in-place LOB - require additional read operation?
In-place BLOBs are similar to BINARY VARYING and in-place CLOBs are similar to CHARACTER VARYING in terms of their storage, but they use API of LOBs. You can set MAX_LENGTH_INPLACE_LOB to a large enough value, change your data types to BLOB (CLOB) and use the LOB API for them for now. Too large values of MAX_LENGTH_INPLACE_LOB setting may be rejected or silently ignored in the future, but most likely only when the issue 1808 will be fixed, because we need a some workaround for it.
 
2) The LOB issue you mention looks like a stopper to me. Since large BINARY / VARBINARY works fine and do not cause OOM  problems (at least in specific scenarios) would it be possible to add a new URL option that allows customization of the size limit for VARBINARY columns?
We decided to make this limit not configurable. This limit is already very large, usually DBMS have smaller limits for binary and character strings, such as 64K or even 4K.

wburzyns

unread,
Sep 2, 2020, 6:43:27 AM9/2/20
to H2 Database
Thank you for all the clarifications.

I read the discussion in https://github.com/h2database/h2database/issues/1808. The minimal testcase for 1808 reproduction does parallel updates and reads on a LOB-containing row. Am I safe from issue 1808 If I don't do anything like that, i.e. if I have a single connection to an embedded DB and do everything sequentially in that connection?

Evgenij Ryazanov

unread,
Sep 2, 2020, 8:00:15 AM9/2/20
to H2 Database
In this issue references to a LOB values expire too early in multi-threaded application. Applications with a single connection (session) shouldn't be affected, but you can run some own tests to be sure.
Reply all
Reply to author
Forward
0 new messages