Data loss with H2 HA / Clustered Mode


Daniel Stone

Aug 19, 2013, 7:26:03 AM
to h2-da...@googlegroups.com


Hi,

We are trying to use the H2 DB after switching from Apache Derby.  The main reason for the switch was H2's clustering support.  After a fair amount of testing we have now come across a problem with the clustered mode.

I have configured a system to use H2 (1.3.172) in clustered mode and access the DB using the supplied connection pool.  Our application has many threads accessing the DB concurrently (from Web Services and background tasks).  The main purpose of the DB is to hold a job queue: a job is added to the table, updated frequently while active and then set to complete.  I have found that when the system is under load, job records can “disappear” from the DB (with many jobs being submitted and existing records being updated, ~40 out of 1000 jobs were lost).  This doesn’t happen when the DB is run in standalone mode.

I don’t know how the clustering mode works internally, but I was wondering if the transactions are not making it to one of the DB instances while that instance is still being used for reading.  Would this be possible?  Could it be something I have set up incorrectly?  Does clustering work OK with the connection pool class?  We had to switch to sequences for id generation; could I have done something wrong there?

Any help would be much appreciated.  I am happy to supply more details if required.

Regards,
Daniel Stone

Noel Grandin

Aug 22, 2013, 11:07:20 AM
to h2-da...@googlegroups.com, Daniel Stone
Don't think I can help much because I don't use clustering, but let's rule out the obvious problems anyhow :-)
- what does your DB URL look like?
- are you setting any other DB-level options via SET commands or environment variables?

-- Noel.

Daniel Stone

Aug 22, 2013, 1:26:17 PM
to h2-da...@googlegroups.com, Daniel Stone
Hi Noel,

My connection URL is essentially this: "jdbc:h2:tcp://server1:9092,server2:9092/./folder/dbName"

I have now added ";TRACE_LEVEL_FILE=4" to the URL to switch to SLF4J logging.

We do not use any other DB options via SET commands or environment variables.

Please note that 2 applications are accessing the cluster.  I have now started to look at the H2 source code, and from what I can see our system is experiencing transient problems when running updates on server1, which results in the updates only making it to server2.  I then think that another connection (as we use a pool) attempts to read the data from server1, but it doesn't exist there.  It looks like the cluster list is held per connection; is this correct?
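To make the suspected failure mode concrete, here is a toy model of a per-connection server list (this is just an illustration of my hypothesis, not H2's actual code; all names are made up):

```java
import java.util.*;

// Toy model (not H2 source): each connection keeps its own copy of the
// cluster server list. A transient failure on one server causes only
// the writing connection to drop that server; another connection still
// reads from it and observes a missing row.
public class PerConnectionClusterDemo {
    // Simulated server holding a table of job ids
    static class Server {
        final String name;
        final Set<Integer> rows = new HashSet<>();
        boolean failing = false;          // transient write failure
        Server(String name) { this.name = name; }
    }

    // Simulated connection with its own private copy of the server list
    static class Connection {
        final List<Server> servers;
        Connection(List<Server> servers) { this.servers = new ArrayList<>(servers); }
        void insert(int id) {
            for (Iterator<Server> it = servers.iterator(); it.hasNext(); ) {
                Server s = it.next();
                if (s.failing) { it.remove(); continue; } // drop failed server, keep going
                s.rows.add(id);
            }
        }
        boolean exists(int id) {
            return servers.get(0).rows.contains(id);      // reads go to the first server
        }
    }

    public static void main(String[] args) {
        Server s1 = new Server("server1"), s2 = new Server("server2");
        List<Server> cluster = List.of(s1, s2);

        Connection writer = new Connection(cluster);
        Connection reader = new Connection(cluster);

        s1.failing = true;       // transient failure during the write...
        writer.insert(42);       // writer silently drops server1, writes only to server2
        s1.failing = false;      // ...which has already passed by read time

        // reader still has server1 first in *its* list, so the row appears lost
        System.out.println("reader sees row 42: " + reader.exists(42));
        System.out.println("server2 has row 42: " + s2.rows.contains(42));
    }
}
```

In this toy run the writer's connection silently drops server1 after the transient failure, while the reader's connection still consults server1 first, so the row looks lost even though server2 has it.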

Regards,

Dan

Noel Grandin

Aug 23, 2013, 3:20:14 AM
to h2-da...@googlegroups.com, Daniel Stone
Hi
On 2013-08-22 19:45, Daniel Stone wrote:

My connection URL is essentially this: "jdbc:h2:tcp://server1:9092,server2:9092/./folder/dbName"

OK, so you're not exploring the weirder options, which reduces the problem space considerably.



Please note that 2 applications are accessing the cluster.  I have now started to look at the H2 source code and from what I can see our system is experiencing transient problems when running updates on server1 which then results in the updates only making it to server2.  I then think that another connection (as we use a pool) attempts to read the data from server1 but it doesn't exist.  It looks like the cluster list is held per connection, is this correct?

Yes, your analysis looks pretty correct.
Yes, I suspect that transient failures on one server would cause data out of sync issues.

Regards, Noel.

Daniel Stone

Sep 5, 2013, 11:15:11 AM
to h2-da...@googlegroups.com, Daniel Stone

Hi Noel,

Many thanks for the update.  We are in the process of deciding whether to stay with H2 or not.  I have a few questions regarding H2 clustering.  Do you know if there is any intention to enhance the H2 cluster mode?  Would we be able to enhance this feature and submit it back to the H2 community?

Regards,

Dan

Noel Grandin

Sep 5, 2013, 11:17:08 AM
to h2-da...@googlegroups.com, Daniel Stone

On 2013-09-05 17:15, Daniel Stone wrote:
Many thanks for the update.  We are in the process of deciding whether to stay with H2 or not.  I have a few questions regarding H2 clustering.  Do you know if there is any intention to enhance the H2 cluster mode?  Would we be able to enhance this feature and submit it back to the H2 community?

I'm not aware of anyone else currently working on the cluster stuff, and we'd be very happy to accept any contributions to this feature!

Daniel Stone

Sep 6, 2013, 5:00:03 AM
to h2-da...@googlegroups.com, Daniel Stone
Hi Noel,

I hope you don't mind me asking, but what is your role within the H2 project?

Regards,
Dan

Noel Grandin

Sep 6, 2013, 5:20:05 AM
to h2-da...@googlegroups.com, Daniel Stone
Hi

The primary author and project leader is Thomas Mueller.

I'm a committer on this project, so I can't speak with final authority, but generally Thomas will speak up if I start spouting nonsense :-)

Regards, Noel.

Daniel Stone

Sep 6, 2013, 7:01:36 AM
to h2-da...@googlegroups.com, Daniel Stone

Hi,

Thanks for the quick response.  We are still in discussions here as to whether we will continue to use H2 (with some enhancements etc.), but I need to establish how complex the process would be if we do.

I notice in the FAQ that you recommend patches / bug fixes before making any major changes to the source code, which I can fully understand.  How would we go about proposing design / implementation changes to the H2 clustering code?  Would anyone with the authority be able to work with us, for a donation to the project, to speed this process up?

Regards,

Dan

Noel Grandin

Sep 6, 2013, 7:13:52 AM
to h2-da...@googlegroups.com, Daniel Stone

On 2013-09-06 13:01, Daniel Stone wrote:
I notice in the FAQ that you recommend patches / bug fixes before making any major changes to the source code, which I can fully understand.  How would we go about proposing design / implementation changes to the H2 clustering code? 

This mailing list is the right place to discuss any proposed changes.
Start with a brief breakdown of your requirements, i.e. what you see as broken or inadequate.
Please don't just post a proposed design. Without knowing what problems you are trying to solve, we won't be able to judge the fitness of any proposed change.


Would anyone with the authority be able to work with us for a donation to the project to speed this process up?

I don't know of anyone off-hand.  Thomas and I both have full-time jobs and family responsibilities.

Fair warning - we're not the fastest moving bunch around :-)
Sometimes it can take a couple of days before someone gets around to replying, and a large patch can take a week or more before someone finds time to sit down and review it.

There is some commercial support available:
http://h2database.com/html/links.html#commercial_support

Thomas Mueller

Sep 6, 2013, 11:22:45 AM
to H2 Google Group
Hi,

I wrote the current cluster implementation, but I understand it has many limitations. If I were to do it again, I would use a different approach, more like what MongoDB does: use replication for failover and read scalability, and partitioning for write scalability. The current implementation is basically replication, but it operates at the SQL level, which is problematic. I think replication would be better done at a lower level, for example at the file system or table level. But that means the current solution would have to be rewritten almost from scratch. Unfortunately, I will not have time to do that in the near future. My priority (besides fixing bugs, of course, and support) is to work on the new storage engine (MVStore).

Regards,
Thomas



Daniel Stone

Sep 6, 2013, 11:48:53 AM
to h2-da...@googlegroups.com

Hi Noel & Thomas,

Thank you for the responses.  I now have enough information to discuss this with my management.  I will most likely post my requirements / problems with the current clustering solution next week, possibly with some ideas on how we believe it could be improved.

We are very pleased with the performance and general DB functionality of H2, but we need the clustering to handle transient problems more gracefully than it does today.

We did also experience many problems with a “missing lob” error which actually triggered the cluster issue in the first place.  We have managed to work around this, but is this still a bug with H2 (I have seen this problem mentioned on the web)?

Regards,

Dan

Ryan How

Sep 7, 2013, 12:22:41 AM
to h2-da...@googlegroups.com
On 6/09/2013 11:48 PM, Daniel Stone wrote:

We did also experience many problems with a “missing lob” error which actually triggered the cluster issue in the first place.  We have managed to work around this but is this still a bug with H2 (I have seen this problem mentioned on the web)?


Are you using MVCC mode by any chance?

Ryan

Ryan How

Sep 8, 2013, 3:02:57 AM
to h2-da...@googlegroups.com, Daniel Stone
This sounds similar to something I noticed a while back, but maybe didn't do a very good job of explaining.  Could this be similar to what you're seeing?

https://groups.google.com/forum/#!searchin/h2-database/cluster/h2-database/jtpS6QlieQo/gzPh7qInbacJ
http://pastebin.com/BizWgf3y

I think file system level replication would be a very cool feature.

Ryan

shess...@googlemail.com

Sep 8, 2013, 2:04:54 PM
to h2-da...@googlegroups.com
I have a working HA implementation based on H2 which operates at the file system level. It uses H2's file system abstraction and implements a file system class that provides asynchronous replication.

If you are interested, you may have a look at https://github.com/shesse/h2ha

It's currently based on an older version of H2 (1.3.158) but, as it uses H2 unchanged, it should not be much work to upgrade to a newer H2.
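In outline, the idea is something like the following sketch: apply each write synchronously to the local file and queue a copy for the replica, which a background thread applies asynchronously. This is heavily simplified, with made-up names; the real code also has to handle failure detection, catch-up after an outage, and so on:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;
import java.util.concurrent.*;

// Hypothetical sketch of asynchronous file-level replication:
// writes go through to a primary FileChannel immediately, and a
// single background thread replays them against a replica file.
public class ReplicatingChannel implements AutoCloseable {
    private final FileChannel primary;
    private final FileChannel replica;
    private final ExecutorService replicator = Executors.newSingleThreadExecutor();

    public ReplicatingChannel(Path primaryFile, Path replicaFile) throws IOException {
        primary = FileChannel.open(primaryFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        replica = FileChannel.open(replicaFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    // Write-through: synchronous on the primary, asynchronous on the replica.
    public int write(ByteBuffer src, long position) throws IOException {
        ByteBuffer copy = ByteBuffer.allocate(src.remaining());
        copy.put(src.duplicate()).flip();            // snapshot the bytes for the replica
        int n = primary.write(src, position);
        replicator.submit(() -> {
            try { replica.write(copy, position); }   // replayed later, in submission order
            catch (IOException e) { /* a real system would mark the replica failed */ }
            return null;
        });
        return n;
    }

    @Override public void close() throws IOException {
        replicator.shutdown();                       // drain pending replica writes
        try { replicator.awaitTermination(10, TimeUnit.SECONDS); }
        catch (InterruptedException ignored) { Thread.currentThread().interrupt(); }
        primary.close();
        replica.close();
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("primary", ".db");
        Path r = Files.createTempFile("replica", ".db");
        try (ReplicatingChannel ch = new ReplicatingChannel(p, r)) {
            ch.write(ByteBuffer.wrap("page-1".getBytes()), 0);
        }
        // after close(), the queued write has reached the replica
        System.out.println(new String(Files.readAllBytes(r)));
    }
}
```

The key property is that the caller never waits on the replica; durability on the second copy lags by whatever is sitting in the queue.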

Stephan

Noel Grandin

Sep 9, 2013, 4:02:19 AM
to h2-da...@googlegroups.com, shess...@googlemail.com
Hi Stephan

Would you be interested in including this work into the H2 core?

Regards, Noel.

shess...@googlemail.com

Sep 9, 2013, 8:04:47 AM
to h2-da...@googlegroups.com, shess...@googlemail.com
Yes, depending on the work it generates ;-) ... I am currently quite busy with
other work.

Noel Grandin

Sep 9, 2013, 8:06:46 AM
to h2-da...@googlegroups.com, shess...@googlemail.com

That's great.

There is no rush; I'll only have time later this week to start looking at it properly anyhow, and since it's a new feature (for us) there will be no pressure to get things done quickly.

Daniel Stone

Sep 11, 2013, 8:30:34 AM
to h2-da...@googlegroups.com
No, we have not enabled MVCC, so I assume it is disabled.

Dan

Daniel Stone

Sep 11, 2013, 8:31:23 AM
to h2-da...@googlegroups.com
Hi Stephan,

We will take a look at this.

Many thanks,

Dan

Daniel Stone

Sep 12, 2013, 5:13:10 AM
to h2-da...@googlegroups.com

Hi Noel & Thomas,

We are going to perform some tests with the H2 clustering using a tool to simulate a troublesome network.  Once we have established exactly how H2 behaves, I will post up our current problems with it (unless we don't manage to break it in the way I believe we can).

I understand that you will not be able to fix/change the clustering implementation, but we need to post the problems, discuss changes and then hopefully implement any approved changes.

Regards,

Dan

dch...@peaxy.net

Oct 7, 2013, 4:00:23 PM
to h2-da...@googlegroups.com
Hi,

Noel invited me to join the discussion, and I hope nobody objects (or better yet, someone finds something useful) if I cross-post here (updated with more current info) regarding clustered H2.  We are also trying to use it, and most of what I have seen looks pretty good.

Our requirement is for a relatively small DB with a low rate of transactions.  I'd like to be able to have a db clone on each of 3 nodes.  Clients would be directed to one node via a virtual IP; the cluster would apply all operations at that IP and onto the two copies.  If either copy (not the 'master' at the virtual IP) fails, the cluster continues to operate normally; after the node is recovered, CreateCluster is re-run as usual, albeit with 3 nodes.  If the master (the node at the virtual IP) fails, our code selects a new master from either of the 2 copies.  That node takes over the virtual IP and the cluster continues operating normally.  We detect when the failed node returns and issue a "CreateCluster" to reincorporate that node.

As far as I can see this imposes only a single requirement on H2 that is different than usual: support for 3 nodes in a CreateCluster -serverList.  I've begun exploring this a bit - I have a working 3 node cluster (nodes .55, .53, and .51).  I run the server on .53 and .51, e.g.
java -cp /usr/lib/java/*:. org.h2.tools.Server -tcp -tcpPort 9101 -tcpAllowOthers -baseDir <base> & 
Then on my 'first' server (.55) I successfully run CreateCluster (twice) to create db copies on both nodes .53 and .51:
java -cp /usr/lib/java/*:. org.h2.tools.CreateCluster \
    -urlSource jdbc:h2:tcp://172.16.35.55:9101/msp \
    -urlTarget jdbc:h2:tcp://172.16.35.51:9101/msp \
The H2 logs show an update to the CLUSTER server list that includes all three servers.
Code in SessionRemote loops through the serverlist applying any SQL operation.  I also see this interesting comment at line 397: 
      // TODO cluster: support more than 2 connections
What would need to be done?
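As far as I can tell, the loop amounts to something like this toy sketch (my reading of the behavior, not the actual SessionRemote source; all names are made up). Nothing in the logic itself limits the list to 2 nodes:

```java
import java.util.*;

// Toy model: a statement is applied to every server in the list; a
// server that fails is removed and the remaining nodes continue as
// the "healthy" (but smaller) cluster.
public class ClusterWriteDemo {
    static class Server {
        final String name;
        final boolean healthy;
        final List<String> log = new ArrayList<>();   // statements this server applied
        Server(String name, boolean healthy) { this.name = name; this.healthy = healthy; }
        void execute(String sql) {
            if (!healthy) throw new IllegalStateException(name + " unreachable");
            log.add(sql);
        }
    }

    static void executeOnAll(List<Server> servers, String sql) {
        for (Iterator<Server> it = servers.iterator(); it.hasNext(); ) {
            Server s = it.next();
            try {
                s.execute(sql);
            } catch (RuntimeException e) {
                it.remove();   // drop the failed node, carry on with the survivors
            }
        }
    }

    public static void main(String[] args) {
        List<Server> cluster = new ArrayList<>(List.of(
                new Server(".55", true), new Server(".53", false), new Server(".51", true)));
        executeOnAll(cluster, "UPDATE jobs SET state='DONE' WHERE id=1");
        System.out.println("servers left: " + cluster.size());
        System.out.println("applied on .51: " + cluster.get(1).log.size());
    }
}
```

The blunt part is visible here: a node that fails once is simply gone from the list, and nothing in this path brings it back or reconciles what it missed.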

I've:
- run a number of SQL operations from multiple connections and all 3 server DB copies are updated as expected.
- failed an H2 server and/or node and successfully issued operations on the remaining cluster.
- reincorporated a failed node using "CreateCluster"
- successfully issued operations against the recovered cluster.
- obtained expected behaviors when attempting to insert a duplicate key and to update or delete a missing record - basic stuff.
- obtained quite unexpected behavior by issuing 'ifconfig eth0 down' on a node, which halts the entire database until the adapter returns.  Ouch.

I also attempted to create a SQL operation failure on a single node, hoping to receive an exception and see logs indicating a rollback.  I found that I was able to use a URL identifying a single clustered node (more below), remove a record on only that node, and then attempt to update that record at the cluster.  My takeaway so far is that I am able to break consistency, and that the database does not reliably detect and roll back an operation upon a record that is missing on a node.  I've not explored this very thoroughly, and I'm unclear on what behavior to expect with the current H2, especially on an unsupported 3 node configuration.

While looking at logs I noted that under some circumstances - stop and restart one H2 server, then connect and disconnect using a normal multihost URL - the value of CLUSTER is set empty:   
10-02 16:15:02 jdbc[14]: /*SQL */SET AUTO_RECONNECT TRUE;
10-02 16:15:02 jdbc[14]: /*SQL */SET CLUSTER TRUE;
10-02 16:15:02 jdbc[14]: /*SQL */SET TRACE_LEVEL_SYSTEM_OUT 3;
10-02 16:15:02 jdbc[14]: /*SQL */SET DB_CLOSE_DELAY -1;
10-02 16:15:04 index: SYS_DATA remove ( /* key:119 */ 58, 0, 6, 'SET CLUSTER ''172.16.35.55:9101,172.16.35.53:9101,172.16.35.51:9101''')
10-02 16:15:04 pageStore: log - s: 1 table: 0 row: ( /* key:119 */ 58, 0, 6, 'SET CLUSTER ''172.16.35.55:9101,172.16.35.53:9101,172.16.35.51:9101''')
10-02 16:15:04 index: SYS_DATA add ( /* key:120 */ 58, 0, 6, 'SET CLUSTER ''''')
10-02 16:15:04 pageStore: log + s: 1 table: 0 row: ( /* key:120 */ 58, 0, 6, 'SET CLUSTER ''''')
Though this is a cluster, at this point you can connect and issue operations against a JDBC URL identifying a single node in the cluster.  The documented SELECT VALUE FROM INFORMATION_SCHEMA.SETTINGS WHERE NAME='CLUSTER'; returns an empty '' from successful connect until the first write.  Not a surprise, but not the expected or documented behavior.  Following any write operation the expected list of servers is restored.

If I understand the H2 clustering model, its approach is to detect a failed server (or node), explicitly remove it from the cluster, and continue operating as a "healthy" cluster with one node fewer.  Apart from this, operations are always expected to complete on all nodes.  That's pretty blunt, but if it is "good enough" for a particular application then it is.  We really like H2, and I'm guessing that there is a pretty substantial market for a lightweight solution with good-enough behavior.  If we can converge on what is good enough between a few of us, maybe we can make this work.  Thoughts?

Daniel Stone

Nov 4, 2013, 10:03:24 AM
to h2-da...@googlegroups.com

Hi,

Sorry for the long delay in responding.  It does sound like we have similar requirements for H2, and we are certainly interested in supporting more than 2 nodes.  We have managed to configure 2 nodes to perform as we expect and have also completed a number of tests running with a troublesome network (lost packets / delays etc.).  For the most part H2 has behaved as we'd hoped.

I'm not surprised that you have been able to break the consistency of the DBs by modifying records on a single node, as this is similar to the problem that occurred for us when some queries were producing an exception.  The instance that had the exception would be removed from the list for a single connection.  This would then result in lost writes to that DB instance if the connection was used for writing (we use a connection pool).

Because we are very busy and have managed to work around some of the issues, we have not been spending much time on this, but that could change within the next few months.  What kind of timescales are you working to?

Dan

shess...@googlemail.com

Jan 2, 2014, 1:50:35 PM
to h2-da...@googlegroups.com, shess...@googlemail.com
OK, some time has passed and I finally found the time to adapt H2HA to the current (1.3.174) file system abstraction in H2, i.e. FileChannel.

It would now be possible to attempt a code integration. However, before doing so,
some questions would need to be answered:

- most important :-): is the H2 team still interested in such a merge?
- what would be the approach for merging?
  - a separate 'contrib' section?
  - integrated into the main source tree (as separate packages)?
- requirements on documentation, testability, etc.
- technical details: how to submit, etc.

Stephan

Thomas Mueller

Jan 2, 2014, 5:14:56 PM
to h2-da...@googlegroups.com
Hi, 

I have not yet looked at the code, but I am very interested in integrating it!  I would prefer to have it in the main source tree, unless it is very complex.  Documentation would be needed, and testing as well of course (80% code coverage is my target).  We need to discuss the details.

Regards, 
Thomas