Corrupted database with serialized lock mode on a network database

LRichard

Dec 12, 2011, 10:34:51 AM
to h2-da...@googlegroups.com
Hello there,

   We are running into trouble (exceptions during processing, corrupted databases, ...) when using the SERIALIZED file_lock mode on databases that reside in a shared network folder. The issue happens randomly. We ruled out concurrent access as the cause, because some databases were corrupted with a single user accessing the database from a single JVM process.
   We then ran the H2 unit tests in src/test/org/h2/test/unit/TestFileLockSerialized.java. They pass on our Windows desktops. But when we change TestBase.BASE_TEST_DIR to point to a network directory instead of a local one, they fail, for instance in testBigDatabase with the cache activated (assertion "Expected: 500 actual: 999") or in testThreeMostlyReaders (duplicate key exception). The failures are random but occur on nearly every run.
   I guess the serialized lock mode is sensitive to timing, so the result may depend on the responsiveness of the network. We use 1 Gbit/s Ethernet with a Novell network, but our customers may have slightly different LANs. We noticed that some parameters, such as WRITE_DELAY=0 or setting the new h2.modifyOnWrite property to true, improve things, but the unit tests still fail from time to time when run against a networked database.
   We could use a real H2 server in some cases, but SERIALIZED mode is by far the easiest way to deploy a database for most of our customers, and we would really appreciate being able to use it. We understand there is a trade-off in performance. However, our problem here is not performance (which is pretty good) but the integrity of the database. We managed to reproduce the failure with a networked database, but the same issues could potentially happen with a local database under similar circumstances (heavy load relative to the responsiveness of the file system).
   Has anybody encountered the same problems when using a database on a LAN?
   Are there any other tests we can run on our network, or any traces we can collect, in order to provide more useful information?
   Any help will be greatly appreciated.

Laurent RICHARD

LRichard

Dec 14, 2011, 6:18:03 PM
to h2-da...@googlegroups.com
No answer. Does that mean nobody else can reproduce this?
I did test on a completely different configuration (Mac OS X) without reproducing any error, but at work, on our Novell network, the unit tests never pass.
I typically get the following output:

00:11:35 23:11:35.266 org.h2.test.unit.TestFileLockSerialized testSequence
00:11:41 23:11:41.422 org.h2.test.unit.TestFileLockSerialized testSequenceFlush
00:11:43 23:11:43.629 org.h2.test.unit.TestFileLockSerialized testLeftLogFiles
00:11:45 23:11:45.494 org.h2.test.unit.TestFileLockSerialized testWrongDatabaseInstanceOnReconnect
00:11:49 23:11:49.679 org.h2.test.unit.TestFileLockSerialized testCache()
00:11:56 23:11:56.990 org.h2.test.unit.TestFileLockSerialized testBigDatabase(false)
00:12:33 23:12:33.835 org.h2.test.unit.TestFileLockSerialized Expected: 500 actual: 999
Exception in thread "org.h2.test.unit.TestFileLockSerialized$7" java.lang.AssertionError: Expected: 500 actual: 999
    at org.h2.test.TestBase.fail(TestBase.java:428)
    at org.h2.test.TestBase.assertEquals(TestBase.java:565)
    at org.h2.test.unit.TestFileLockSerialized$7.call(TestFileLockSerialized.java:641)
    at org.h2.util.Task.run(Task.java:38)
    at java.lang.Thread.run(Thread.java:662)
00:12:35 23:12:35.183 org.h2.test.unit.TestFileLockSerialized testBigDatabase(true)
Exception in thread "main" java.lang.RuntimeException: org.h2.jdbc.JdbcSQLException: (Message 42S01 not found); SQL statement:
create table test(id int, id2 int) [42101-162]
    at org.h2.util.Task.get(Task.java:66)
    at org.h2.test.unit.TestFileLockSerialized.testBigDatabase(TestFileLockSerialized.java:658)
    at org.h2.test.unit.TestFileLockSerialized.test(TestFileLockSerialized.java:59)
    at org.h2.test.unit.TestFileLockSerialized.main(TestFileLockSerialized.java:37)
Caused by: org.h2.jdbc.JdbcSQLException: (Message 42S01 not found); SQL statement:
create table test(id int, id2 int) [42101-162]
    at org.h2.message.DbException.getJdbcSQLException(DbException.java:329)
    at org.h2.message.DbException.get(DbException.java:169)
    at org.h2.message.DbException.get(DbException.java:146)
    at org.h2.command.ddl.CreateTable.update(CreateTable.java:108)
    at org.h2.command.CommandContainer.update(CommandContainer.java:73)
    at org.h2.command.Command.executeUpdate(Command.java:226)
    at org.h2.jdbc.JdbcStatement.executeInternal(JdbcStatement.java:177)
    at org.h2.jdbc.JdbcStatement.execute(JdbcStatement.java:152)
    at org.h2.test.unit.TestFileLockSerialized$6.call(TestFileLockSerialized.java:617)
    at org.h2.util.Task.run(Task.java:38)
    at java.lang.Thread.run(Thread.java:662)

Any hints?

LRichard

Dec 14, 2011, 6:33:28 PM
to h2-da...@googlegroups.com
I realized that part of my first description was wrong.
  1. WRITE_DELAY=0 doesn't seem to change anything.
  2. testBigDatabase with the cache enabled works most of the time on our networked database. It is with the cache disabled that the test fails, as shown in my previous message.
If I skip this failing test (the one with the cache disabled), I typically get the following output:

00:19:15 23:19:15.794 org.h2.test.unit.TestFileLockSerialized testSequence
00:19:21 23:19:21.311 org.h2.test.unit.TestFileLockSerialized testSequenceFlush
00:19:23 23:19:23.543 org.h2.test.unit.TestFileLockSerialized testLeftLogFiles
00:19:25 23:19:25.406 org.h2.test.unit.TestFileLockSerialized testWrongDatabaseInstanceOnReconnect
00:19:29 23:19:29.609 org.h2.test.unit.TestFileLockSerialized testCache()
00:19:37 23:19:37.057 org.h2.test.unit.TestFileLockSerialized testBigDatabase(true)
00:20:12 23:20:12.180 org.h2.test.unit.TestFileLockSerialized testCheckpointInUpdateRaceCondition
00:20:16 23:20:16.820 org.h2.test.unit.TestFileLockSerialized testConcurrentUpdates
00:20:22 23:20:22.885 org.h2.test.unit.TestFileLockSerialized testThreeMostlyReaders true
00:20:33 23:20:33.589 org.h2.test.unit.TestFileLockSerialized testThreeMostlyReaders false
Exception in thread "main" org.h2.jdbc.JdbcSQLException: (Message 42S02 not found); SQL statement:
select * from test where id = ? [42102-162]

    at org.h2.message.DbException.getJdbcSQLException(DbException.java:329)
    at org.h2.message.DbException.get(DbException.java:169)
    at org.h2.message.DbException.get(DbException.java:146)
    at org.h2.command.Parser.readTableOrView(Parser.java:4758)
    at org.h2.command.Parser.readTableFilter(Parser.java:1080)
    at org.h2.command.Parser.parseSelectSimpleFromPart(Parser.java:1686)
    at org.h2.command.Parser.parseSelectSimple(Parser.java:1793)
    at org.h2.command.Parser.parseSelectSub(Parser.java:1680)
    at org.h2.command.Parser.parseSelectUnion(Parser.java:1523)
    at org.h2.command.Parser.parseSelect(Parser.java:1511)
    at org.h2.command.Parser.parsePrepared(Parser.java:405)
    at org.h2.command.Parser.parse(Parser.java:279)
    at org.h2.command.Parser.parse(Parser.java:251)
    at org.h2.command.Parser.prepareCommand(Parser.java:217)
    at org.h2.engine.Session.prepareLocal(Session.java:415)
    at org.h2.engine.Session.prepareCommand(Session.java:364)
    at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1121)
    at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:71)
    at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:267)
    at org.h2.test.unit.TestFileLockSerialized$2.run(TestFileLockSerialized.java:193)
    at java.lang.Thread.run(Thread.java:662)

Thomas Mueller

Dec 15, 2011, 3:46:25 AM
to h2-da...@googlegroups.com
Hi,

> network shared folder

Maybe this is the problem. What kind of shared file system do you use?
If NFS, what options do you use (for example, did you enable metadata
caching)? I guess the problem is related to the network file system in
some way. I'm not sure if it's "just" a timing problem. To test if it
is only timing, you could try with a larger RECONNECT_CHECK_DELAY
setting - see http://h2database.com/javadoc/index.html - simply append
;RECONNECT_CHECK_DELAY=2000 to the database URL everywhere.
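
For readers following along, appending the setting could look roughly like the sketch below. It only builds the URL string and does not open a connection. The class name, the helper method and the /mnt/share/testdb path are made up for illustration; FILE_LOCK=SERIALIZED and RECONNECT_CHECK_DELAY=2000 are the settings discussed in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative helper (not part of H2): appends ";NAME=VALUE" settings to a JDBC URL. */
final class H2UrlSketch {

    /** Returns baseUrl followed by each setting as ";NAME=VALUE", in map iteration order. */
    static String withSettings(String baseUrl, Map<String, String> settings) {
        StringBuilder url = new StringBuilder(baseUrl);
        for (Map.Entry<String, String> e : settings.entrySet()) {
            url.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the settings in insertion order.
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("FILE_LOCK", "SERIALIZED");
        settings.put("RECONNECT_CHECK_DELAY", "2000");

        // Hypothetical path on a network share; replace with the real database location.
        String url = withSettings("jdbc:h2:file:/mnt/share/testdb", settings);

        // prints jdbc:h2:file:/mnt/share/testdb;FILE_LOCK=SERIALIZED;RECONNECT_CHECK_DELAY=2000
        System.out.println(url);
    }
}
```

The resulting string would then be passed to DriverManager.getConnection(...) by every process that opens the database, so that all of them use the same delay.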

> (Message 42S02 not found); SQL statement: select * from test where id = ? [42102-162]

Did you build H2 yourself? If yes, how? I use Eclipse and the build
script (./build.sh) usually.

Regards,
Thomas

LRichard

Dec 15, 2011, 3:41:31 PM
to h2-da...@googlegroups.com
Thank you for your help Thomas

I don't encounter any problems on my home network (my NAS uses NFS), but the errors appear on our network at work. It is Novell NetWare, but I don't know much more than that (I should have more information soon).

I tried different values for RECONNECT_CHECK_DELAY. It changes the behaviour, but the random failures are still there. High values also slow down execution (but I understand there's no silver bullet).

Yes, I run the unit tests from Eclipse (it's more practical). I let the wizard configure the project automatically. The tools need dependencies that are missing, but the main and test sources seem OK. My Eclipse project must be missing some settings needed to find the message texts, but the engine's behaviour should be OK, shouldn't it?

Could you please give a little more information on how the SERIALIZED file_lock works? The exclusive locks FILE, SOCKET and FS are well explained, but the magic of the SERIALIZED lock is more obscure.
By the way, isn't the last part of http://www.h2database.com/html/advanced.html?highlight=serialized&search=SERIALIZED#file_locking_serialized obsolete since you fixed it in build 160? And shouldn't issue 311 be closed? Maybe I'm missing something...

We're very interested in using the SERIALIZED file_lock because it allows our end users to put the database on a NAS, for instance, without the need to install and manage an H2 server. The database may also be installed locally but accessed by more than one application. So we kept SERIALIZED mode as the default and advise our users to install a server when there really is concurrency or when performance is an issue. However, several of our end users have reported corrupted databases, and we think this is linked to SERIALIZED mode, since we can reproduce some errors by stressing the database in this configuration but not when the database is behind a server.

That's why it matters to us, and we are ready to help stabilize SERIALIZED mode. Thanks again for your help.

Laurent

LRichard

Dec 26, 2011, 1:01:01 PM
to h2-da...@googlegroups.com
Hi,

At work, when using an NTFS shared folder instead of a NetWare shared folder, the tests run fine in SERIALIZED mode.
However, we will most likely move to AUTO_SERVER mode instead of the SERIALIZED file_lock, because the former should be fine and we can't figure out how the latter works (or fails...).
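
For anyone making the same switch, here is a rough sketch of how an existing URL might be converted. It is plain string handling, not an H2 API: the class name, the method and the example path are made up, and it assumes the settings are separated by ";" as in the URLs used in this thread. It drops any FILE_LOCK setting so the two modes are not mixed, then appends AUTO_SERVER=TRUE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative helper (not part of H2): rewrites a URL to use AUTO_SERVER=TRUE. */
final class AutoServerUrlSketch {

    /** Removes FILE_LOCK and AUTO_SERVER settings, then appends AUTO_SERVER=TRUE. */
    static String toAutoServer(String url) {
        String[] parts = url.split(";");
        List<String> kept = new ArrayList<>();
        kept.add(parts[0]); // the "jdbc:h2:..." prefix with the database path
        for (int i = 1; i < parts.length; i++) {
            String setting = parts[i].trim();
            String upper = setting.toUpperCase(Locale.ROOT);
            if (setting.isEmpty()
                    || upper.startsWith("FILE_LOCK=")
                    || upper.startsWith("AUTO_SERVER=")) {
                continue; // skip the old lock mode and any stale AUTO_SERVER value
            }
            kept.add(setting);
        }
        kept.add("AUTO_SERVER=TRUE");
        return String.join(";", kept);
    }

    public static void main(String[] args) {
        // Hypothetical URL of a database that was opened in SERIALIZED mode so far.
        String old = "jdbc:h2:file:/mnt/share/testdb;FILE_LOCK=SERIALIZED;WRITE_DELAY=0";

        // prints jdbc:h2:file:/mnt/share/testdb;WRITE_DELAY=0;AUTO_SERVER=TRUE
        System.out.println(toAutoServer(old));
    }
}
```

All applications that open the database would need to use the converted URL; whether the remaining settings still make sense in AUTO_SERVER mode has to be checked case by case.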
By the way, we changed the test file a little to make the test database URL easier to handle. I attach the file so that Thomas can commit it. It's really only basic refactoring, and we found it clearer that way.
Despite our troubles, I still think H2 is a really great database with many useful options that can't be found anywhere else.

Regards,
Laurent
TestFileLockSerialized.java