From my experience I would use MULTI_THREADED=TRUE if those updates you
have are long running statements. ie. A single update may run for
several seconds. Using multi-threaded will mean it doesn't block other
threads accessing the database at the same time. Use MVCC if you are
having issues with tables being locked by long running transactions (I'm
not a full bottle on MVCC sorry). I think you may not need either with
your use case, but it depends on how those updates work. I'd see how it
goes and only use them if you need them. If you run them as lots of
small transactions then you shouldn't have issues. Stability wise both
modes are considered experimental. I've been running multi-threaded for
a while and have had a few bugs, which appear as if they were just fixed
recently.
In my project I run the database embedded as part of the web app and
start up a tcp server to allow access from other processes. It means it
gives good performance from the web app, but it is a pain if the web app
is down for maintenance and I want to access the database. On the plus
it means the database is bundled with the web app, so I don't need to
worry about another process or server running and it is easier for
deployment. Stability wise both modes are good as far as I know. So it
really depends on what works better for you.
Hope this helps, Ryan
If they will help performance will depend a lot on the finer details of
what you are doing. Multi-threaded will only help if you have long
running statements. I'd probably try it first and see how it goes and
then only try it if you are having issues.
You cannot use mvcc and multi threaded at the same time.
Yep, both can be used in embedded or server mode.
I haven't tried any of those other databases. I think H2 is great, and
if you hit any issues then you get really good support and bugs are
fixed quickly. I've been using it for a year in production and it works
really well. So I don't think it has stability issues. And I remember
reading a post about thomas using a christmas light timer to simulate
power failures on a computer running automated tests to try and corrupt
a database. So a lot of effort goes into making sure it is stable. I
would call experimental in this project a lot more stable than stable in
some other projects.
Ryan
Basically, it's like this. By default, H2 allows only one
statement/query to run on the database at a time (MULTI_THREADED=false).
That means, if you have multiple threads using the DB, only one thread's
statement will be executing at any given point.
To get around this, you can use MULTI_THREADED=true, but this is still
experimental.
One problem with MULTI_THREADED=false is locking of tables when you have
concurrent transactions (transactions, not statements, as statements are
not concurrent).
eg. (all statements happen sequentially, not concurrently)
1. Thread 1: start transaction
2. Thread 2: start transaction
3. Thread 1: do something that locks the table
4. Thread 2: do something on that locked table
5. Thread 1: commit (and unlock table)
This does not work, because at line 4, Thread 2 is attempting to do
something that would require it to wait for Thread 1 to finish first.
Now, because the statements are sequentially executed (not
concurrently), there is no way for Thread 1 to finish while Thread 2 's
statement at line 4 is running, so Thread 2 would wait forever. H2 knows
about this, and detects it, and throws a SQLException to that effect at
line 4. Thread 2's transaction fails, but Thread 1's transaction is fine.
MVCC solves that problem (well, most of the time). MVCC allows the
database to store different versions of the rows in the tables. The rows
you see are a snapshot based on when you started your transaction, and
whether or not the transactions of later versions are committed etc.
(I'm not sure of the exact rules for H2, but PostgreSQL has some nice
docs on their implementation rules, I imagine H2 is similar).
So, with MVCC enabled, their isn't a need for table locks, lock happen
only on the rows affected where possible. There are some circumstances
where table locks (or locks that would block other statements) are
needed, but mostly only on the rows that change.
In the example above, at line 3, Thread 1 updates some rows. This causes
new versions of these rows to be created. If Thread 1 were to get a list
of rows, it would see these new rows.
At line 4, Thread 2 tries to get a list of rows. Now, because Thread 1's
transaction is as yet uncommitted, Thread 2 will not see the new
versions of the rows modified by Thread 1. It will see the older version
of the rows as they were before Thread 1's transaction. No lock
contention, Thread 2 can continue with these rows, and everything is OK.
Better concurrency.
However, there could still be deadlock if Thread 1's operation would
lock something in a way that would block Thread 2, but this happens
seldom with MVCC.
One example of the deadlock that would remain would be if Thread 1 were
to insert a row with a certain value in a column with a unique
constraint, then Thread 2 were to insert a different row with the same
value in that column. Neither transaction is triggering a unique
constraint violation, because there is no such value committed to the
table. However, both transaction cannot succeed, and it is undesirable
for the database to throw an exception at commit time about a statement
that happened a while ago - the statement should fail when and where it
is executed. For this to work correctly, the database should cause
thread 2 to wait for the completion of thread 1's transaction to find
out whether or not the value in contention is committed or not, but with
non-concurrent statements this could never happen. Actually, what I've
just described here work in PostgreSQL, but doesn't work in H2. H2
1.2.139 throws a unique constraint violation when the second thread
attempts to insert a duplicate value. This is probably a bug, because
it's failing because data which is not committed yet.
Ultimately, multi-threaded MVCC is the best solution, but H2 hasn't got
there yet.
As far as other DB's go, HSQLDB (2.x.x) has nice a nice feature set, but
the query optimizer is either broken, or its HORRIBLY slow compared to
H2 on slightly complex queries. I had one where H2 did the query in
about 1 second, I killed the app after HSQLDB had run for over 5 minutes
and still was not complete. Also, H2 data types are nicer to work with.
MySQL doesn't have all the Java advantages, and its referential
integrity enforcement (in my experience) leaves something to be desired.
Also, on big tables, your indexes are permanently getting corrupt. I
cant comment on Derby, but I must say I've always been impressed with
PostgreSQL when I've used it. I've only once had DB corruption with H2
1.2.125 or thereabouts, and am very happy with it over all.
Cheers,
Jesse
Yes, I use SQL with JDBC, but then again, so does Hibernate. Please dont
get me wrong. You CAN do multithreading quite nicely with Hibernate and
H2, using MVCC. There are only seldom issues with regards to contention.
I use H2 with MVCC in a multithreaded environment, and I'm very happy
with it. (I'd be more happy when MULTI_THREADED and MVCC can be combined
though).
MVCC is an in-database transaction control/isolation mechanism. I dont
think Hibernate has anything like MVCC (except the JBoss cache for
hibernate, where the cache implements MVCC too, but that's a different
story). Its really something you configure on the DB, and it just works
for Hibernate. Hibernate should continue as usual without even knowing
MVCC is there.
Thanks,
Jesse
On 22/07/2010 13:22, Peter wrote:
> ha, Jesse, I got the exception and immediately you got the explanation
> for this. Thanks!
>
> I will double read your message now, but a quick look over it leaves
> me to the conclusion:
> with hibernate only (and h2) I cannot do multithreading easily? You
> are using pure SQL, right?
>
> And MVCC is the same solution (Optimistic locking) as a separate
> column 'version' is in hibernate? [1]
>
> Peter.
>
> [1]
> http://stackoverflow.com/questions/993504/hibernate-and-optimistic-locking
>
> On Jul 22, 1:13 pm, Jesse Long<j...@unknown.za.net> wrote:
>> Hi Peter,
>>
>> Basically, it's like this. By default, H2 allows only one
>> statement/query to run on the database at a time (MULTI_THREADED=false)..
Yep. Works great.
> some background processes updating the database
What kind of processes are those? There are four kinds:
1) quick write statements and short transactions (best)
2) quick write statements, but long running transactions
3) some slow write statements, but short transactions (autocommit)
4) some slow write statements, combined and long transactions
With "quick write statements" I mean update/insert/delete/merge
statements that are each relatively fast (a few milliseconds) and
therefore don't block other processes. With "slow write statements" I
mean if some of the update/insert/delete/merge statements take
multiple seconds (and you don't want the user to wait that long). With
"short transactions" I mean autocommit enabled (which is the default),
or transactions that don't run for more than a few quick write
statements. With "long running transactions" I mean transactions that
run for many seconds, possibly minutes.
For 1) you can use the default mode. For 2) MVCC is best. For 3)
MULTI_THREADED is probably best, and for 4) I'm afraid there is no
good solution right now, because MVCC can't be combined with
MULTI_THREADED right now.
> Could both options be used in embedded mode as well?
Yes. The server mode is basically the same as the embedded mode,
except statements and results are sent over the network. You can also
mix embedded and server mode.
As far as stability is concerned: for H2, the default mode is the best
tested mode, so it's the most stable. The MVCC mode is well tested
now, I will remove the "experimental" mark, but it is not as well
tested as the default mode. The MVCC mode may not always work exactly
like in other databases however. The MULTI_THREADED mode is tricky to
test (like all multi-threaded things in Java), and bugs have been
found and fixed recently. Before I want to mark the multi-threaded
mode as stable I want to first write a lot of automated, randomized
test cases that exercise most (if not all) of the features of H2 in
multi-threaded mode.
> another db like hsqldb, derby or mysql for this multithreaded scenario
For the multi-threaded scenario, the "big" databases (MySQL,
PostgreSQL, Oracle, MS SQL Server, DB2 and so on) are the most stable.
Derby is also stable in this area, because it always has been enabled
(as far as I know), so it's "time tested". But Derby doesn't support
MVCC, and normal operations might be slower. Maybe for your use case
Derby is still the better choice, I'm not sure (it really depends on
your use case). I believe that for HSQLDB, the multi-threaded feature
is not as well tested (similar to H2). That's just my view of course.
Regards,
Thomas