I'm coming over from the Open Blue Dragon Group which is an Open
Source CFML engine that has H2 baked in. This means that all of us web
developers are getting to play with H2 when we build dynamic data
driven websites.
We've been having some discussion on using the embedded H2 Database
with Open Blue Dragon on cloud services to create high performance web
services and sites. I'd like to get this group's opinion on what the
optimum settings would be for this scenario. To use my own use case as
an example, I am building a web service that performs selects only but
each request to the service generates 15-20 database queries before
returning the result. I've set the H2 database to read only so that it
resides in memory. Maximum connections is set at 24 and connection
timeout at 120.
What other settings should I consider beyond these default settings to
get the most performance out of H2? How can I maximize it's
performance for the most concurrent requests? Should I play with the
cache size settings or leave it at the default? For read only use
would it be safe to use SET LOCK_MODE = 0 in a production environment?
The nice thing about using the H2 / Open Blue Dragon combination is
that it's a very simple and compact package and new instances can be
quickly spun. For example, it we get Dugg we could, in theory, spin up
100s of instances to handle the traffic and then just as easily kill
them once the spike subsides.
Anyway, would love to hear what the community thinks and what
experience / lessons you can share.
Thanks!
GT
P.S. If you don't mind I'll be posting back to the OBD Group some of
your answers since there is a lot of interest regarding the topic
among OBD developers.
If you do a quick search of this group, there is also some various
performance advice (much of which is still pertinent even with page
store). I know I've written a number of posts dealing with
performance issues & optimization.
Additional advice:
-- ALWAYS increase cache settings if you can. This is one of the
biggest performance boosts you can give (besides good indices and in-
memory tables/indices).
-- Embedded mode is *much, much* faster than server mode for H2. If
you're using it in the cloud, try to use embedded mode whenever
possible!
-- Closing a DB can be quite slow, as can opening a DB that was
closed improperly (depending on settings it may have to rebuild
indices!)
-- For tables, speed and memory use increase in this order: Cached
< Memory < Not Persistent
-- Allocate much more memory than you expect to the JVM when using
Memory/Not Persistent Tables and DBs or you can get an
OutOfMemoryError
-- Note that with "Not Persistent" tables, you lose the durability
against power failure. They are extremely fast though.
-- Java triggers and stored procedures are *very* fast. In some
cases, they may be the fastest way to accomplish a complex operation.
-- LZF compression can improve LOB read performance tremendously by
reducing I/O, and I think will save memory for memory tables/DBs
-- Compression is usually slower if you're on an SSD, unless your
CPU is really, really fast
-- Choose data types appropriately: Decimal/Numeric types are slower
to use and take more storage than the other numeric types, and are
really only there for storing currency amounts.
-- Hunt down my other post in the groups on performance and data
types -- this never made it into the documentation, but it is still
important.
-- Using the Full Text Search (http://www.h2database.com/html/
tutorial.html#fulltext) can improve performance by an order of
magnitude over LIKE '%stuff%'
While H2 is stable, some features which increase performance at the
cost of stability / data integrity, or are experimental or still not
fully tested.
You can see a list of unsafe settings here: http://www.h2database.com/html/faq.html#reliable
Cheers,
Sam Van Oort
My response is similar to what Sam already wrote.
> high performance web services and sites. I'd like to get this group's opinion on what the optimum settings
I don't know what the exact requirements are, but usually the default
settings are fine.
The most simple way to speed up H2 is to use a larger cache size:
http://www.h2database.com/html/grammar.html#set_cache_size
In many cases, it's more important to speed up queries than trying to
tweak database settings. Please read the optimization guide at:
http://www.h2database.com/html/performance.html#database_performance_tuning
- this will help you ensure indexes are used and such. Also, see
http://www.h2database.com/html/performance.html#database_profiling
> selects only
I depends on the database size, and on how much memory you have. A
read-only database is fine, maybe you could use in-memory indexes
(they need more memory):
http://www.h2database.com/html/grammar.html#create_table - CREATE
MEMORY TABLE "Cached tables (the default) are persistent, and the
number of rows is not limited by the main memory. Memory tables are
persistent, but the index data is kept in main memory, that means
memory tables should not get too large."
> I've set the H2 database to read only so that it resides in memory.
Read-only databases don't reside in memory. They are just read-only.
If you want that everything is in memory, you could use an in-memory
database - see http://www.h2database.com/html/features.html#in_memory_databases
- but that would even need a lot more memory.
> Maximum connections is set at 24 and connection timeout at 120.
So, you are using a connection pool? If yes, which one?
> maximize it's performance for the most concurrent requests?
> For read only use would it be safe to use SET LOCK_MODE = 0 in a production environment?
Yes, that is safe. I'm not sure if it will help in your case however.
If the database is read-only, then I think it's safe to use the
multi-threaded kernel mode:
http://www.h2database.com/html/grammar.html#set_multi_threaded
> I'll be posting back to the OBD Group
No problem. As far as I see, the thread is:
http://groups.google.com/group/openbd/browse_thread/thread/bddb62c2cebdcf98
(for reference).
Regards,
Thomas
Excellent ideas and suggestions! Thank you very much. I will post a
summary to the OpenBlueDragon group. OBD automatically sets up some of
the these settings (such as the connection pool, and for embedded
databases it activates the auto_server setting). This makes it easy
for a dev who doesn't want to mess with db setting but it also means
it hides some of these settings unless you specifically look for them.
What I'd like to do back on the OBD group is post some "recommended"
settings and connection strings for the most common uses of H2 -- as
well as some tips for optimization. It would make a good starting
point and then everyone can adjust to suite their own needs and post
their own settings based on their use cases. Thanks again for your
input. H2 has been really great to work with -- especially coming for
the MSSQL world!
GT