I just started to use an H2 database and stumbled upon a problem when
using dates:
Performing a query like:
SELECT * FROM HISTORIE WHERE DATUM < 2005-01-01;
throws an exception like "Cannot parse date constant 2003 [90009-56]",
even though I don't even have such a date in my data (the 'oldest' date
is from 2004).
The same problem occurs with DELETE statements.
Can anyone help me?
> SELECT * FROM HISTORIE WHERE DATUM < 2005-01-01;
>
> throws an exception like "Cannot parse date constant 2003 [90009-56]",
> even though I don't even have such a date in my data (the 'oldest' date
> is from 2004).
> The same problem occurs with DELETE statements.
I think the date 2005-01-01 is parsed as an arithmetic expression
instead of a date: 2005 minus 1 minus 1 is 2003, which is where the
2003 in the error message comes from. It should help if you put the
date between single quotes.
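
For example, here is a minimal sketch of both fixes, quoting the
literal or binding it as a parameter (table and column names as in the
question; the URL and user are illustrative, adjust to your setup):

import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class DateLiteralExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:h2:/tmp/h2", "sa", "")) {
            // Option 1: quote the literal so it is parsed as a date,
            // not as the arithmetic expression 2005 - 01 - 01 = 2003
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                     "SELECT * FROM HISTORIE WHERE DATUM < '2005-01-01'")) {
                while (rs.next()) {
                    // process rows
                }
            }
            // Option 2: bind the date as a parameter, avoiding literal
            // parsing entirely (works the same way for DELETE)
            try (PreparedStatement ps = conn.prepareStatement(
                     "SELECT * FROM HISTORIE WHERE DATUM < ?")) {
                ps.setDate(1, Date.valueOf("2005-01-01"));
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // process rows
                    }
                }
            }
        }
    }
}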
What we see now are large pauses when H2 writes the data to disk.
I'm not sure whether things could be improved if the DB were written
to more continuously instead of in large bursts.
Any thoughts are appreciated.
Thanks,
Dave
You could try to change the cache size (SET CACHE_SIZE ...). I am also
not happy with the insert (and update and delete) performance of H2.
It is faster than other databases, but not as fast as it could be.
After MVCC is done I will try to improve the performance again.
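
For example, a minimal sketch of setting the cache from JDBC before a
bulk load (the URL and the size are illustrative; the value is in KB,
so 65536 is about 64 MB):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CacheSizeExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:h2:/tmp/h2", "sa", "");
             Statement stmt = conn.createStatement()) {
            // SET CACHE_SIZE takes a value in KB; tune it to your heap
            stmt.execute("SET CACHE_SIZE 65536");
            // ... run the bulk insert on this connection ...
        }
    }
}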
Thomas
Even if it is a hack, it would help.
Thanks,
Dave
What database URL, settings, virtual machine, CPU, and file system do you use?
> Also does H2 require that the new bytes in the file are 0?
Yes. Extending a file by large blocks is much faster than extending it
using small blocks.
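
For illustration, a sketch of the difference, appending one large
zero-filled block instead of many small ones (file name and block size
are illustrative):

import java.io.RandomAccessFile;

public class ExtendSketch {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile f = new RandomAccessFile("/tmp/test.db", "rw")) {
            // one large zero-filled write extends the file in a single
            // step and guarantees the new bytes are 0, as H2 requires
            byte[] zeros = new byte[1024 * 1024]; // 1 MB block
            f.seek(f.length());
            f.write(zeros);
        }
    }
}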
> have you considered using the file channel instead of the write method?
Yes.
> I'm guessing (although I could be wrong) that when the file gets big,
> mapping into memory and then writing would be faster.
It does not work on all systems. And where it works, there is a limit
on the file size (not a hard limit as far as I know; strangely, larger
files don't work as well). HSQLDB uses memory-mapped files only
for smaller databases. Another problem is that un-mapping a file does
not work (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4724038).
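
For reference, a sketch of what memory-mapped writes look like, with
the un-mapping caveat noted (file name and mapping size are
illustrative):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MapSketch {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile f = new RandomAccessFile("/tmp/test.db", "rw");
             FileChannel ch = f.getChannel()) {
            // map 1 MB of the file into memory and write through it
            MappedByteBuffer buf =
                ch.map(FileChannel.MapMode.READ_WRITE, 0, 1024 * 1024);
            buf.put(0, (byte) 1);
            buf.force(); // flush the mapped changes to disk
            // caveat from the bug above: there is no supported way to
            // un-map the buffer explicitly; it is only released when
            // the MappedByteBuffer itself is garbage collected
        }
    }
}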
> Is there any way to set the initial file size?
No.
> It seems that extending
> the file dynamically is a large portion of our problem.
Why do you think so?
> The database file ends up ~200MB and the index file ~500MB.
Is this too big?
Could you profile your application to find out what the real problem is?
There are several ways to profile; one is using the YourKit Java
profiler, another is using java -Xrunhprof:cpu=samples,depth=8
Thanks,
Thomas
I have been profiling for a while now and I would be glad to share the
results. Let me formalize them more today, but my initial findings were
that 1) extending the DB was a problem and 2) writing the results to
disk was really bad.
In this particular application we're using H2 to store information about
types in our system (classes in java). Imagine implementing the type
info database that Eclipse or IntelliJ uses -- that is what we are
doing.
Parsing the types without storing them, for this particular project,
takes 200 seconds. With storage turned on (no indices other than the
PKs) it takes ~700 seconds.
The test is much faster at the beginning than at the end, which led me
to believe there was a memory leak somewhere and the app was GC'ing
toward the end. YourKit shows that this is the case at the very end,
but things slowed down before that.
At any rate I'll be profiling more today and we're going to normalize
our schema a bit to reduce the DB/index size. I'll post back with more
of my findings. A few things worry me:
1) We're GC'ing more with the storage on and it gets incrementally
worse.
2) There is *much* more disk activity as the test continues (I'm using
iostat to monitor). The test is basically doing the same thing at the
end as it does in the beginning.
Dave
-----Original Message-----
From: h2-da...@googlegroups.com On Behalf Of Thomas Mueller
Sent: Monday, August 27, 2007 11:09 AM
To: h2-da...@googlegroups.com
Subject: Re: What is the most efficient way to add large amounts of data to an H2 database?
The first snapshot is from the beginning of the write; the database
starts out empty. The second is from the last 1000 records of our
write; at that point the database is at about 100MB.
What worries me is that the writes get slower and slower.
These tests are on a Core 2, 3.0 GHz. I'm running Linux with ext2 on a
RAID 0 of two 15K SCSI disks. The VM has 1.5GB of heap.
I wouldn't expect it to get this slow with this size of a DB.
Here is a snapshot with logging turned on. I also ran it with logging
turned off and got the same results, minus the logging entries.
Name                                                      Time (ms)
java.io.RandomAccessFile.write(byte[], int, int)          39,482  100 %
org.h2.store.FileStore.write(byte[], int, int)                 -
org.h2.store.DiskFile.writeBack(CacheObject)               22,009   56 %
org.h2.util.CacheLRU.removeOld()                           21,957   56 %
org.h2.util.CacheLRU.removeOldIfRequired()                     -
org.h2.util.CacheLRU.put(CacheObject)                          -
org.h2.store.DiskFile.getRecord(int, RecordReader, int)    12,351   31 %
org.h2.store.DiskFile.addRecord(Session, Record)            9,605   24 %
org.h2.store.DiskFile.flush()                                  52    0 %
org.h2.store.LogFile.flush()                                   -
I also played around quite a bit trying to make this faster. My first
suspicion was the many, many small writes H2 does to the DB when it
removes old entries from the cache and writes them to disk. I tried
coalescing the writes by detecting when two writes were near each
other and then packing those into one DataPage. The problem with this
approach is that there are very few blocks near each other.
At any rate, I got it to work, but it didn't help.
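
For concreteness, here is a generic sketch of the coalescing idea I
tried (this is not the actual code; Write and coalesce are
illustrative names):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CoalesceSketch {
    // a pending write: a file offset plus the bytes to put there
    record Write(long offset, byte[] data) {}

    // sort pending writes by offset and merge contiguous runs, so the
    // file sees fewer, larger writes
    static List<Write> coalesce(List<Write> pending) {
        List<Write> sorted = new ArrayList<>(pending);
        sorted.sort(Comparator.comparingLong(Write::offset));
        List<Write> merged = new ArrayList<>();
        for (Write w : sorted) {
            Write last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && last.offset() + last.data().length == w.offset()) {
                byte[] joined = Arrays.copyOf(
                    last.data(), last.data().length + w.data().length);
                System.arraycopy(w.data(), 0, joined,
                    last.data().length, w.data().length);
                merged.set(merged.size() - 1, new Write(last.offset(), joined));
            } else {
                merged.add(w);
            }
        }
        return merged;
    }
}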
I also printed out some statistics. The average write size to disk is
~512 bytes. When a flush happens it flushes either 6600 records or 590
records (each write being 128, 256, 512, or 1152 bytes).
My initial thought was that the number of writes was the problem,
which is why I tried to coalesce the writes. This could still be the
case, because it is rare to get more than 3 blocks next to each other
(because of padding in the btree?)
It should be easy to write a simple test for this, correct?
Our schema consists of one big table (and other smaller ones) that looks like:
ID : VARCHAR,
FeatureInfoType : int,
FeatureInfoName : VARCHAR,
RelativeFeatureInfoName : VARCHAR,
IsErrant : boolean,
OwningFeatureInfoType : int,
OwningFeatureInfoName : VARCHAR,
EnclosingFeatureInfoType : int,
EnclosingFeatureInfoName : VARCHAR,
QualifyingEnclosingFeatureInfoType : int,
QualifyingEnclosingFeatureInfoName : VARCHAR,
Modifiers : int,
OffsetOfRecord : int,
LengthOfRecord : int,
LineNumber : int,
ColumnNumber : int,
DefUse : int
There are ~150,000 rows in the DB when it is done. We use a transaction
around each "type", which means each transaction covers ~30 rows.
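
For what it's worth, a minimal sketch of the kind of batched insert we
are doing, using a subset of the columns above (the table name, the
URL, and the batch/commit sizes are illustrative, not our exact code):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BulkInsertSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:h2:/tmp/h2", "sa", "")) {
            conn.setAutoCommit(false); // commit in explicit chunks, not per row
            String sql = "INSERT INTO FEATUREINFO"
                + "(ID, FEATUREINFOTYPE, FEATUREINFONAME) VALUES (?, ?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int i = 0; i < 150_000; i++) {
                    ps.setString(1, "id-" + i);
                    ps.setInt(2, i % 10);
                    ps.setString(3, "name-" + i);
                    ps.addBatch();
                    if (i % 1000 == 999) { // flush and commit in chunks
                        ps.executeBatch();
                        conn.commit();
                    }
                }
                ps.executeBatch(); // flush the tail
                conn.commit();
            }
        }
    }
}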
Thanks again,
Dave
From: h2-da...@googlegroups.com On Behalf Of Thomas Mueller
Sent: Wednesday, August 29, 2007 10:49 AM
To: h2-da...@googlegroups.com
Subject: Re: What is the most efficient way to add large amounts of data to an H2 database?
Hi,
> There are ~150,000 rows in the DB when it is done. We use a
> transaction around each "type" which means the transaction is for
> ~30 rows.
The database URL is file:/tmp/h2.
I've tried running with a large cache size and a small one.
Dave
I also checked in the debugger, and we are committing every ~30 rows.
Could the bug be that commit is not modifying the records in the LRU
cache? In that case, might it commit the records twice?
Dave
-----Original Message-----
From: h2-da...@googlegroups.com On Behalf Of Dave Brewster
Sent: Saturday, September 01, 2007 12:09 PM
To: h2-da...@googlegroups.com
Thanks for the information, and sorry for the delay. Improving the
performance is always important, but currently I will not invest much
time in this. It's always good to have simple, standalone, reproducible
test cases, however; if you have something, please post it!
Thanks for your help,
Thomas
It seems that dumping the indexes while inserting data is much slower
than in other DBs.
Our "solution" was to do the DB writes in a separate thread, which
helps but is far from perfect.
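
A rough sketch of that separate-thread approach (this is not our
actual code; Row, the queue size, and the thread name are
illustrative):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class AsyncWriterSketch {
    // a parsed row waiting to be written; fields trimmed for brevity
    static class Row {
        final String id;
        Row(String id) { this.id = id; }
    }

    private static final Row POISON = new Row(null); // shutdown marker
    private final BlockingQueue<Row> queue = new ArrayBlockingQueue<>(10_000);

    // a single writer thread drains the queue and does the JDBC work,
    // so the parser thread never blocks on disk I/O
    void startWriter() {
        Thread writer = new Thread(() -> {
            try {
                for (Row row = queue.take(); row != POISON; row = queue.take()) {
                    // insert 'row' via a PreparedStatement here,
                    // committing every N rows as before
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "db-writer");
        writer.start();
    }

    void submit(Row row) throws InterruptedException {
        queue.put(row); // blocks if the writer falls behind
    }

    void shutdown() throws InterruptedException {
        queue.put(POISON);
    }
}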
In H2, index creation is quite slow compared to other databases: other
databases use a special algorithm to build an index over existing
data, while H2 uses (almost) the slow 'insert one row at a time'
algorithm. This needs to be improved, of course.
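
For reference, the pattern such a bulk algorithm speeds up is building
the index only after the data is loaded, for example (the URL, table,
index, and column names are illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DeferredIndexSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:h2:/tmp/h2", "sa", "");
             Statement stmt = conn.createStatement()) {
            // ... bulk-insert all rows first, with no secondary indexes ...
            // then build the index in one pass over the loaded data
            stmt.execute(
                "CREATE INDEX IDX_FEATURE_NAME ON FEATUREINFO(FEATUREINFONAME)");
        }
    }
}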
Thomas