Problems with Date

barashkukor

Aug 18, 2007, 7:27:34 AM
to H2 Database
Hi!

I just started to use an H2 database and stumbled upon a problem with
using dates:

Performing a query like:

SELECT * FROM HISTORIE WHERE DATUM < 2005-01-01;

throws an exception like "Cannot parse date constant 2003 [90009-56]",
even though I don't even have such a date in my data (the 'oldest' date
is from 2004).
The same problem occurs with DELETE statements.

Can anyone help me?

Duvel

Aug 18, 2007, 8:12:51 AM
to H2 Database
On 18 aug, 13:27, barashkukor <cdbr...@gmail.com> wrote:

> SELECT * FROM HISTORIE WHERE DATUM < 2005-01-01;
>
> throws an exception like "Cannot parse date constant 2003 [90009-56]",
> eventhough I don't even have such a date in my data (the 'oldest' date
> is from 2004).
> Same Problem with DELETE-Statements.

I think the date 2005-01-01 is read as a numeric expression instead of
a date: 2005 minus 1 minus 1 is 2003. It should help if you put the
date between single quotes.
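
For example (either form should work; the DATE keyword form is standard
SQL and, as far as I know, is also accepted by H2):

SELECT * FROM HISTORIE WHERE DATUM < '2005-01-01';
SELECT * FROM HISTORIE WHERE DATUM < DATE '2005-01-01';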

barashkukor

Aug 18, 2007, 9:51:15 AM
to H2 Database
Thanks, that fixed my problem.

Dave Brewster

Aug 22, 2007, 6:00:52 PM
to h2-da...@googlegroups.com
We have an import procedure that does multiple inserts to an H2
database. Before this procedure we turn off all constraints, drop all
indexes, and turn off logging. Is there anything else we can do to make
things more efficient?
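
(For reference, the pre-import statements we run look roughly like the
following; the exact syntax may differ between H2 versions, and the
index name is just a placeholder:)

SET REFERENTIAL_INTEGRITY FALSE;
SET LOG 0;
DROP INDEX IF EXISTS IDX_EXAMPLE;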

What we see now are large pauses when H2 writes the data to disk.
I'm not sure if things could be improved if the DB were written to
more continuously instead of in large bursts.

Any thoughts are appreciated.

Thanks,

Dave

Thomas Mueller

Aug 23, 2007, 3:22:47 PM
to h2-da...@googlegroups.com
Hi,

You could try to change the cache size (SET CACHE_SIZE ...). I am also
not happy with the insert (and update and delete) performance of H2.
It is faster than other databases, but not as fast as it could be.
After MVCC is done I will try to improve the performance again.
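
For example (the value is the cache size in KB; 65536 here is just a
number to experiment with, not a recommendation):

SET CACHE_SIZE 65536;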

Thomas

Dave Brewster

Aug 27, 2007, 1:31:20 AM
to h2-da...@googlegroups.com
Is there any way to set the initial file size? It seems that extending
the file dynamically is a large portion of our problem. The database
file ends up ~200MB and the index file ~500MB.

Even if it is a hack, it would help.

Thanks,

Dave

Dave Brewster

Aug 27, 2007, 1:50:32 AM
to h2-da...@googlegroups.com
Also, does H2 require that the new bytes in the file are 0? If so, have
you considered using the file channel instead of the write method? I'm
guessing (although I could be wrong) that when the file gets big,
mapping it into memory and then writing would be faster.

Thomas Mueller

Aug 27, 2007, 2:08:31 PM
to h2-da...@googlegroups.com
Hi,

What database URL, settings, virtual machine, CPU, and file system do you use?

> Also does H2 require that the new bytes in the file are 0?

Yes. Extending a file by large blocks is much faster than extending it
using small blocks.

> have you considered using the file channel instead of the write method?

Yes.

> I'm guessing (although I could be wrong) that when the file gets big,
> mapping into memory and then writing would be faster.

It does not work on all systems. And where it works, there is a limit
on the file size (not a hard limit as far as I know; strangely, larger
files don't work as well). HSQLDB uses memory-mapped files only for
smaller databases. Another problem is that un-mapping a file does not
work (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4724038).

> Is there any way to set the initial file size?

No.

> It seems that extending
> the file dynamically is a large portion of our problem.

Why do you think so?

> The database file ends up ~200MB and the index file ~500MB.

Is this too big?

Could you profile your application to find out what the real problem is?
There are several ways to profile; one is using the YourKit Java
profiler, another is using java -Xrunhprof:cpu=samples,depth=8

Thanks,
Thomas

Dave Brewster

Aug 27, 2007, 2:26:28 PM
to h2-da...@googlegroups.com
Sorry for the terse email. It was late.

I have been profiling for a while now and I would be glad to share the
results. Let me formalize them more today, but my initial findings were
that 1) extending the DB was a problem and 2) writing the results to
disk was really bad.

In this particular application we're using H2 to store information about
types in our system (classes in Java). Imagine implementing the type
info database that Eclipse or IntelliJ uses -- that is what we are
doing.

Parsing the types without storing them, for this particular project,
takes 200 seconds. With storage turned on (no indexes other than the
PKs) it takes ~700 seconds.

The test is much faster at the beginning than at the end, which led me
to believe there was a memory leak somewhere and the app was GC'ing
toward the end. YourKit shows that this is the case at the very end,
but things slowed down before that.

At any rate I'll be profiling more today and we're going to normalize
our schema a bit to reduce the DB/index size. I'll post back with more
of my findings. A few things worry me:

1) We're GC'ing more with the storage on and it gets incrementally
worse.
2) There is *much* more disk activity as the test continues (I'm using
iostat to monitor). The test is basically doing the same thing at the
end as it does at the beginning.

Dave

Dave Brewster

Aug 27, 2007, 3:59:55 PM
to h2-da...@googlegroups.com
Attached are a few HTML files from YourKit. I'll do the same with
JProfiler and post those as well.

It's not the speed of the I/O that worries me, but rather that it slows
down over time.

trace.html
CallTree.html

Dave Brewster

Aug 28, 2007, 1:02:34 AM
to h2-da...@googlegroups.com
Attached are the files from JProfiler (much better output).

The first is from the beginning of the write; the database starts out
empty. The second is from the last 1000 records of our write; at this
point the database is at about 100MB.

What worries me is that the writes get slower and slower.

These tests are on a Core 2, 3.0 GHz. I'm running Linux with ext2 on a
RAID 0 of two 15K SCSI disks. The VM has 1.5GB of heap.

I wouldn't expect it to get this slow with a DB of this size.

First1000.html
Last1000.html

Thomas Mueller

Aug 29, 2007, 1:48:56 PM
to h2-da...@googlegroups.com
Hi,

Thanks for the profiling data! So the main problem is writing to the file. There is no 'back trace' (who wrote to which file); it would be interesting to get this data as well.

> What worries me is that the writes get slower and slower.

Up to some point, this is normal. The beginning is fast because changes are still kept in memory (they are written to the log file sequentially, and sequential writes are fast). After some time the cache gets full and is written to disk; this cannot always be done sequentially (actually it is done in batches, and the batches are sorted, but it is still random access). A B-tree is a 'random access' data structure, and random access is slow for large files because the disk head needs to travel quite a bit (more for larger files).

Is autocommit on or off? What database URL and settings do you use?

Thomas


Dave Brewster

Aug 29, 2007, 2:46:06 PM
to h2-da...@googlegroups.com

Here is a snapshot with logging turned on. I did this with logging
turned off and got the same results without the logging part.

Name -- Time (ms)

java.io.RandomAccessFile.write(byte[], int, int) -- 39,482 (100 %)
  org.h2.store.FileStore.write(byte[], int, int)
    org.h2.store.DiskFile.writeBack(CacheObject) -- 22,009 (56 %)
      org.h2.util.CacheLRU.removeOld() -- 21,957 (56 %)
        org.h2.util.CacheLRU.removeOldIfRequired()
          org.h2.util.CacheLRU.put(CacheObject)
            org.h2.store.DiskFile.getRecord(int, RecordReader, int) -- 12,351 (31 %)
            org.h2.store.DiskFile.addRecord(Session, Record) -- 9,605 (24 %)
      org.h2.store.DiskFile.flush() -- 52 (0 %)
    org.h2.store.LogFile.flush()

I also played around quite a bit trying to make this faster. My initial
thought was that the problem is the many, many writes H2 does to the DB
when it removes old entries from the cache and writes them to disk. I
tried coalescing the writes by detecting when two writes were near each
other and then packing them into one DataPage. The problem with this
approach is that there are very few blocks near each other.

At any rate, I got it to work but it didn't help.

I also printed out some statistics. The average write size to disk is
~512 bytes. When a flush happens it either flushes 6600 records or 590
records (each write 128, 256, 512, or 1152 bytes).

My initial thought was that the number of writes was the problem, which
is why I tried to coalesce the writes. This could still be the case,
because it is rare to get more than 3 blocks next to each other
(because of padding in the btree?).

It should be easy to write a simple test for this, correct?

Our schema consists of one big table (and other smaller ones) that looks like:

ID : VARCHAR,
FeatureInfoType : int,
FeatureInfoName : VARCHAR,
RelativeFeatureInfoName : VARCHAR,
IsErrant : boolean,
OwningFeatureInfoType : int,
OwningFeatureInfoName : VARCHAR,
EnclosingFeatureInfoType : int,
EnclosingFeatureInfoName : VARCHAR,
QualifyingEnclosingFeatureInfoType : int,
QualifyingEnclosingFeatureInfoName : VARCHAR,
Modifiers : int,
OffsetOfRecord : int,
LengthOfRecord : int,
LineNumber : int,
ColumnNumber : int,
DefUse : int

There are ~150,000 rows in the DB when it is done.  We use a transaction around each “type” which means the transaction is for ~30 rows.
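
(In other words, the insert pattern is roughly the following; the table
name is just a placeholder here, and only a few of the columns are
shown:)

SET AUTOCOMMIT FALSE;
-- roughly 30 inserts for one "type", then one commit
INSERT INTO FEATUREINFO (ID, FeatureInfoType, FeatureInfoName)
  VALUES ('com.example.Foo#bar()', 1, 'bar');
-- ... more inserts for the same type ...
COMMIT;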

 

Thanks again,

Dave

Thomas Mueller

Sep 1, 2007, 4:36:51 AM
to h2-da...@googlegroups.com
Is autocommit on or off? What database URL and settings do you use?

Dave Brewster

Sep 1, 2007, 3:08:39 PM
to h2-da...@googlegroups.com
Off. We commit every ~30 rows.

> There are ~150,000 rows in the DB when it is done. We use a transaction
> around each "type" which means the transaction is for ~30 rows.

file:/tmp/h2

I've tried running with a large cache size and a small one.

Dave

Dave Brewster

Sep 4, 2007, 6:34:43 PM
to h2-da...@googlegroups.com
I have more info on this. If I increase the cache size to some very
large number (like 1 million), things work really, really well.
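
(That is, something along the lines of the SET CACHE_SIZE statement
mentioned earlier, just with a much larger value:)

SET CACHE_SIZE 1000000;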

I also checked in the debugger and we are committing every ~30 rows.
Could the bug be that commit is not modifying the records in the LRU?
In that case, might it double-commit the records?

Dave

Thomas Mueller

Sep 20, 2007, 2:37:16 PM
to h2-da...@googlegroups.com
Hi,

Thanks for the information, and sorry for the delay. Improving the
performance is always important, but currently I will not invest much
time in this. It's always good to have simple, standalone, reproducible
test cases though; if you have something, please post it!

Thanks for your help,
Thomas

Dave Brewster

Sep 20, 2007, 2:51:05 PM
to h2-da...@googlegroups.com
I removed all indexes (the PKs were still there) and it is better.
However, adding back the PK indexes takes way longer than just leaving
them in :-<

It seems that the dumping of the indexes while inserting data is much
slower than in other DBs.

Our "solution" was to do the DB writes in a separate thread, which
helps but isn't anywhere near perfect.

Thomas Mueller

Sep 24, 2007, 1:50:07 PM
to h2-da...@googlegroups.com
Hi,

In H2, index creation is quite slow compared to other databases: other
databases use a special algorithm to build the index in this case,
while H2 uses (almost) the slow 'insert one row at a time' algorithm.
This needs to be improved, of course.

Thomas
