Slow insert on index vs hsqldb


Stig Christensen

unread,
May 21, 2015, 1:36:04 PM5/21/15
to h2-da...@googlegroups.com
I'm seeing slow inserts due to an indexed column populated with random data. 
This is already explained here https://groups.google.com/forum/?hl=da#!topic/h2-database/bGeXZXuEKDM

Now I have compared it with hsqldb which doesn't seem to have the same performance issues.
Maybe hsqldb is creating spacing in the index so data doesn't have to be moved around?
Could this explain the difference, and is it possible to tweak H2 for better performance?

h2-1.3.170
Testing jdbc:h2:C:\h2db
100.000 in 1482 (milliseconds)
100.000 in 2341
100.000 in 2639
100.000 in 3355
100.000 in 5726
100.000 in 3449
100.000 in 8006
100.000 in 13029
100.000 in 12902
100.000 in 12029
Selecting 100000
Select done in 4200
Close in 2388


hsqldb-2.3.2
Testing jdbc:hsqldb:file:c:\hsqldbtestdb
100.000 in 3838
100.000 in 6194
100.000 in 2922
100.000 in 3028
100.000 in 3401
100.000 in 3324
100.000 in 8254
100.000 in 3713
100.000 in 3839
100.000 in 3807
Selecting 100000
Select done in 3293
Close in 0

HsqldbTest.java

Noel Grandin

unread,
May 22, 2015, 7:55:29 AM5/22/15
to h2-da...@googlegroups.com
I don't think hsqldb supports on-disk indexes; its indexes are always stored only in RAM, so it has less disk IO to
perform. (But I could be wrong, corrections welcome.)

At the moment, H2 does not support memory-only indexes on disk-based data (but patches are welcome)

Thomas Mueller

unread,
May 22, 2015, 12:28:28 PM5/22/15
to H2 Google Group
Hi,

I think in this case, HSQLDB keeps the index on disk.

I ran the test on my machine, and it looks like there is a problem in H2 here when using the PageStore (which is the default for the version of H2 you are using). With a recent version of H2 (with the MVStore), it is much better, even though inserts are initially slower. I will still need to work on this.

As for indexing randomly distributed data, a fast way is to do it chunk by chunk, where a chunk fits in memory. Basically, sort the chunk, store it, then process the next chunk, and so on. Once you are done, combine the chunks using merge sort. This is what I did in the ArchiveTool: https://github.com/h2database/h2database/blob/master/h2/src/tools/org/h2/dev/fs/ArchiveTool.java - there is also a version that uses the MVStore: https://github.com/h2database/h2database/blob/master/h2/src/tools/org/h2/dev/fs/ArchiveToolStore.java
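The chunk-by-chunk approach above can be sketched as follows. This is an illustrative in-memory version (the ArchiveTool works on files on disk): split the keys into chunks of a size that would fit in memory, sort each chunk independently, then do a k-way merge of the sorted chunks with a priority queue. The class and chunk size are made up for the example.

```java
import java.util.*;

// Sketch: bulk-sort randomly distributed keys chunk by chunk,
// then k-way merge the sorted chunks (merge-sort style).
public class ChunkSort {
    static final int CHUNK_SIZE = 1000; // stand-in for "what fits in memory"

    public static long[] sortInChunks(long[] keys) {
        // Step 1: split into chunks and sort each one independently.
        List<long[]> chunks = new ArrayList<>();
        for (int i = 0; i < keys.length; i += CHUNK_SIZE) {
            long[] chunk = Arrays.copyOfRange(keys, i,
                    Math.min(i + CHUNK_SIZE, keys.length));
            Arrays.sort(chunk);
            chunks.add(chunk);
        }
        // Step 2: k-way merge using a min-heap of {value, chunkIndex, offset}.
        PriorityQueue<long[]> heap =
                new PriorityQueue<>(Comparator.comparingLong(e -> e[0]));
        for (int c = 0; c < chunks.size(); c++) {
            heap.add(new long[] { chunks.get(c)[0], c, 0 });
        }
        long[] out = new long[keys.length];
        int n = 0;
        while (!heap.isEmpty()) {
            long[] top = heap.poll();
            out[n++] = top[0];
            int c = (int) top[1], next = (int) top[2] + 1;
            if (next < chunks.get(c).length) {
                heap.add(new long[] { chunks.get(c)[next], c, next });
            }
        }
        return out;
    }
}
```

On disk, each sorted chunk would be written out sequentially and the merge would stream from the chunk files, so the random keys only ever cause sequential IO.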

With recent versions of H2, with the MVStore, this algorithm is used if you create an index after inserting the data. So if you move the "create index" statement to after the inserts, inserting is much faster. Creating the index is also fast.
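In SQL terms, that reordering looks like the sketch below (table and index names are made up for the example):

```sql
-- Create the table without the secondary index:
CREATE TABLE test(id INT PRIMARY KEY, hash BIGINT);

-- ... bulk INSERT all rows first ...

-- Only then create the index on the random column, so H2 (MVStore)
-- can build it with the chunked sort / merge described above:
CREATE INDEX idx_test_hash ON test(hash);
```

With the index created up front, each insert lands at a random position in the index B-tree instead, which is what causes the slowdown in the benchmark.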

Regards,
Thomas


Stig Christensen

unread,
May 24, 2015, 3:16:40 AM5/24/15
to h2-da...@googlegroups.com
Hi Thomas, I also tried the newest version with the PageStore, and then performance dropped even more.

Testing jdbc:h2:C:\h2-bdb;MV_STORE=FALSE;MVCC=FALSE
100.000 in 1544
100.000 in 1717
100.000 in 3668
100.000 in 9969
100.000 in 19164
100.000 in 24247
100.000 in 26944
100.000 in 27150
100.000 in 27513
100.000 in 28763
Selecting 100000
Select done in 5055
Close in 532

Unfortunately I cannot use the tricks described. I need to store the HASH values one at a time and have fast read access from the beginning. I'm developing a single-threaded client app that should use as little disk space and CPU as possible.

Why is it that the MVStore ends up using 10 times more disk space in this test?



Thomas Mueller

unread,
May 26, 2015, 1:43:45 AM5/26/15
to h2-da...@googlegroups.com
Hi,

Why is it that the MVStore ends up using 10 times more disk space in this test?

By default, the MVStore keeps the data of the last 45 seconds. After that, the disk space is re-used. The database file size will usually shrink when you close the database.
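If the on-disk size during operation matters, one thing worth trying (assuming a recent H2 with the MVStore; check the documentation for your version) is lowering the retention time and compacting on close:

```sql
-- Shorten how long old MVStore data is kept (default 45000 ms);
-- 0 lets disk space be reused as soon as possible:
SET RETENTION_TIME 0;

-- Fully compact the database file when closing:
SHUTDOWN COMPACT;
```

A lower retention time trades some crash-recovery safety margin for a smaller file while the database is open.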

Regards,
Thomas
