Slow insert on index vs hsqldb


Stig Christensen

unread,
May 21, 2015, 1:36:04 PM5/21/15
to h2-da...@googlegroups.com
I'm seeing slow inserts due to an indexed column populated with random data. 
This is already explained here https://groups.google.com/forum/?hl=da#!topic/h2-database/bGeXZXuEKDM

Now I have compared it with hsqldb which doesn't seem to have the same performance issues.
Maybe hsqldb is creating spacing in the index so data doesn't have to be moved around?
Could this explain the difference, and is it possible to tweak H2 for better performance?

h2-1.3.170
Testing jdbc:h2:C:\h2db
100.000 in 1482 (milliseconds)
100.000 in 2341
100.000 in 2639
100.000 in 3355
100.000 in 5726
100.000 in 3449
100.000 in 8006
100.000 in 13029
100.000 in 12902
100.000 in 12029
Selecting 100000
Select done in 4200
Close in 2388


hsqldb-2.3.2
Testing jdbc:hsqldb:file:c:\hsqldbtestdb
100.000 in 3838
100.000 in 6194
100.000 in 2922
100.000 in 3028
100.000 in 3401
100.000 in 3324
100.000 in 8254
100.000 in 3713
100.000 in 3839
100.000 in 3807
Selecting 100000
Select done in 3293
Close in 0

HsqldbTest.java

Noel Grandin

unread,
May 22, 2015, 7:55:29 AM5/22/15
to h2-da...@googlegroups.com
I don't think hsqldb supports on-disk indexes; its indexes are always stored only in RAM, so it has less disk IO to
perform. (But I could be wrong, corrections welcome.)

At the moment, H2 does not support memory-only indexes on disk-based data (but patches are welcome)

Thomas Mueller

unread,
May 22, 2015, 12:28:28 PM5/22/15
to H2 Google Group
Hi,

I think in this case, HSQLDB keeps the index on disk.

I ran the test on my machine, and it looks like there is a problem in H2 here when using the PageStore (which is the default for the version of H2 you are using). With a recent version of H2 (with the MVStore), it is much better, even though inserts are initially slower. I will still need to work on this.

As for indexing randomly distributed data, a fast way is to do it chunk by chunk, where a chunk fits in memory. Basically, sort the chunk, store it, then process the next chunk, and so on. Once you are done, combine the chunks using merge sort. This is what I did in the ArchiveTool: https://github.com/h2database/h2database/blob/master/h2/src/tools/org/h2/dev/fs/ArchiveTool.java - there is also a version that uses the MVStore: https://github.com/h2database/h2database/blob/master/h2/src/tools/org/h2/dev/fs/ArchiveToolStore.java
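The chunk-by-chunk approach above can be sketched as follows. This is an illustrative in-memory version (the ArchiveTool works on files on disk): split the keys into chunks of a size that would fit in memory, sort each chunk independently, then do a k-way merge of the sorted chunks with a priority queue. The class and chunk size are made up for the example.

```java
import java.util.*;

// Sketch: bulk-sort randomly distributed keys chunk by chunk,
// then k-way merge the sorted chunks (merge-sort style).
public class ChunkSort {
    static final int CHUNK_SIZE = 1000; // stand-in for "what fits in memory"

    public static long[] sortInChunks(long[] keys) {
        // Step 1: split into chunks and sort each one independently.
        List<long[]> chunks = new ArrayList<>();
        for (int i = 0; i < keys.length; i += CHUNK_SIZE) {
            long[] chunk = Arrays.copyOfRange(keys, i,
                    Math.min(i + CHUNK_SIZE, keys.length));
            Arrays.sort(chunk);
            chunks.add(chunk);
        }
        // Step 2: k-way merge using a min-heap of {value, chunkIndex, offset}.
        PriorityQueue<long[]> heap =
                new PriorityQueue<>(Comparator.comparingLong(e -> e[0]));
        for (int c = 0; c < chunks.size(); c++) {
            heap.add(new long[] { chunks.get(c)[0], c, 0 });
        }
        long[] out = new long[keys.length];
        int n = 0;
        while (!heap.isEmpty()) {
            long[] top = heap.poll();
            out[n++] = top[0];
            int c = (int) top[1], next = (int) top[2] + 1;
            if (next < chunks.get(c).length) {
                heap.add(new long[] { chunks.get(c)[next], c, next });
            }
        }
        return out;
    }
}
```

On disk, each sorted chunk would be written out sequentially and the merge would stream from the chunk files, so the random keys only ever cause sequential IO.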

With recent versions of H2, with the MVStore, this algorithm is used if you create an index after inserting the data. So if you move the "create index" statement to after the inserts, inserting is much faster. Creating the index is also fast.
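In SQL terms, that reordering looks like the sketch below (table and index names are made up for the example):

```sql
-- Create the table without the secondary index:
CREATE TABLE test(id INT PRIMARY KEY, hash BIGINT);

-- ... bulk INSERT all rows first ...

-- Only then create the index on the random column, so H2 (MVStore)
-- can build it with the chunked sort / merge described above:
CREATE INDEX idx_test_hash ON test(hash);
```

With the index created up front, each insert lands at a random position in the index B-tree instead, which is what causes the slowdown in the benchmark.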

Regards,
Thomas


Stig Christensen

unread,
May 24, 2015, 3:16:40 AM5/24/15
to h2-da...@googlegroups.com
Hi Thomas, I also tried the newest version with the PageStore, and then performance dropped even more.

Testing jdbc:h2:C:\h2-bdb;MV_STORE=FALSE;MVCC=FALSE
100.000 in 1544
100.000 in 1717
100.000 in 3668
100.000 in 9969
100.000 in 19164
100.000 in 24247
100.000 in 26944
100.000 in 27150
100.000 in 27513
100.000 in 28763
Selecting 100000
Select done in 5055
Close in 532

Unfortunately I cannot use the tricks described. I need to store the HASH values one at a time and have fast read access from the beginning. I'm developing a single-threaded client app that should use as little disk space and CPU as possible.

Why is it that the MVStore ends up using 10 times more disk space in this test?



Thomas Mueller

unread,
May 26, 2015, 1:43:45 AM5/26/15
to h2-da...@googlegroups.com
Hi,

Why is it that the MVStore ends up using 10 times more disk space in this test?

By default, the MVStore keeps the data of the last 45 seconds. After that, the disk space is re-used. The database file size will usually shrink when you close the database.
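If the on-disk size during operation matters, one thing worth trying (assuming a recent H2 with the MVStore; check the documentation for your version) is lowering the retention time and compacting on close:

```sql
-- Shorten how long old MVStore data is kept (default 45000 ms);
-- 0 lets disk space be reused as soon as possible:
SET RETENTION_TIME 0;

-- Fully compact the database file when closing:
SHUTDOWN COMPACT;
```

A lower retention time trades some crash-recovery safety margin for a smaller file while the database is open.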

Regards,
Thomas
