Slow inserts after table has approx 1 billion records


Harish Mahadevan

Mar 25, 2012, 1:26:53 AM3/25/12
to H2 Database
Hi All,

I am linking a table of 5 million records from one H2 database into
another H2 database, and I am trying to copy those 5 million records 300
times. For the first 175 iterations the copy runs at a constant rate of
roughly 25 iterations every 40 minutes, but once the new table has
approximately a billion records the inserts seem to slow down: it takes
almost 24 hours to copy the same 5 million rows another 25 times. I am
using an INSERT INTO ... SELECT statement, which is the reason for
linking the table. There is literally no other activity on the machine
I am using to run this copy, and I have dedicated around 8 GB to the
process. Any idea why H2 behaves this way?

regards
Harish
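[For readers unfamiliar with the setup: the copy described above can be sketched in H2 SQL along these lines. All table, file, and credential names here are hypothetical, not taken from the thread.]

```sql
-- Link the 5-million-row source table from the old database file.
CREATE LINKED TABLE SRC('org.h2.Driver',
    'jdbc:h2:/data/source_db', 'sa', '', 'RECORDS');

-- The driver program then repeats this pair roughly 300 times:
INSERT INTO RECORDS_COPY SELECT * FROM SRC;
COMMIT;
```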

Noel Grandin

Mar 26, 2012, 4:06:34 AM3/26/12
to h2-da...@googlegroups.com, Harish Mahadevan
- Are you calling commit() after each batch?
- What does the memory usage look like when it slows down?
- If the memory usage is running high, try using jmap to get a heap
dump, and then use Eclipse MAT to analyze the dump
- If the memory usage is fine, try hooking up a profiler like the
NetBeans Profiler and see what is taking the time

Noel Grandin

Mar 26, 2012, 10:55:58 AM3/26/12
to Harish Mahadevan, h2-da...@googlegroups.com
Hmmm, nothing obviously wrong in that stack trace, since that is pretty much what I'd expect when it's doing an insert.

You could try increasing the page size of your database
(http://www.h2database.com/html/features.html#page_size),
since it seems like it's a pretty big database.

You could also try playing with the cache settings, see here:
http://www.h2database.com/html/features.html#cache_settings

Something is definitely going a little weird, but I don't know the internals well enough to know exactly where.
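[The cache setting Noel links to can also be adjusted at runtime; a sketch, with the value chosen only for illustration:]

```sql
-- CACHE_SIZE is given in KB; this sets a 4 GB page cache.
SET CACHE_SIZE 4194304;
```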

On 2012-03-26 16:41, Harish Mahadevan wrote:
Hi Noel,

- Are you calling commit() after each batch?
Yeah, I commit after every 100 million rows. The inserts take the same amount of time until the table has roughly 800 million records.

- What does the memory usage look like when it slows down?
The job uses 1.4% of the 8 GB, so it's definitely not running out of memory.

- If the memory usage is fine, try hooking up a profiler like the NetBeans Profiler and see what is taking the time
I tried a stack trace to figure out which step it is on, and found the job spending a long time in the following:

"main" prio=10 tid=0x000000000126a000 nid=0x4cdc runnable [0x00007f664828b000]
   java.lang.Thread.State: RUNNABLE
        at org.h2.util.CacheLRU.removeOld(CacheLRU.java:184)
        at org.h2.util.CacheLRU.removeOldIfRequired(CacheLRU.java:136)
        at org.h2.util.CacheLRU.put(CacheLRU.java:112)
        at org.h2.store.PageStore.getPage(PageStore.java:795)
        - locked <0x0000000580009338> (a org.h2.store.PageStore)
        at org.h2.index.PageDataIndex.getPage(PageDataIndex.java:222)
        at org.h2.index.PageDataNode.addRowTry(PageDataNode.java:128)
        at org.h2.index.PageDataIndex.addTry(PageDataIndex.java:163)
        at org.h2.index.PageDataIndex.add(PageDataIndex.java:128)
        at org.h2.table.RegularTable.addRow(RegularTable.java:127)
        at org.h2.command.dml.Insert.addRow(Insert.java:162)
        at org.h2.command.dml.Select.queryFlat(Select.java:523)
        at org.h2.command.dml.Select.queryWithoutCache(Select.java:617)
        at org.h2.command.dml.Query.query(Query.java:290)
        at org.h2.command.dml.Insert.insertRows(Insert.java:132)
        at org.h2.command.dml.Insert.update(Insert.java:84)
        at org.h2.command.CommandContainer.update(CommandContainer.java:71)
        at org.h2.command.Command.executeUpdate(Command.java:212)
        - locked <0x00000005800085f8> (a org.h2.engine.Database)
        at org.h2.jdbc.JdbcStatement.executeUpdateInternal(JdbcStatement.java:125)
        - locked <0x0000000580064d48> (a org.h2.engine.Session)
        at org.h2.jdbc.JdbcStatement.executeUpdate(JdbcStatement.java:110)
 
Thank you.

regards
Harish
--
Harish Mahadevan
Software Product Developer

Knome Inc.
25 First Street, Suite 107
Cambridge, MA  02141
USA

hmaha...@knome.com
Phone: 617.715.1000

The Human Genome Interpretation Company


Thomas Mueller

Mar 26, 2012, 1:05:15 PM3/26/12
to h2-da...@googlegroups.com
Hi,

What version of H2 do you use? A bug related to large databases was
fixed recently (see the change log, I think version 1.3.161). But I'm
not sure this will fix the problem.

> Yeah i commit after every 100 million.

Normally I would commit once every 1000 rows or so. 100 million sounds
like way too much.

> The job uses 1.4% of the 8gb.

You should probably increase the cache size in H2 (cache_size).

What kind of indexes do you use? For example, do you use an index on
randomly distributed data? That would be a problem, because it
basically "kills the cache".
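[An example of the access pattern Thomas is warning about, with a hypothetical schema: an index on randomly distributed keys, such as UUIDs, touches pages scattered across the whole file, so once the index outgrows the cache nearly every insert becomes a cache miss.]

```sql
CREATE TABLE T(ID UUID, DATA VARCHAR);
-- Inserts into this index land on effectively random pages:
CREATE INDEX IDX_T_ID ON T(ID);
```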

Is there a file called <databaseName>.trace.db next to the database
file? If yes what does it contain?

Regards,
Thomas

Harish Mahadevan

Mar 26, 2012, 2:20:41 PM3/26/12
to h2-da...@googlegroups.com
- What version of H2 do you use? A bug related to large databases was fixed recently (see the change log, I think version 1.3.161). But I'm not sure this will fix the problem.
I am using version 1.3.155; as you pointed out, I have just upgraded to 1.3.165.

- Normally I would commit once every 1000 rows or so. 100 million sounds like way too much.
I usually would do the same, but since these inserts come straight from the selected table (5 million rows), I figured that, as I was increasing the job's memory anyway, I might as well commit later, since it is a little faster without the shorter commits. But I have now reduced the commit interval to 5 million, as that is the minimum coming from the table.

- You should probably increase the cache size in H2 (cache_size).
I have increased the cache_size, as per Noel's advice, too.

- What kind of indexes do you use? For example, do you use an index on randomly distributed data? That would be a problem, because it basically "kills the cache".
I have no indexes or primary keys.

- Is there a file called <databaseName>.trace.db next to the database file? If yes, what does it contain?
No, there isn't any trace.db file, as I don't have any SQL errors or exceptions yet :-).

Thanks a lot, guys; I will keep you updated on how this job progresses.

regards
Harish
 


--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To post to this group, send email to h2-da...@googlegroups.com.
To unsubscribe from this group, send email to h2-database...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/h2-database?hl=en.

Thomas Mueller

Mar 26, 2012, 3:24:14 PM3/26/12
to h2-da...@googlegroups.com
Hi,

> - Normally I would commit once every 1000 rows or so. 100 million
> sounds like way too much.

I forgot to mention: if you use CREATE TABLE(...) AS SELECT ..., then a
commit every x number of rows isn't necessary. The reason is that in
this case, it's not a 'real' transaction (with all the overhead
usually required). But that's really just CREATE TABLE AS SELECT. It
should be about twice as fast as a separate CREATE TABLE followed by
INSERT INTO ... SELECT.
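[The two approaches Thomas compares, side by side as a sketch; table and column names are hypothetical:]

```sql
-- Single statement: not a 'real' transaction, no per-batch commits
-- needed, and roughly twice as fast.
CREATE TABLE RECORDS_COPY AS SELECT * FROM SRC;

-- Two steps: a regular transaction, so batch commits matter.
CREATE TABLE RECORDS_COPY(ID BIGINT, DATA VARCHAR);
INSERT INTO RECORDS_COPY SELECT * FROM SRC;
```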

Regards,
Thomas

Harish Mahadevan

Mar 26, 2012, 6:23:22 PM3/26/12
to h2-da...@googlegroups.com
- I forgot to mention: if you use CREATE TABLE(...) AS SELECT ..., then a commit every x number of rows isn't necessary. The reason is that in this case, it's not a 'real' transaction (with all the overhead usually required). But that's really just CREATE TABLE AS SELECT. It should be about twice as fast as a separate CREATE TABLE followed by INSERT INTO ... SELECT.

Yeah, but I guess in my case, even with a CREATE statement, I would still end up inserting the data multiple times, since I am copying the data from one table to another repeatedly. So I think I still need an INSERT.

Thank you

regards
Harish



Harish Mahadevan

Mar 27, 2012, 10:42:08 AM3/27/12
to h2-da...@googlegroups.com
Hi Guys,

Increasing the cache_size (or even setting the cache_size to 0) works, and limiting the commits helps too, I guess :-).

Thanks a lot guys. Really appreciate the help.

regards
Harish

Harish Mahadevan

Apr 3, 2012, 4:31:40 PM4/3/12
to h2-da...@googlegroups.com
Hi Guys,

A recent update on this issue: I got the job to commit on a regular schedule and it is fast, but after running it multiple times for testing and validation I found something very interesting.
The inserts worked really well until the table had 2147483647 (Integer.MAX_VALUE) records; they ran at a very constant speed of roughly 60,000,000 rows every 10 minutes, with a commit size of 2 million. But after the job has inserted 2147483647 records into the table, it does no more. I have run the job more than 5 times and it stops at exactly the same size every single time. I would like to know if H2 uses Integer.MAX_VALUE in any of its row-count calculations or something like that, because the H2 limits and limitations page says a single table can have 2^64 rows. So it would be helpful if someone could explain more about this issue.

Thanks a lot.

regards
Harish



Steve McLeod

Apr 4, 2012, 6:51:29 AM4/4/12
to h2-da...@googlegroups.com
Harish,

This sounds suspiciously like a case of internally or externally using a signed type instead of an unsigned type. A Java int spans 2^32 values, but half of those are negative, and you are getting close to 2^31 rows in your table when the problem hits.

I guess Thomas will be onto this quickly, but I'm curious about this problem and want to look into it myself too. Can you post some simple code that reproduces it? E.g. Java code to create and populate a table with only one or two columns.

Regards,

Steve

Harish Mahadevan

Apr 4, 2012, 7:26:04 AM4/4/12
to h2-da...@googlegroups.com
Hi Steve,

That's exactly what I want to know: whether H2 is using a signed type instead of an unsigned type somewhere, because all I am doing is using INSERT INTO ... SELECT and dumping the whole table many times. The whole table is linked from an old H2 db into this new one. Unfortunately it would take me some time to recreate this issue, so I was wondering if there are any known issues related to this problem in H2 that stick out.

Thank you

Regards
Harish

Steve McLeod

Apr 4, 2012, 7:36:53 AM4/4/12
to h2-da...@googlegroups.com
Harish,

What exactly happens (or doesn't happen) when you get to row 2147483647? Do you get an exception? Does the db just stop responding? If so, at what point? The more detail you can give, the easier it is for me to find where to start looking in the H2 source.

Noel Grandin

Apr 4, 2012, 8:30:09 AM4/4/12
to h2-da...@googlegroups.com, Harish Mahadevan

The most probable location of the problem is in the
org.h2.store.PageStore code.

Why exactly are you trying to store this many rows?

Noel Grandin

Apr 4, 2012, 8:53:21 AM4/4/12
to h2-da...@googlegroups.com
I've created a test-case for this, but I'm not very motivated to chase
it down, since it'll take 9 hours for a single run.
Sorry :-(

TestExtremelyLargeTable.java

Harish Mahadevan

Apr 4, 2012, 10:40:07 AM4/4/12
to h2-da...@googlegroups.com
Hey Guys,

I apologize for not providing sufficient details, so here they go.

Stack trace of what the job is doing after the 2147483647 rows have been inserted:

java.lang.Thread.State: RUNNABLE
at org.h2.util.CacheLRU.find(CacheLRU.java:281)
at org.h2.util.CacheLRU.get(CacheLRU.java:287)
at org.h2.store.PageStore.getPage(PageStore.java:725)
- locked <0x0000000600010308> (a org.h2.store.PageStore)
at org.h2.index.PageDataIndex.getPage(PageDataIndex.java:226)
at org.h2.index.PageDataNode.addRowTry(PageDataNode.java:128)
at org.h2.index.PageDataNode.addRowTry(PageDataNode.java:129)
at org.h2.index.PageDataNode.addRowTry(PageDataNode.java:129)
at org.h2.index.PageDataNode.addRowTry(PageDataNode.java:129)
at org.h2.index.PageDataIndex.addTry(PageDataIndex.java:167)
at org.h2.index.PageDataIndex.add(PageDataIndex.java:130)
at org.h2.table.RegularTable.addRow(RegularTable.java:121)
at org.h2.command.dml.Insert.addRow(Insert.java:162)
at org.h2.command.dml.Select.queryFlat(Select.java:524)
at org.h2.command.dml.Select.queryWithoutCache(Select.java:618)
at org.h2.command.dml.Query.query(Query.java:298)
at org.h2.command.dml.Insert.insertRows(Insert.java:132)
at org.h2.command.dml.Insert.update(Insert.java:84)
at org.h2.command.CommandContainer.update(CommandContainer.java:73)
at org.h2.command.Command.executeUpdate(Command.java:226)
- locked <0x000000060000ef78> (a org.h2.engine.Database)
at org.h2.jdbc.JdbcStatement.executeUpdateInternal(JdbcStatement.java:125)
- locked <0x000000060003d450> (a org.h2.engine.Session)
at org.h2.jdbc.JdbcStatement.executeUpdate(JdbcStatement.java:110)

To Steve's question: yeah, the job stops responding once I hit that row count, but checking CPU activity shows the job is taking 100% of the CPU, so I really can't tell what H2 is trying to do, apart from allocating cache of some sort.

Thanks a lot, Noel, for creating and running a test; I am also going to run that job on my side with more memory.

Thanks a lot, guys, I will keep you posted on the progress.

regards
Harish


Noel Grandin

Apr 4, 2012, 10:47:10 AM4/4/12
to h2-da...@googlegroups.com, Harish Mahadevan

Cranking up the page size will probably extend the number of rows you
can store in a table.
The default is 2 KB; I would crank it up to 16 KB.
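[One caveat, to the best of my understanding of the H2 documentation: the page size can only be set when the database file is first created, via the JDBC URL. A sketch, with a hypothetical path:]

```sql
-- jdbc:h2:/data/target_db;PAGE_SIZE=16384
```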

Steve McLeod

Apr 4, 2012, 11:23:52 AM4/4/12
to h2-da...@googlegroups.com, Harish Mahadevan
I don't think increasing the page size is the answer. I've been running a test with the default page size, inserting batches of 1,000,000 rows at a time, and it has surpassed Integer.MAX_VALUE.

I'll change my test now to use INSERT INTO ... SELECT, to see if that also surpasses Integer.MAX_VALUE.

Steve McLeod

Apr 4, 2012, 11:33:21 AM4/4/12
to h2-da...@googlegroups.com, Harish Mahadevan
Although, having said that, page ids are ints, so there does seem to be a maximum amount of data: Integer.MAX_VALUE * pageSize. With the default 2 KB page size, that is 4,398,046,509,056 bytes, or roughly 4 terabytes.
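[Steve's arithmetic can be checked directly in an H2 console:]

```sql
-- Integer.MAX_VALUE pages times the default 2048-byte page size:
SELECT CAST(2147483647 AS BIGINT) * 2048 AS max_bytes;
-- 4398046509056 bytes, roughly 4 TB
```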

Harish, how big is your db file once H2 stops responding?

Steve McLeod

Apr 4, 2012, 6:05:31 PM4/4/12
to h2-da...@googlegroups.com, Harish Mahadevan
I've run two tests today. Both involved inserting lots of rows. One test used many INSERT INTO statements. The other test used INSERT INTO ... SELECT statements that added 1 million rows at a time.

Both successfully added much more than Integer.MAX_VALUE rows.

Harish, I guess that means we need you to create a reproducible test case.

For the record, in both cases these are the db options I used: LOG=0;CACHE_SIZE=0;LOCK_MODE=0;UNDO_LOG=0
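[For reference, those options go on the JDBC URL; a sketch with a hypothetical path. As I understand them, they trade durability for bulk-load speed, so they only make sense for throwaway test databases:]

```sql
-- jdbc:h2:/data/test_db;LOG=0;CACHE_SIZE=0;LOCK_MODE=0;UNDO_LOG=0
-- LOG=0 disables the transaction log, UNDO_LOG=0 disables the undo
-- (rollback) log, LOCK_MODE=0 disables locking, and CACHE_SIZE=0
-- shrinks the page cache to its minimum.
```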

