OutOfMemory when creating an Index


christof...@finaris.de

May 21, 2019, 3:52:37 AM5/21/19
to h2-da...@googlegroups.com
Hi,

we recently upgraded to H2 version 1.4.199.
The connect string is jdbc:h2:file:<FILE_PATH>;CACHE_SIZE=16384;MULTI_THREADED=1;MV_STORE=TRUE
Additionally, we set the MAX_MEMORY_ROWS property to 10 (this worked fine with the old page store that we used before).
The maximum heap of my application (-Xmx) was set to 4 GB.

When creating an index for a table with 8,000,000 rows, H2 runs out of memory.
The screenshot below, from my profiling tool, was taken during index creation.
It shows that the maps member of the MVStore class consumes more and more memory until an OutOfMemoryError is raised.

I tracked this down to the rebuildIndexBlockMerge method of the MVTable class (see below).
Since I saw that the MAX_MEMORY_ROWS parameter is used in that method, I changed its value and tried again.
With it set to 1000, index creation still required a maximum heap of about 800 MB, but the OutOfMemoryError no longer occurred.
This seems strange to me because, as far as I understood, lowering the value of that parameter should decrease memory consumption.

Is anything wrong with my configuration? Or might this be a bug?


Thanks for any help in advance,

Christoff Schmitz


 
private void rebuildIndexBlockMerge(Session session, MVIndex index) {
    if (index instanceof MVSpatialIndex) {
        // the spatial index doesn't support multi-way merge sort
        rebuildIndexBuffered(session, index);
    }
    // Read entries in memory, sort them, write to a new map (in sorted
    // order); repeat (using a new map for every block of 1 MB) until all
    // records are read. Merge all maps to the target (using merge sort;
    // duplicates are detected in the target). For randomly ordered data,
    // this should use relatively few write operations.
    // A possible optimization is: change the buffer size from "row count"
    // to "amount of memory", and buffer index keys instead of rows.
    Index scan = getScanIndex(session);
    long remaining = scan.getRowCount(session);
    long total = remaining;
    Cursor cursor = scan.find(session, null, null);
    long i = 0;
    Store store = session.getDatabase().getStore();

    int bufferSize = database.getMaxMemoryRows() / 2;
    ArrayList<Row> buffer = new ArrayList<>(bufferSize);
    String n = getName() + ":" + index.getName();
    int t = MathUtils.convertLongToInt(total);
    ArrayList<String> bufferNames = Utils.newSmallArrayList();
    while (cursor.next()) {
        Row row = cursor.get();
        buffer.add(row);
        database.setProgress(DatabaseEventListener.STATE_CREATE_INDEX, n,
                MathUtils.convertLongToInt(i++), t);
        if (buffer.size() >= bufferSize) {
            sortRows(buffer, index);
            String mapName = store.nextTemporaryMapName();
            index.addRowsToBuffer(buffer, mapName);
            bufferNames.add(mapName);
            buffer.clear();
        }
        remaining--;
    }
    sortRows(buffer, index);
    if (!bufferNames.isEmpty()) {
        String mapName = store.nextTemporaryMapName();
        index.addRowsToBuffer(buffer, mapName);
        bufferNames.add(mapName);
        buffer.clear();
        index.addBufferedRows(bufferNames);
    } else {
        addRowsToIndex(session, buffer, index);
    }
    if (remaining != 0) {
        DbException.throwInternalError(
                "rowcount remaining=" + remaining + " " + getName());
    }
}




Noel Grandin

May 21, 2019, 4:09:17 AM5/21/19
to h2-da...@googlegroups.com, christof...@finaris.de
On 2019/05/21 9:52 AM, christof...@finaris.de wrote:
> Is anything wrong with my configuration? Or might this be a bug?
<Chuckle>

Now that is an interesting failure mode. Given how much memory you seem to have, you should be setting MAX_MEMORY_ROWS
*higher*, not lower. Running with MAX_MEMORY_ROWS set to 10 is something we only expect to see in unit tests, when we're
trying to test our on-disk temporary-data code-paths.

So what is happening is that we create a new temporary map for each block of MAX_MEMORY_ROWS rows, which in your case
means we end up creating 8,000,000/10 = 800,000 maps!
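That figure divides the row count by MAX_MEMORY_ROWS directly; note that the rebuildIndexBlockMerge code quoted above actually uses MAX_MEMORY_ROWS / 2 as the sort-buffer size, so the temporary-map count comes out roughly twice as high. A back-of-envelope sketch (hypothetical helper, not H2 API):

```java
// Sketch: roughly how many temporary sort maps rebuildIndexBlockMerge
// creates for a given MAX_MEMORY_ROWS setting. The quoted code uses
// bufferSize = MAX_MEMORY_ROWS / 2, so the count is about
// rowCount / bufferSize.
public class TempMapCount {
    static long tempMaps(long rowCount, int maxMemoryRows) {
        int bufferSize = maxMemoryRows / 2;
        return (rowCount + bufferSize - 1) / bufferSize; // ceiling division
    }

    public static void main(String[] args) {
        // 8,000,000 rows with MAX_MEMORY_ROWS = 10 -> 1,600,000 maps
        System.out.println(tempMaps(8_000_000L, 10));
        // with a larger value such as 40,000 -> only 400 maps
        System.out.println(tempMaps(8_000_000L, 40_000));
    }
}
```

The map count, not the buffered rows themselves, is what blows up the heap here: each temporary map carries fixed bookkeeping overhead in the MVStore's maps member.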

So I would suggest either leaving MAX_MEMORY_ROWS alone, which will auto-configure to a reasonable number, or
configuring it such that virtually all your queries run in-memory without needing to spill temporary data to disk.

That will also make your application run a whole lot faster.

christof...@finaris.de

May 21, 2019, 10:46:57 AM5/21/19
to h2-da...@googlegroups.com
Hi Noel,

our users typically deal with huge amounts of data, which often do not fit into memory. Tables can have hundreds of columns, so even a small number of rows held in memory can occupy a lot of it.
Additionally, queries are often issued in parallel, so we set a low value to guarantee there will never be an out-of-memory error, without users having to deal with advanced H2 settings.
Performance was still good.

Anyway, the problem I still have is that:
a high value of MAX_MEMORY_ROWS requires little memory during index creation, but lots of memory for result sets;
a low value of MAX_MEMORY_ROWS requires a huge amount of memory during index creation, but only a little for result sets.

Would it be possible to introduce a new database property just for index creation?
It could default to the current default of MAX_MEMORY_ROWS.
This would help us a lot.





--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h2-database...@googlegroups.com.
To post to this group, send email to h2-da...@googlegroups.com.
Visit this group at https://groups.google.com/group/h2-database.
To view this discussion on the web visit https://groups.google.com/d/msgid/h2-database/277d5c3d-aeff-0a9a-0751-1694d8a0a80c%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

Noel Grandin

May 21, 2019, 12:08:29 PM5/21/19
to h2-da...@googlegroups.com

If you are trying to prevent users from exceeding memory resources, your best bet is just to use a connection pool and limit the max number of connections. 
Note that even if a user issues multiple queries in parallel to the same connection, those queries will execute sequentially server-side.

I still maintain that your existing performance will be terrible compared to what it could be, since you are effectively limited by very slow disk IO, even for very small queries.

If you limit the max number of connections, and raise MAX_MEMORY_ROWS to a reasonable number, you will experience a net gain in performance.
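The setting side of that suggestion can be done at runtime; a sketch (the value is illustrative, not a recommendation):

```sql
-- Raise the row limit so typical queries sort and aggregate in memory
-- (10 rows is far below any realistic working set):
SET MAX_MEMORY_ROWS 100000;
```

Limiting the number of connections would be handled in the application, e.g. with H2's built-in pool (org.h2.jdbcx.JdbcConnectionPool and its setMaxConnections method) or any other JDBC connection pool.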

christof...@finaris.de

May 22, 2019, 5:37:33 AM5/22/19
to h2-da...@googlegroups.com
This could be another optimization, but that won't help if I have a table with millions of rows and an assumed column width of 4 MB of data.
With a 4 GB heap, any value of MAX_MEMORY_ROWS > 1000 will lead to an OOM error when selecting from that table.
And a value of 1000 might not be high enough for index creation if there are millions of records.
So both won't be possible in the same session.
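The arithmetic behind that ~1000-row limit, as a quick sketch (hypothetical helper): the number of rows a heap can hold is roughly heap size divided by row size.

```java
// Sketch: rows that fit in a heap of a given size, ignoring all
// per-row and JVM overhead (so this is an upper bound).
public class RowBudget {
    static long rowsThatFit(long heapBytes, long bytesPerRow) {
        return heapBytes / bytesPerRow;
    }

    public static void main(String[] args) {
        long heap = 4L * 1024 * 1024 * 1024; // 4 GB heap
        long row = 4L * 1024 * 1024;         // 4 MB per row
        System.out.println(rowsThatFit(heap, row)); // prints 1024
    }
}
```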


Noel Grandin

May 22, 2019, 5:52:10 AM5/22/19
to h2-da...@googlegroups.com, christof...@finaris.de

One workaround is that you can use EXCLUSIVE_MODE for index creation:

http://h2database.com/html/commands.html#set_exclusive

Noel Grandin

May 22, 2019, 12:50:41 PM5/22/19
to h2-da...@googlegroups.com, christof...@finaris.de
On Wed, 22 May 2019 at 11:52, Noel Grandin <noelg...@gmail.com> wrote:

One workaround is that you can use EXCLUSIVE_MODE for index creation:


meh, ignore me, that won't help.

Andrei Tokar

May 22, 2019, 9:16:24 PM5/22/19
to H2 Database
Something does not click here:
if you have "a table with millions of rows and an assumed column width of 4MB of data", then how is it possible that
"with set to 1000 the index creation still required a maximum heap of about 800M, but the OOM Error did not occur anymore"?
Your thousand rows should take at least 4 GB of RAM, not 800 MB.

IMHO, index creation for a big table is an administrative task, presumably performed on an idle (if not exclusively held) database.
So what would prevent you from opening the database with MAX_MEMORY_ROWS of, let's say, 3000 (assuming a 4 GB heap), creating the index, and then restarting the database
with your favorite 10 for the application to use?

On the other hand, we probably should select the buffer size as SQRT(ROWCOUNT), and if it exceeds MAX_MEMORY_ROWS/2, just fall back to plain vanilla rebuildIndexBuffered().
It might take forever and will thrash the b-tree, but at least it should not fail with OOM.
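That proposed heuristic, as a sketch in (hypothetical, non-H2) code: a sqrt-sized sort buffer, with a check that decides whether block merge is still safe under the configured limit.

```java
// Sketch of the proposed heuristic: buffer sqrt(rowCount) rows per
// temporary map, and fall back to the plain buffered rebuild when
// that buffer would exceed MAX_MEMORY_ROWS / 2.
public class BufferHeuristic {
    static long proposedBufferSize(long rowCount) {
        return (long) Math.sqrt((double) rowCount);
    }

    static boolean useBlockMerge(long rowCount, int maxMemoryRows) {
        // block merge stays within the memory limit only while the
        // sqrt-sized buffer fits under MAX_MEMORY_ROWS / 2
        return proposedBufferSize(rowCount) <= maxMemoryRows / 2;
    }

    public static void main(String[] args) {
        // 8,000,000 rows -> a buffer of ~2828 rows, and sqrt(n) maps
        System.out.println(proposedBufferSize(8_000_000L));
        System.out.println(useBlockMerge(8_000_000L, 10));     // fall back
        System.out.println(useBlockMerge(8_000_000L, 40_000)); // block merge ok
    }
}
```

A sqrt-sized buffer balances the two failure modes in this thread: both the number of temporary maps and the rows held in memory grow only as sqrt(rowCount).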

christof...@finaris.de

May 23, 2019, 3:23:15 AM5/23/19
to h2-da...@googlegroups.com
Hi Andrei,

The test with the 800 MB heap was my initial case; the 4 MB row size was just an example to clarify my problem.

Unfortunately, creating the indexes in an administrative mode is not a solution for us.
H2 is only used as a temporary database that helps us bring together data from different sources (CSV, XML, different databases, ...); the results are compacted and rendered to an Excel spreadsheet.
When our application terminates, the H2 database is deleted.




