Compact on db close does not shrink db file size


Ulrich

Mar 12, 2024, 4:28:12 AM
to H2 Database
Hi all,

I have big problems after migrating from 1.4.199 to 2.2.229-SNAPSHOT (and other 2.x.x versions). The snapshot build uses Git code from 10 March 2024.

My database has around 50,000 tables storing several hundred million rows. It's a kind of time-series database. Data is added continuously (around 1,500 rows per second), and data older than 30 days is removed once a day.

Apart from the fact that 2.x needs 50%-150% more CPU than 1.4 (with the page store), I observed that the 2.x database grows continuously, and at an enormous rate while old data is being removed (from 20 GB to 90 GB in 10 minutes during the removal).

My parameters are:
WRITE_DELAY=10000;MAX_MEMORY_ROWS=1000000;MAX_OPERATION_MEMORY=50000000;ANALYZE_AUTO=0;RETENTION_TIME=0
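(For reference, these settings are applied by appending them to the JDBC URL when the embedded database is opened; the path and credentials below are only illustrative.)

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class OpenTimeSeriesDb {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:h2:./tsdb"                  // illustrative database path
                    + ";WRITE_DELAY=10000"
                    + ";MAX_MEMORY_ROWS=1000000"
                    + ";MAX_OPERATION_MEMORY=50000000"
                    + ";ANALYZE_AUTO=0"
                    + ";RETENTION_TIME=0";
            try (Connection conn = DriverManager.getConnection(url, "sa", "")) {
                // ... insert time-series rows, delete expired data, etc. ...
            }
        }
    }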

Since the automatic compaction did not show good results, I decided to switch it off, set MAX_COMPACT_TIME to 30000, and shut down the database every 5 minutes using SHUTDOWN. I use SHUTDOWN instead of SHUTDOWN COMPACT to keep control over the maximum time the database is unavailable.
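The shutdown step itself is roughly this (a simplified sketch, given an open java.sql.Connection conn; the 5-minute scheduling lives in the application):

    try (java.sql.Statement st = conn.createStatement()) {
        st.execute("SET MAX_COMPACT_TIME 30000"); // allow up to 30 s of compaction work on close
        st.execute("SHUTDOWN");                   // closes the database; the application reopens it afterwards
    }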

None of my SHUTDOWNs runs longer than 7 seconds. The only effect is that INFO.LEAF_RATIO drops from x to 0 and INFO.CHUNK_COUNT increases by 1. The database file size never shrinks. After 8 hours I have a database > 220 GB.
After running SHUTDOWN, the compaction-relevant parameters are:
INFO.CHUNKS_FILL_RATE 7
INFO.CHUNKS_FILL_RATE_RW 7
INFO.CHUNK_COUNT 86163
INFO.FILE_SIZE 223603212288
INFO.FILL_RATE 100
INFO.LEAF_RATIO 0
INFO.PAGE_COUNT 387816653
INFO.PAGE_COUNT_LIVE 51646729
INFO.PAGE_SIZE 16384

I checked MAX_COMPACT_TIME and it is 30000. 
H2 1.4.199 with PageStore, fed with exactly the same data, creates a 36 GB database without any SHUTDOWN.

Why does the compaction algorithm have no effect on the database size?
Why doesn't it fully use the MAX_COMPACT_TIME setting? 
Why doesn't it touch CHUNKS_FILL_RATE?

Can you give a hint on how to restrict the database file size?

Thanks!
Ulrich

Noel Grandin

Mar 12, 2024, 4:44:49 AM
to h2-da...@googlegroups.com


On 3/12/2024 10:28 AM, Ulrich wrote:
>
> My database has around 50,000 tables storing several hundred million rows. It's a kind of time-series database. Data is
> added continuously (around 1,500 rows per second), and data older than 30 days is removed once a day.
>

I suggest (if you are not doing this already) that you move to an architecture where instead of running DELETE when
removing old data, you can run DROP TABLE or TRUNCATE TABLE, which will be more efficient with H2.
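As a rough sketch (assuming a day-partitioned layout; the table naming and helper below are made up for illustration, not taken from your application):

    // Hypothetical layout with one table per day: expiring old data becomes a
    // cheap DROP TABLE instead of a large DELETE over one shared table.
    static void dropExpiredDay(java.sql.Connection conn, java.time.LocalDate day) throws java.sql.SQLException {
        String table = "MEASUREMENTS_"
                + day.format(java.time.format.DateTimeFormatter.BASIC_ISO_DATE); // e.g. MEASUREMENTS_20240210
        try (java.sql.Statement st = conn.createStatement()) {
            st.execute("DROP TABLE IF EXISTS " + table);
        }
    }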

> Since the automatic compaction did not show good results, I decided to switch it off, set MAX_COMPACT_TIME to 30000,
> and shut down the database every 5 minutes using SHUTDOWN. I use SHUTDOWN instead of SHUTDOWN COMPACT to keep control
> over the maximum time the database is unavailable.
>

Unfortunately, SHUTDOWN just does not try very hard; if you want to reduce disk space, you will need to use SHUTDOWN COMPACT.

H2 is unfortunately not a great match for your specific use-case, and I don't think there is really anything in the way
of database parameters that will make a big difference.

You could try batching your inserts (i.e. inserting a bunch of rows before doing a COMMIT); that sometimes helps reduce
the disk usage.
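Something along these lines (a sketch only; the table, columns and value list are illustrative, not from your application):

    // Queue many rows and commit once, instead of committing row by row.
    static void insertBatch(java.sql.Connection conn, java.util.List<Double> values) throws java.sql.SQLException {
        conn.setAutoCommit(false);
        try (java.sql.PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO MEASUREMENTS(TS, VAL) VALUES (CURRENT_TIMESTAMP, ?)")) {
            for (double v : values) {
                ps.setDouble(1, v);
                ps.addBatch();      // queue the row
            }
            ps.executeBatch();      // send all queued rows at once
        }
        conn.commit();              // one commit for the whole batch
    }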

Regards, Noel.

Andrei Tokar

Mar 19, 2024, 11:25:18 AM
to H2 Database
Hi Ulrich,
Do you remove data in one big transaction? That would cause a big transaction log to be created. I also agree with Noel's suggestion to insert data in batches.
As far as compaction goes, I would keep WRITE_DELAY at its default (500), or an even smaller (but positive) value. Apart from controlling how often the auto-commit check runs, it also dictates how often the background thread tries to optimize file space by rewriting pages (evacuating pages from sparsely populated chunks) and compacting (moving chunks around). With a high rate of inserts/updates, auto-commit (not a db commit, but a chunk store operation) happens anyway once a chunk reaches a certain size (around 1 MB). WRITE_DELAY is intended to handle cases with low update rates, where a save should be triggered (even if the new chunk is still small) to prevent data loss. So setting WRITE_DELAY to a smaller value should intensify the background file-space housekeeping.

If that is still not good enough for your case and you want to go with periodic shutdowns, use SHUTDOWN COMPACT. It is a full database rewrite, and it is the best way to minimize disk space. A normal SHUTDOWN is limited not only by MAX_COMPACT_TIME, but also by a 16 MB cap on rewrites (not very helpful in your case).
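In SQL terms, something like this (a sketch, given an open java.sql.Connection conn; the values are only examples):

    try (java.sql.Statement st = conn.createStatement()) {
        st.execute("SET WRITE_DELAY 500"); // default; smaller positive values make the background
                                           // housekeeping (page rewriting, chunk moving) run more often
    }
    // ... later, when a full rewrite of the file is acceptable:
    try (java.sql.Statement st = conn.createStatement()) {
        st.execute("SHUTDOWN COMPACT");    // full database rewrite; smallest file, but the db is
                                           // unavailable until it finishes
    }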

Ulrich

Mar 19, 2024, 11:51:03 AM
to H2 Database
Hi Noel, hi Andrei,

thanks for your explanations of the compaction procedures!

I will try to play with the WRITE_DELAY parameter and see if that improves the database file-size behaviour.

Data removal is executed in one transaction per table. It is not easy to change the application to use batch inserts.
So I tried to use SHUTDOWN COMPACT when CHUNKS_FILL_RATE drops below a given threshold and/or the file size exceeds a given threshold. The downside is that compaction takes up to 10 minutes when the database is large (e.g. 50 GB), which is application downtime I want to minimize.
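The check itself is simple (a rough sketch; the 30% threshold is arbitrary, and I am assuming the fill rate can be read from INFORMATION_SCHEMA.SETTINGS under the key shown in the statistics above):

    static void compactIfFragmented(java.sql.Connection conn) throws java.sql.SQLException {
        int fillRate = 100;
        try (java.sql.Statement st = conn.createStatement();
             java.sql.ResultSet rs = st.executeQuery(
                 "SELECT SETTING_VALUE FROM INFORMATION_SCHEMA.SETTINGS"
                 + " WHERE UPPER(SETTING_NAME) = 'INFO.CHUNKS_FILL_RATE'")) {
            if (rs.next()) {
                fillRate = Integer.parseInt(rs.getString(1));
            }
        }
        if (fillRate < 30) {                        // arbitrary threshold
            try (java.sql.Statement st = conn.createStatement()) {
                st.execute("SHUTDOWN COMPACT");     // full rewrite; the db stays offline until it is done
            }
        }
    }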

The reason I am contacting you is that MVStore databases grow much, much faster than the old PageStore databases, no matter whether auto-compaction is enabled or not. This is a big downside for me, so I would be happy if you could improve this at some point.

Anyhow, from my point of view H2 is a very good database, especially in an embedded context!
Thanks!
Ulrich

