Operations best practices: Compaction


Trev

Mar 24, 2021, 3:22:03 PM
to RavenDB - an awesome database
Is it good practice to schedule regular compaction of a database that deals with transient data?

We're using RavenDB 5.1 in production and are adding more and more use cases that store transient data - data that is written, used for a while, and then deleted (either expired or explicitly deleted by the app). Examples include shopping carts - they stick around until a purchase is made, or the user abandons them for more than a few days. Typically indexes aren't used - the database is used as a temporary, but durable, store. 

With the normal use of the database in this manner for transient documents, rather than long term documents, should we consider compaction on a regular basis, or does RavenDB normally do any housekeeping in the background in order to keep the database size in line with the working set, instead of continually growing?

Space isn't currently a concern, but if we don't compact the DB, will it be a concern in 5 years if we're creating and expiring hundreds of thousands of documents per day?

Egor Shamanaev

Mar 25, 2021, 4:10:11 AM
to rav...@googlegroups.com
Hi 

Yes, the data file is not defragmented automatically, so its size can grow. One of the ways to keep it defragmented is to use compaction.
Please note the compaction task will take the database offline. 


--
Egor
Developer   /   Hibernating Rhinos LTD
Support:  sup...@ravendb.net
  

Trev

Mar 25, 2021, 9:22:41 AM
to RavenDB - an awesome database
Thanks. Some follow-on questions in that case:

1. Are there any indications in the stats/diagnostics that we could use to judge the level of fragmentation so that we compact when it's needed?
2. Are there any operational recipes for ensuring the "database going offline" step can be done without taking our applications offline, e.g. one node in the database group at a time?
3. Any future plans to have a "lightweight, online compaction" mode that can be safely run in the background or to optimize the reuse of space consumed by deleted documents when there aren't any indexes?

Egor Shamanaev

Mar 25, 2021, 11:11:41 AM
to rav...@googlegroups.com
1. You can see this in the Free section of the data file storage report:
[image.png: screenshot of the storage report, not preserved]

2. Yes, compacting the database from the studio will take it offline on the current server.
3. No, there are no plans for it currently. I have opened an issue for this:
https://issues.hibernatingrhinos.com/issue/RavenDB-16405
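
On question 2, the "one node at a time" idea can be sketched roughly as below. This is only an outline under assumptions: `compact_node` and `is_node_healthy` are hypothetical callables that you would supply yourself (for example, wrappers around whatever compaction and health-check tooling you use). They are not RavenDB client APIs.

```python
from typing import Callable, Iterable, List


def rolling_compaction(
    nodes: Iterable[str],
    compact_node: Callable[[str], None],     # hypothetical: compacts the database on one node
    is_node_healthy: Callable[[str], bool],  # hypothetical: True if the node is serving requests
) -> List[str]:
    """Compact nodes one at a time and return the nodes that were compacted.

    Stops early if the remaining nodes cannot carry the load, or if a node
    does not report healthy again after its compaction.
    """
    node_list = list(nodes)
    compacted: List[str] = []
    for node in node_list:
        others = [n for n in node_list if n != node]
        # Only take a node offline while at least one other node exists
        # and every other node is healthy.
        if not others or not all(is_node_healthy(n) for n in others):
            break
        compact_node(node)
        # Do not move on until the compacted node is back in service.
        if not is_node_healthy(node):
            break
        compacted.append(node)
    return compacted
```

The early exits are deliberately conservative: with a single node, or with any other node unhealthy, nothing is compacted at all.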

Oren Eini (Ayende Rahien)

Apr 7, 2021, 8:59:46 AM
to ravendb
Some more details about this, which I think are very important.

RavenDB will manage the disk space in the file. When you delete data from RavenDB, it will mark that space as free. We won't give that space back to the OS, but we are able to reuse it.
In general, in the past 7 years or so that we have had Voron running in production, there hasn't been a single instance where RavenDB hasn't been able to reuse this space when new data came in.

In other words, there is no need to run compaction on the database as a regular maintenance act. RavenDB will just manage on its own.
The only reason you'll want to run compaction is if:

* You are changing the compression of documents.
* You deleted a lot of documents, and you want to recoup the space for the operating system.
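
To illustrate why reuse of freed space keeps the file from growing without bound, here is a toy model. It is not Voron's allocator: it assumes every document occupies one fixed-size slot, and the function name and all parameters are made up for the illustration.

```python
import random
from typing import Dict, List


def simulate(days: int, creates_per_day: int, lifetime_days: int,
             reuse_free_space: bool, seed: int = 0) -> List[int]:
    """Toy model of a data file holding transient documents.

    Each document takes one slot and is deleted 1..lifetime_days days after
    it was written. Returns the file size (in slots) at the end of each day.
    """
    rng = random.Random(seed)
    file_slots = 0                  # slots ever allocated; the file never shrinks
    free_slots = 0                  # slots freed by deletes and available for reuse
    expiring: Dict[int, int] = {}   # day -> number of documents deleted on that day
    sizes: List[int] = []
    for day in range(days):
        # Deletes: documents that expire today release their slots.
        freed = expiring.pop(day, 0)
        if reuse_free_space:
            free_slots += freed
        # Writes: take a free slot if there is one, otherwise grow the file.
        for _ in range(creates_per_day):
            if free_slots > 0:
                free_slots -= 1
            else:
                file_slots += 1
            expiry = day + rng.randint(1, lifetime_days)
            expiring[expiry] = expiring.get(expiry, 0) + 1
        sizes.append(file_slots)
    return sizes
```

In this model, `simulate(365, 100, 7, reuse_free_space=True)` can never exceed `100 * 7` slots, because the file only grows when no free slot is left. With `reuse_free_space=False` the file ends at `365 * 100` slots, since every write extends it.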


Tobias Zürcher

Apr 7, 2021, 6:02:48 PM
to RavenDB - an awesome database
> * You deleted a lot of documents, and you want to recoup the space for the operating system.

So when I see my DB server getting full, even after deleting data I still need to compact, which means taking it offline.

To be honest, it is a bit counterintuitive and a bit against "it just works". If I delete things, I do expect to get some space back on the disk.

Oren Eini (Ayende Rahien)

Apr 14, 2021, 9:19:33 AM
to ravendb

Trev

Apr 14, 2021, 10:24:52 AM
to RavenDB - an awesome database
Thanks Oren. I think this covers off the initial concern well. To summarize and echo back:

  • Our initial concern is related to ensuring RavenDB will reuse space freed up by deleted documents instead of continuously growing the size of the data file even if the total number of (undeleted) documents doesn't grow over time. 
  • We're not so concerned with giving unused space back to the OS - as long as the data file size stabilizes to some reasonable value for the working set of documents, rather than continuously growing without bound. 
  • It seems Voron reuses free space by design. So in practice, we should see the database file size grow at the beginning, then mostly stabilize (potentially around 50% free space, depending on the ratio of the document creation rate to the document deletion rate). 
  • If we see the database file size continually grow well beyond what's required by the working set (e.g. 90% free space), this indicates an issue. We would need to do a compaction, which would take one database offline at a time (which is mitigated by having the database on multiple nodes, compacting one node at a time to maintain availability). 
So our approach would be to treat compactions as an as-needed operation in exceptional circumstances, rather than regular operational maintenance needed to maintain acceptable health and resource usage. We would monitor for any databases that end up with free space above a high threshold (80-90%) for an extended period of time, but would not expect this to happen in normal continuous use.
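
The monitoring rule above could look something like the following sketch. The assumptions: free and total sizes are sampled periodically from the storage report by some means not shown here, and `StorageSample`, the 0.85 threshold, and the 7-day window are illustrative choices, not RavenDB defaults.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass
class StorageSample:
    taken_at: datetime
    free_bytes: int   # "Free" figure from the storage report (assumed to be collected elsewhere)
    file_bytes: int   # total data file size at the same moment


def needs_compaction_review(samples: Iterable[StorageSample],
                            threshold: float = 0.85,
                            sustained: timedelta = timedelta(days=7)) -> bool:
    """True if the free-space ratio stayed at or above threshold for the whole trailing window."""
    ordered = sorted(samples, key=lambda s: s.taken_at)
    if not ordered:
        return False
    window_start = ordered[-1].taken_at - sustained
    # Without history reaching back to the window start we cannot call it sustained.
    if ordered[0].taken_at > window_start:
        return False
    recent = [s for s in ordered if s.taken_at >= window_start]
    return all(s.file_bytes > 0 and s.free_bytes / s.file_bytes >= threshold
               for s in recent)
```

Requiring the whole window to stay above the threshold avoids flagging the short spikes in free space that follow a large batch of expirations.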