Hi Daniel
In WiredTiger, the compact command attempts to release space that WiredTiger is not using back to the operating system. Although WiredTiger was designed to minimize fragmentation in your data files, fragmentation may still accumulate under certain workloads.
The space available for reuse by WiredTiger can be seen in the output of db.collection.stats().wiredTiger['block-manager']['file bytes available for reuse']. Depending on your specific workload, document size, etc., this space could potentially be released to the operating system using the compact command, but there is no guarantee.
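To put that number in context, here is a minimal sketch that compares the reusable bytes against the total file size. The helper name reusableSpace and the collection name mycoll are placeholders of mine, not part of MongoDB, and the sketch assumes the stats values come back as plain numbers.

```javascript
// Sketch: summarize how much of a collection's data file WiredTiger
// reports as available for reuse.
// `stats` is the document returned by db.collection.stats().
// `reusableSpace` is a hypothetical helper name, not a MongoDB API.
function reusableSpace(stats) {
  var bm = stats.wiredTiger['block-manager'];
  var fileSize = Number(bm['file size in bytes']);
  var reusable = Number(bm['file bytes available for reuse']);
  return {
    fileSize: fileSize,
    reusable: reusable,
    // Guard against an empty file to avoid dividing by zero.
    fraction: fileSize > 0 ? reusable / fileSize : 0
  };
}

// In the mongo shell (`mycoll` is a placeholder collection name):
//   printjson(reusableSpace(db.mycoll.stats()));
```

A small fraction suggests compact has little to give back. A large fraction only means there is space that might be released, not that it will be.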
Is there supposed to be any indication of how it’s going? I did observe the system was doing a lot of reads and occasional write bursts, and the mtimes on collection files were changing, so I suspect it was doing something.
No, I don’t believe there is a progress bar or similar. It could take a while to finish if you have a large dataset.
Is this something I can do multiple times where it will progress each time, or will I have to take some massive outage to let it finish?
It’s best to wait for it to finish. It was not designed to be interruptible.
I didn’t see the files change size, nor are the indexes reported any different than before. Is this something that happens only at the end of the compact operation?
It may also be possible that compact was not able to free up any disk space. This can happen when WiredTiger is already filling the space left by deleted documents effectively, so compacting the data files would yield minimal disk space gain.
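One way to check whether compact actually returned anything is to capture the collection stats before and after the run and compare the file sizes. This is only a sketch: compactDelta is a hypothetical helper name and mycoll is a placeholder collection name. The compact command and the 'file size in bytes' statistic are the only MongoDB pieces it relies on.

```javascript
// Sketch: compare two db.collection.stats() documents taken before and
// after a compact run. `compactDelta` is a hypothetical helper name.
function fileSizeBytes(stats) {
  return Number(stats.wiredTiger['block-manager']['file size in bytes']);
}

function compactDelta(before, after) {
  var freed = fileSizeBytes(before) - fileSizeBytes(after);
  return {
    freedBytes: freed,   // positive when the data file got smaller
    shrank: freed > 0
  };
}

// In the mongo shell (`mycoll` is a placeholder collection name):
//   var before = db.mycoll.stats();
//   db.runCommand({ compact: 'mycoll' });   // let it run to completion
//   var after = db.mycoll.stats();
//   printjson(compactDelta(before, after));
```

If shrank comes back false, that is consistent with compact finding nothing worth releasing rather than with it having failed.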
Best regards
Kevin
Hi Daniel
Sounds like you’re overhauling your application and have ended up with what is essentially a new database. Since you’re running WiredTiger, the space will eventually be reused by new data. Unless your disk space is at a critical point, I would just leave it to WiredTiger to reuse the space. Please note that the unused space may be returned to the operating system when WiredTiger completes a checkpoint (by default, every 60 seconds or 2 GB of journal data as of MongoDB 3.4; see Snapshots and Checkpoints). Having said that, space reclamation is not instant and may depend on your workload.
One thought that I had was to run compact for several hours on the secondary, interrupt it to let replication catch back up, and cycle as such until it finishes.
I would advise against this method, since compact was never designed to be interruptible. There is a significant risk that your database could end up in some undefined state.
That is why I was hoping to get some indicator of progress
There is a feature request for this in SERVER-24618. Please comment on or upvote the ticket to raise awareness of this issue and your use case.
Best regards
Kevin