mongo-cluster1-shard1:PRIMARY> show dbs
Db1 0.460GB
Db2 0.077GB
Db3 0.032GB
Db4 3.837GB
Db5 16.745GB
Db6 0.001GB
admin 0.000GB
local 1.374GB
mongo-cluster1-shard1:SECONDARY> show dbs
Db1 0.077GB
Db2 0.058GB
Db3 0.006GB
Db4 3.835GB
Db5 0.128GB
Db6 0.001GB
admin 0.000GB
local 0.052GB
Hi,
The size numbers next to the database names are the sizes on disk, and do not reflect the actual content of the database itself. Across the members of a replica set these numbers are expected to vary, especially on a newly synced member. This is because MongoDB’s replication is logical rather than binary, so the same data can be laid out differently on disk on each node.
A better comparison metric is the output of db.stats(). Of particular note is dataSize, which reflects the actual size of the data rather than the storage size. See dbStats for more details on the output of db.stats().
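For example, a quick way to compare the logical data sizes across members is to loop over the databases and print dataSize from db.stats(). This is just a sketch to run in the mongo shell on each member:
db.adminCommand({ listDatabases: 1 }).databases.forEach(function (d) {
    // dbStats per database; dataSize is the uncompressed logical size
    var s = db.getSiblingDB(d.name).stats();
    print(d.name + ": dataSize=" + s.dataSize + " bytes, storageSize=" + s.storageSize + " bytes");
});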
It’s important to note that once a node reaches the SECONDARY state after an initial sync, it has finished applying the primary’s oplog and is ready to take over as primary in the event of a failover. Other states (e.g. RECOVERING, STARTUP) do not carry this guarantee. So once a member is SECONDARY, you can be reasonably certain it holds all of the data on the current primary.
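If you want to check this from the shell, rs.status() reports the state of each member. For example (just a sketch, run from any member):
rs.status().members.forEach(function (m) {
    print(m.name + " : " + m.stateStr);   // e.g. PRIMARY, SECONDARY, RECOVERING
});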
Best regards,
Kevin
mongo-cluster1-shard1:PRIMARY> db.stats()
{
"db" : "Db5",
"collections" : 46,
"views" : 0,
"objects" : 78063,
"avgObjSize" : 6761.67507013566,
"dataSize" : 527836641,
"storageSize" : 17776951296,
"numExtents" : 0,
"indexes" : 162,
"indexSize" : 202719232,
"ok" : 1
}
mongo-cluster1-shard1:SECONDARY> db.stats()
{
"db" : "Db5",
"collections" : 46,
"views" : 0,
"objects" : 78066,
"avgObjSize" : 6761.701816411754,
"dataSize" : 527859014,
"storageSize" : 131010560,
"numExtents" : 0,
"indexes" : 162,
"indexSize" : 5918720,
"ok" : 1
Hi,
Can you explain a little more why there is so much difference in the storage size?
The dbStats output page I linked previously has a good explanation of these metrics. To paraphrase:
dbStats.dataSize
The total size of the uncompressed data held in this database. The dataSize decreases when you remove documents.
also:
dbStats.storageSize
The total amount of space allocated to collections in this database for document storage. The storageSize does not decrease as you remove or shrink documents. This value may be smaller than dataSize for databases using the WiredTiger storage engine with compression enabled.
Taken together, dataSize is the actual (uncompressed) size of your data, while storageSize is the actual size on disk. If your workload involves a lot of updates and deletes, the on-disk files can eventually become fragmented.
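Using your db.stats() numbers above for Db5, a quick back-of-the-envelope check in the shell looks like this (just a sketch): the primary’s storageSize / dataSize ratio comes out at roughly 34 (a heavily fragmented file), while the newly synced secondary’s is roughly 0.25 (a freshly written, compressed file).
var s = db.stats();
print("dataSize (uncompressed): " + s.dataSize);
print("storageSize (on disk):   " + s.storageSize);
print("storageSize / dataSize:  " + (s.storageSize / s.dataSize).toFixed(2));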
This is deliberate: WiredTiger will eventually reuse those “blank” spots within the existing file when you insert new documents, so it doesn’t have to constantly resize the data files, which is an expensive operation. Imagine if WiredTiger had to resize and rearrange the file every time you deleted a single document, especially one stored physically in the middle of the file; performance would suffer. The same applies to indexes, which is why the newly synced member also has a much smaller index size.
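If you’re curious how much of a collection’s file WiredTiger currently considers reusable, the collection stats expose it under the block-manager section. For example (myCollection is just a placeholder name):
var cs = db.myCollection.stats();
print("storageSize: " + cs.storageSize);
print("reusable:    " + cs.wiredTiger["block-manager"]["file bytes available for reuse"]);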
If you really need to reclaim space, you can run the compact command. However, please note that this is a highly disruptive operation that effectively requires downtime for the node involved, and there is no guarantee that space will actually be recovered. The most reliable way to reclaim space is to perform an initial sync on the node (which is what you did in your case).
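For reference, compact is run per collection against the database that owns it, e.g. (myCollection is a placeholder, and as noted above it is disruptive while it runs and may not return any space):
db.runCommand({ compact: "myCollection" });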
Best regards
Kevin