Estimate the Oplog Size


bichonfrise74

May 30, 2012, 3:52:50 PM
to mongodb-user
First of all, this is a noob writing so please be patient.

I have encountered the RS102 error and I tried to increase the oplog
size to 5 GB. My total data size is about 40GB. But after I did a
mongorestore, I got the RS102 error again. So, I am assuming that the
oplog size is too small.

So, how do I estimate the needed oplog size? Do I need to know the
write speed to the oplog (if so, how do I get this)?

Thanks.

markh

May 30, 2012, 6:58:03 PM
to mongod...@googlegroups.com
Hi,

What did you do to resize your oplog? Did you do it on the primary and/or secondary?

Here's a document on changing the size of your oplog - http://docs.mongodb.org/manual/tutorial/change-oplog-size/

To answer your question, it depends on how much data you insert/update over time. I would choose a size which allows you many hours or even days of oplog.

How big are your records and how many do you write per second? At 2000 documents per second for documents of 1KB, that works out to roughly 120MB per minute, so your 5GB oplog would last about 40 minutes. This means that if the secondary ever goes offline for 40 minutes, or falls behind by more than that, it becomes stale and has to do a full resync.
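
To make that arithmetic concrete, here it is spelled out (plain JavaScript you can paste into the mongo shell; the rates are the illustrative figures above, not measurements from your system):

// Worked version of the example above; the numbers are illustrative, not measured.
var docsPerSecond = 2000;                                   // write rate
var docSizeKB     = 1;                                      // average document size
var oplogSizeMB   = 5 * 1024;                               // a 5GB oplog
var mbPerMinute   = docsPerSecond * docSizeKB * 60 / 1024;  // ~117 MB/minute
var windowMinutes = oplogSizeMB / mbPerMinute;              // ~44 minutes
print("a " + (oplogSizeMB / 1024) + "GB oplog covers roughly " + Math.round(windowMinutes) + " minutes of writes");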

Additionally, if you are loading lots of data, ensure that you periodically verify the writes across more than one member of the replica set.

http://www.mongodb.org/display/DOCS/Verifying+Propagation+of+Writes+with+getLastError

Thanks, Mark

bichonfrise74

May 31, 2012, 1:13:50 PM
to mongodb-user
Basically, I did the following steps:
1. Stop the mongodb instance
2. Change the oplog size in /etc/mongodb.conf
3. Delete all the files in the data directory
4. Start the mongodb instance

Is there a problem with the above? I did that on both master and
slave.

Where do I get this information? - 'I would choose a size which allows you many hours or even days of oplog'

How do I find this information? - 'How big are your records and how many do you write per second?'

Thanks.


markh

May 31, 2012, 2:08:11 PM
to mongod...@googlegroups.com
Hi,

Using those steps will work in theory, and it is the simplest way; however, you have basically deleted all your data, so you are going to have to recreate it. Is that what you wanted?

The link in my previous email details the recommended method of increasing the oplog while preserving your data.

Can you run this command on the master and reply with the output -

> db.printReplicationInfo()

In terms of your records, that's not something I can tell you. Here are some links that should help you -



Thanks, Mark

bichonfrise74

May 31, 2012, 6:43:26 PM
to mongodb-user
Hi,

Related to the method that I posted, I definitely do not want to
delete the data but I thought that was the only way. Anyway, I will
definitely check the method you posted.

Here's the output.

SECONDARY> db.printReplicationInfo()
configured oplog size: 5000MB
log length start to end: 186556secs (51.82hrs)
oplog first event time: Tue May 29 2012 10:28:23 GMT-0700 (PDT)
oplog last event time: Thu May 31 2012 14:17:39 GMT-0700 (PDT)
now: Thu May 31 2012 15:38:04 GMT-0700 (PDT)

With the above oplog size, I still got a RS102 and the database size
is 35 GB.

Thanks.

markh

Jun 1, 2012, 11:08:20 AM
to mongod...@googlegroups.com
Hi,

That replication information is from the slave, not the master. Can you re-run that command on the master, as well as "rs.status()"?

I suspect that the Oplog size is still the original size on the Master and so the Slave can't catch up.

Thanks, Mark

bichonfrise74

Jun 1, 2012, 1:05:48 PM
to mongodb-user
This is what I get on the primary.

PRIMARY> db.printReplicationInfo()
configured oplog size: 5000MB
log length start to end: 341021secs (94.73hrs)
oplog first event time: Mon May 28 2012 11:14:23 GMT-0700 (PDT)
oplog last event time: Fri Jun 01 2012 09:58:04 GMT-0700 (PDT)
now: Fri Jun 01 2012 10:05:14 GMT-0700 (PDT)

The oplog size is the same.

markh

Jun 1, 2012, 3:07:59 PM
to mongod...@googlegroups.com
Ok, so now you're running a replica set, yes?

So your secondary device is stale and you have to resync from scratch. Please see this link - http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member - on how to do that.

You can either wipe all the data files on the secondary and let it sync automatically, or perform a 'fsync & lock' on the primary and copy the files over to the secondary (this is documented here - http://docs.mongodb.org/manual/administration/backups/).
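
For the second option, the rough shape of it in the shell is something like this (just a sketch using the 2.x shell helpers; check the backup docs linked above for the exact procedure on your version):

db.fsyncLock();     // on the PRIMARY: flush data files to disk and block writes
// ...while locked, copy the primary's dbpath files to the stale secondary (rsync/scp)...
db.fsyncUnlock();   // release the lock once the copy has finished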

bichonfrise74

Jun 1, 2012, 4:13:13 PM
to mongodb-user
I apologize for not being clear in the first place. I do have a
replica set and it was originally set with an oplog size of 1 GB. So,
I increased both master and slave to 5 GB each using the same link
that you posted, then after a week, my secondary got the RS102 error
again.

So, now I do not know how to work out what to set the oplog size to. Would 10 GB be enough? I mean, there must be a formula or a method along the lines of: if I am doing this many writes (which I don't know how to measure), then the oplog size should be this much.

Do you think I should just increase the oplog size of the secondary to
10 GB and just make the primary 5 GB?




markh

Jun 4, 2012, 4:57:17 AM
to mongod...@googlegroups.com
No problem. So with the 5GB oplog size, everything was OK for a week and then RS102 started happening again.

Typically, on 64-bit builds, the oplog is allocated 5% of disk space and this is generally a good setting. There's no formula per se; however, if you're performing a lot of writes (inserts/deletes/updates) then you may want a larger oplog (more than 5%), whereas if it's mostly reads you could possibly get away with less than 5%... it really depends on your app. Here's another intro link to sizing the oplog, which may help explain things a little more - http://docs.mongodb.org/manual/core/replication/#replica-set-oplog-sizing.
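
If you want to put a number on it for your system, a rough sketch is to look at how fast the current oplog is churning, using db.getReplicationInfo() (the helper behind db.printReplicationInfo()) on the primary:

// Rough sizing sketch (mongo shell, run on the PRIMARY).
var info = db.getReplicationInfo();                 // logSizeMB, usedMB, timeDiff, timeDiffHours, ...
var gbPerHour = (info.usedMB / 1024) / info.timeDiffHours;
print("oplog window now : " + info.timeDiffHours + " hours");
print("oplog churn      : " + gbPerHour.toFixed(2) + " GB/hour");
// Multiply the churn rate by the number of hours of slack you want
// (how long a secondary might be offline or behind) to get a target oplog size.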

What is the output of rs.status() from both the primary and secondary? 

How big is your disk by the way?

bichonfrise74

Jun 4, 2012, 11:51:44 AM
to mongodb-user
The disk is 400 GB, but the database is only 40 GB, and 5% of 400GB is
20GB which seems quite big, right?
I notice that we get an RS102 when we do a mongorestore with a --drop
option which we do on a weekly basis. So, it looks like there is some
relationship between mongorestore + oplog size + RS102.

Here is the output of rs.status().

On the primary,

PRIMARY> rs.status()
{
    "set" : "replication",
    "date" : ISODate("2012-06-04T15:47:25Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "mongo1.com:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "optime" : {
                "t" : 1338729763000,
                "i" : 1
            },
            "optimeDate" : ISODate("2012-06-03T13:22:43Z"),
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "mongo2.com:27017",
            "health" : 1,
            "state" : 3,
            "stateStr" : "RECOVERING",
            "uptime" : 511154,
            "optime" : {
                "t" : 1338715100000,
                "i" : 5808
            },
            "optimeDate" : ISODate("2012-06-03T09:18:20Z"),
            "lastHeartbeat" : ISODate("2012-06-04T15:47:23Z"),
            "pingMs" : 0,
            "errmsg" : "error RS102 too stale to catch up"
        },
        {
            "_id" : 2,
            "name" : "mongo3.com:27017",
            "health" : 1,
            "state" : 3,
            "stateStr" : "RECOVERING",
            "uptime" : 506883,
            "optime" : {
                "t" : 1338715066000,
                "i" : 4745
            },
            "optimeDate" : ISODate("2012-06-03T09:17:46Z"),
            "lastHeartbeat" : ISODate("2012-06-04T15:47:23Z"),
            "pingMs" : 0,
            "errmsg" : "error RS102 too stale to catch up"
        }
    ],
    "ok" : 1
}


On the secondary,

RECOVERING> rs.status()
{
    "set" : "replication",
    "date" : ISODate("2012-06-04T15:47:15Z"),
    "myState" : 3,
    "syncingTo" : "mongo1.com:27017",
    "members" : [
        {
            "_id" : 0,
            "name" : "mongo1.com:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 511133,
            "optime" : {
                "t" : 1338729763000,
                "i" : 1
            },
            "optimeDate" : ISODate("2012-06-03T13:22:43Z"),
            "lastHeartbeat" : ISODate("2012-06-04T15:47:13Z"),
            "pingMs" : 0
        },
        {
            "_id" : 1,
            "name" : "mongo2.com:27017",
            "health" : 1,
            "state" : 3,
            "stateStr" : "RECOVERING",
            "optime" : {
                "t" : 1338715100000,
                "i" : 5808
            },
            "optimeDate" : ISODate("2012-06-03T09:18:20Z"),
            "self" : true
        },
        {
            "_id" : 2,
            "name" : "mongo3.com:27017",
            "health" : 1,
            "state" : 3,
            "stateStr" : "RECOVERING",
            "uptime" : 506872,
            "optime" : {
                "t" : 1338715066000,
                "i" : 4745
            },
            "optimeDate" : ISODate("2012-06-03T09:17:46Z"),
            "lastHeartbeat" : ISODate("2012-06-04T15:47:13Z"),
            "pingMs" : 12,
            "errmsg" : "error RS102 too stale to catch up"
        }
    ],
    "ok" : 1
}


markh

Jun 4, 2012, 4:41:20 PM
to mongod...@googlegroups.com
As you can see, the rs.status() output shows that the secondaries are too stale to catch up with the primary.

Why do you do a "mongorestore --drop" on a weekly basis? What seems to be happening is that the data you're dumping with mongodump and subsequently restoring with mongorestore is stale by the time you restore it. Therefore, I believe that you need to increase the oplog from 5GB. You could try increasing it to just 10GB and attempting the restore as you normally do; if the slave devices are not too far behind once the restore is finished, then they can sync up with the master.

5% is the recommendation, so in this case that'll be 20GB (and if the above doesn't work, that would be my recommendation), but as previously mentioned, oplog sizes differ depending on the app and the data.

bichonfrise74

Jun 4, 2012, 5:16:39 PM
to mongodb-user
Oh, oh! I take the mongodump on Fridays from production and do the mongorestore on Sunday. Are you saying then that if I want to do a mongodump/mongorestore, it should be almost real time? E.g. do the dump now, then restore the data now as well? Or make sure that the oplog is big enough to accommodate at least 3 days (Friday to Sunday) so that the data will not go stale? I had assumed that I could do the mongodump anytime and restore it anytime as well.

I do mongorestore --drop because I want to recreate the collections.
Is this not correct?

markh

Jun 4, 2012, 5:39:47 PM
to mongod...@googlegroups.com
Make sure that the oplog is big enough to accommodate 3 days of data in that case, otherwise the secondaries are immediately stale.
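
As a rough sanity check (a sketch using db.getReplicationInfo(), the helper behind printReplicationInfo(); the 72 hours is just the Friday-to-Sunday gap you described):

var info = db.getReplicationInfo();   // run on the PRIMARY
var requiredHours = 72;               // Friday dump -> Sunday restore (assumption from your schedule)
if (info.timeDiffHours < requiredHours) {
    print("oplog only spans " + info.timeDiffHours + " hours; a member that is down or behind for " + requiredHours + " hours will go stale (RS102)");
} else {
    print("oplog spans " + info.timeDiffHours + " hours, which covers the " + requiredHours + "-hour gap");
}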

bichonfrise74

Jun 18, 2012, 2:21:20 PM
to mongod...@googlegroups.com
I apologize for reviving this, but I am still having an RS102 issue. I tried changing my oplog size to 60 GB and then 120 GB on both slaves and still got the error. Note that the primary still only has a 5 GB oplog; I assumed that it does not really matter how big the primary's oplog is.

Anyway, any more suggestions on what I can do to fix the issue? Thanks.

Scott Hernandez

Jun 18, 2012, 2:46:30 PM
to mongod...@googlegroups.com
The oplog on the primary is the *most important* one; please increase it as well.

bichonfrise74

Jun 18, 2012, 4:31:30 PM
to mongod...@googlegroups.com
I will also increase the oplog of the primary. Is this a good way to explain what is happening?

- writes are happening too fast into the primary's oplog
- the secondary is not able to keep up with its reads from the primary's oplog
- at some point, the primary's oplog 'rolls over' and the entries are no longer there for the secondary to read, and thus it gets the RS102 error

So, is it correct to say that the oplog of both primary and secondary should be the same size?





Scott Hernandez

Jun 18, 2012, 4:36:20 PM
to mongod...@googlegroups.com
That sounds about correct, and the only oplog which matters in this scenario is the primary's.

In general it is best to have all replicas with the same oplog size (which means they hold roughly the same window of time).
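
If you want to keep an eye on this, the standard shell helpers will show you both numbers (a quick sketch, run on the primary):

// Quick monitoring sketch (mongo shell on the PRIMARY, 2.x helpers).
db.printReplicationInfo();        // "log length start to end" = the window the oplog covers
db.printSlaveReplicationInfo();   // how far each secondary's syncedTo lags behind the primary
// If a member's lag ever gets close to the oplog window, it is about to
// fall off the end of the oplog and go stale (RS102).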