The disk is 400 GB, but the database is only 40 GB, and 5% of 400 GB is
20 GB, which seems quite big, right?
I've noticed that we get an RS102 whenever we run mongorestore with the
--drop option, which we do on a weekly basis. So it looks like there is
some relationship between mongorestore, the oplog size, and RS102.
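For what it's worth, here is how I understand the oplog window can be checked from the shell (standard helpers, run on the primary; I haven't pasted our output here):

PRIMARY> db.printReplicationInfo()       // configured oplog size and "log length start to end", i.e. the time window the oplog covers
PRIMARY> rs.printSlaveReplicationInfo()  // how far each secondary is currently behind the primary

If the weekly mongorestore --drop pushes more data through the oplog than it can hold before the secondaries catch up, their last applied op falls off the end of the primary's oplog and they go stale (RS102).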
Here is the output of rs.status().
On the primary,
PRIMARY> rs.status()
{
    "set" : "replication",
    "date" : ISODate("2012-06-04T15:47:25Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "mongo1.com:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "optime" : {
                "t" : 1338729763000,
                "i" : 1
            },
            "optimeDate" : ISODate("2012-06-03T13:22:43Z"),
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "mongo2.com:27017",
            "health" : 1,
            "state" : 3,
            "stateStr" : "RECOVERING",
            "uptime" : 511154,
            "optime" : {
                "t" : 1338715100000,
                "i" : 5808
            },
            "optimeDate" : ISODate("2012-06-03T09:18:20Z"),
            "lastHeartbeat" : ISODate("2012-06-04T15:47:23Z"),
            "pingMs" : 0,
            "errmsg" : "error RS102 too stale to catch up"
        },
        {
            "_id" : 2,
            "name" : "mongo3.com:27017",
            "health" : 1,
            "state" : 3,
            "stateStr" : "RECOVERING",
            "uptime" : 506883,
            "optime" : {
                "t" : 1338715066000,
                "i" : 4745
            },
            "optimeDate" : ISODate("2012-06-03T09:17:46Z"),
            "lastHeartbeat" : ISODate("2012-06-04T15:47:23Z"),
            "pingMs" : 0,
            "errmsg" : "error RS102 too stale to catch up"
        }
    ],
    "ok" : 1
}
On the secondary,
RECOVERING> rs.status()
{
    "set" : "replication",
    "date" : ISODate("2012-06-04T15:47:15Z"),
    "myState" : 3,
    "syncingTo" : "mongo1.com:27017",
    "members" : [
        {
            "_id" : 0,
            "name" : "mongo1.com:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 511133,
            "optime" : {
                "t" : 1338729763000,
                "i" : 1
            },
            "optimeDate" : ISODate("2012-06-03T13:22:43Z"),
            "lastHeartbeat" : ISODate("2012-06-04T15:47:13Z"),
            "pingMs" : 0
        },
        {
            "_id" : 1,
            "name" : "mongo2.com:27017",
            "health" : 1,
            "state" : 3,
            "stateStr" : "RECOVERING",
            "optime" : {
                "t" : 1338715100000,
                "i" : 5808
            },
            "optimeDate" : ISODate("2012-06-03T09:18:20Z"),
            "self" : true
        },
        {
            "_id" : 2,
            "name" : "mongo3.com:27017",
            "health" : 1,
            "state" : 3,
            "stateStr" : "RECOVERING",
            "uptime" : 506872,
            "optime" : {
                "t" : 1338715066000,
                "i" : 4745
            },
            "optimeDate" : ISODate("2012-06-03T09:17:46Z"),
            "lastHeartbeat" : ISODate("2012-06-04T15:47:13Z"),
            "pingMs" : 12,
            "errmsg" : "error RS102 too stale to catch up"
        }
    ],
    "ok" : 1
}
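From the optimeDate values above, both secondaries stopped applying ops around 09:18 UTC on June 3rd, about four hours before the primary's last recorded write (13:22 UTC) and more than a day before the rs.status() call itself. A small shell sketch, run on the primary, that computes that gap from rs.status() (rs.printSlaveReplicationInfo() reports roughly the same thing):

// rough lag per member, derived from the optimeDate fields above
var s = rs.status();
var primary = s.members.filter(function (m) { return m.state === 1; })[0];
s.members.forEach(function (m) {
    // optimeDate only has one-second resolution, so this is approximate
    print(m.name + " is ~" + (primary.optimeDate - m.optimeDate) / 1000 + "s behind the primary");
});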
On Jun 4, 1:57 am, markh <m...@10gen.com> wrote:
> No problem. So with the 5gb oplog size, everything was ok for a week and
> then RS102 started happening again.
>
> Typically, on 64-bit builds, the oplog is allocated 5% of disk space and
> this is generally a good setting. There's no formula per se; however, if
> you're performing a lot of writes (inserts/deletes/updates) then you may
> want a larger oplog (than 5%) whereas if it's mostly reads, you could
> possibly get away with less than 5%.....it really depends on your app.
> Here's another intro link to sizing oplog, which may help explain things a
> little more
> - http://docs.mongodb.org/manual/core/replication/#replica-set-oplog-si....
> ...
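For reference, my understanding of the knob involved (please correct me if this is off): the oplog size is set with --oplogSize (in MB, or oplogSize in the config file) when mongod starts, and it only applies when the oplog is first created, so growing it on an existing member means going through the manual oplog-resize procedure in the docs. Roughly, the 5% default on this 400 GB disk would look like:

# mongod.conf -- hypothetical values for this setup
replSet = replication
oplogSize = 20480      # MB, i.e. ~20 GB (the 5% default on a 400 GB disk)

# or equivalently on the command line
mongod --replSet replication --oplogSize 20480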