replica stuck in state 9 after repair


babak

Mar 15, 2011, 11:47:21 AM
to mongodb-user
I'm trying out a rolling repair on a replica set with 3 members. After
repairing the master I checked rs.status() and got the following output:


rs.status()
{
    "set" : "foo",
    "date" : ISODate("2011-03-15T15:39:27Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "localhost:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "optime" : {
                "t" : 1300186691000,
                "i" : 5458
            },
            "optimeDate" : ISODate("2011-03-15T10:58:11Z"),
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "localhost:27018",
            "health" : 1,
            "state" : 9,
            "stateStr" : "ROLLBACK",
            "uptime" : 325,
            "optime" : {
                "t" : 1300203235000,
                "i" : 3
            },
            "optimeDate" : ISODate("2011-03-15T15:33:55Z"),
            "lastHeartbeat" : ISODate("2011-03-15T15:39:26Z"),
            "errmsg" : "rollback 2 error findcommonpoint waiting a while before trying again"
        },
        {
            "_id" : 2,
            "name" : "localhost:27019",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 329,
            "optime" : {
                "t" : 1300186691000,
                "i" : 5458
            },
            "optimeDate" : ISODate("2011-03-15T10:58:11Z"),
            "lastHeartbeat" : ISODate("2011-03-15T15:39:26Z")
        }
    ],
    "ok" : 1
}


localhost:27018 is the former master. I can't find anything about state
9 in the documentation, and the replica seems to be stuck in that
state. Any ideas why?
I'm running version 1.8.
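
For reference, the repair step I mentioned at the top is roughly this on each member (/data/rs0-1 is just a placeholder for my local test dbpath):

# shut the member down, then repair its data files offline:
mongod --dbpath /data/rs0-1 --repair
# restart it with its usual --replSet foo options and let it catch up
# for the master: rs.stepDown() first, then repeat the same steps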

Kyle Banker

Mar 15, 2011, 11:54:09 AM
to mongod...@googlegroups.com
If you had to run a repair, then there's a good chance that some data
was lost. It looks like there's a problem with the oplog, where the
former master can't find a common point between its oplog and the
oplog of the current master.

If this error doesn't go away, then probably your only choice for
fixing this is a complete resync from the newest node (roughly sketched
below). If that's unacceptable, then we may want to take a closer look
at the individual oplogs.
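
Roughly, that resync would look like this (the path below is just a placeholder for the stuck member's data directory):

# stop the mongod listening on localhost:27018, then clear its data files:
rm -rf /data/rs0-2/*
# restart it with its usual --replSet foo options;
# it will then perform a full initial sync from another member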

It looks like the node at 27018 has a much newer optime than the nodes
at 27017 and 27019. How much total data do you have? Is this a production
deployment?


babak

Mar 15, 2011, 12:15:47 PM
to mongodb-user
Thanks for your reply, Kyle.

Yes, it's probably something with the oplogs. I asked the master to
step down and then shut it down; I probably shut it down too fast,
before a new master was elected, which made the oplogs diverge. Is that
a good theory?
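
Roughly, should the sequence have been something like this? (The 60-second stepdown window is just a guess on my part.)

// on the current master (localhost:27018 at the time):
rs.stepDown(60)
// then, from another member, keep checking until a new PRIMARY shows up:
rs.status()
// and only after that, shut the old master down cleanly:
use admin
db.shutdownServer()
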
I checked the db files and saw that the former master, the one stuck in
state 9, had allocated one 2 GB file more than the others, and the
oplog file was only 0.5 GB.
No, I'm just testing the rolling repair in development before rolling
it out in production.
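
If it helps, this is roughly how I've been looking at the oplogs on each member (nothing fancy):

db.printReplicationInfo()   // configured oplog size and the time range it currently covers
use local
db.oplog.rs.stats()         // size of the oplog collection itself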

Kyle Banker

Mar 15, 2011, 12:25:14 PM
to mongod...@googlegroups.com
Yes, that's correct. If a write isn't replicated, then that write will
be rolled back on a failover. But if you're talking about a situation
where you also have to run repair, that's a more complicated story and
is probably the reason for the unresolved rollback. In general, you
have to be careful when recovering from unclean shutdowns.
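
One way to reduce the chance of a write being rolled back is to wait for it to replicate before treating it as acknowledged. With a 3-member set, something like this (the collection name and wtimeout value are just examples):

db.test.insert({ x: 1 })
db.runCommand({ getlasterror: 1, w: 2, wtimeout: 5000 })   // wait until the write has reached a second member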
