replica set in ROLLBACK state


sarvesh

Sep 7, 2012, 8:25:29 PM
to mongod...@googlegroups.com
I see the following status for one member of the replica set:


{
        "_id" : 0,
        "name" : "vm-sv5-mongo02.1:27017",
        "health" : 1,
        "state" : 9,
        "stateStr" : "ROLLBACK",
        "uptime" : 752,
        "optime" : {
                "t" : 1347048022000,
                "i" : 28
        },
        "optimeDate" : ISODate("2012-09-07T20:00:22Z"),
        "lastHeartbeat" : ISODate("2012-09-08T00:23:31Z"),
        "pingMs" : 0,
        "errmsg" : "replSet rollback 3 fixup"
}

This is a replica node.
Will it automatically recover from the ROLLBACK state? What does the errmsg imply?

From the logs:

[rsBackgroundSync] replSet our last op time fetched: Sep  7 13:00:22:1c
Fri Sep  7 17:24:45 [rsBackgroundSync] replset source's GTE: Sep  7 13:30:36:1
Fri Sep  7 17:24:45 [rsBackgroundSync] replSet rollback 0
Fri Sep  7 17:24:45 [rsBackgroundSync] replSet ROLLBACK
Fri Sep  7 17:24:45 [rsBackgroundSync] replSet rollback 1
Fri Sep  7 17:24:45 [rsBackgroundSync] replSet rollback 2 FindCommonPoint
Fri Sep  7 17:24:45 [rsBackgroundSync] replSet info rollback our last optime:   Sep  7 13:00:22:1c
Fri Sep  7 17:24:45 [rsBackgroundSync] replSet info rollback their last optime: Sep  7 17:15:24:25
Fri Sep  7 17:24:45 [rsBackgroundSync] replSet info rollback diff in end of log times: -15302 seconds
Fri Sep  7 17:24:45 [conn63] run command admin.$cmd { replSetHeartbeat: "vm-sv5-repl02", v: 3, pv: 1, checkEmpty: false, from: "vm-sv5-mongo04.1:27017" }
Fri Sep  7 17:24:45 [conn63] command admin.$cmd command: { replSetHeartbeat: "vm-sv5-repl02", v: 3, pv: 1




sarvesh

Sep 7, 2012, 8:36:52 PM
to mongod...@googlegroups.com
Output of db.printReplicationInfo() on that machine:

configured oplog size:   23380.0330078125MB
log length start to end: 7696273secs (2137.85hrs)
oplog first event time:  Sun Jun 10 2012 11:09:09 GMT-0700 (PDT)
oplog last event time:   Fri Sep 07 2012 13:00:22 GMT-0700 (PDT)
now:                     Fri Sep 07 2012 17:35:57 GMT-0700 (PDT)

Stephen Steneker

Sep 8, 2012, 9:22:55 AM
to mongod...@googlegroups.com
> Would it automatically recover from the ROLLBACK State. What does the err msg imply?

Hi,

The ROLLBACK state indicates that this replica set member has oplog entries that diverge from the current primary's oplog, so those operations are being reverted to a common point and replication can then resume. The node should recover automatically and return to SECONDARY; however, you will need to check the data that was rolled back and re-insert it if applicable.

See also this blog post on simulating rollback (it also includes some diagrams):
  http://comerford.cc/wordpress/2012/05/28/simulating-rollback-on-mongodb/
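
To make the idea of reverting to a common point more concrete, here is a minimal JavaScript sketch. It is a simplified model and not MongoDB's actual rollback code: the function name `findCommonPoint`, the in-memory arrays, and the sample entries are all assumptions for illustration, with each oplog entry reduced to a timestamp `ts` and a hash `h`.

```javascript
// Simplified model of locating a rollback common point (not MongoDB's real code).
// Each oplog is an array of { ts, h } entries sorted by ascending ts.
function findCommonPoint(localOplog, remoteOplog) {
  let i = localOplog.length - 1;
  let j = remoteOplog.length - 1;
  // Walk both logs backwards until an entry with the same ts and hash is found.
  while (i >= 0 && j >= 0) {
    const local = localOplog[i];
    const remote = remoteOplog[j];
    if (local.ts === remote.ts && local.h === remote.h) {
      // Everything after the common point exists only locally and must be undone.
      return { commonIndex: i, rolledBack: localOplog.slice(i + 1) };
    }
    if (local.ts > remote.ts) {
      i--;
    } else if (local.ts < remote.ts) {
      j--;
    } else {
      // Same timestamp but different hash: the logs diverged here, step back in both.
      i--;
      j--;
    }
  }
  return null; // no shared history found in the given entries
}

// Made-up sample data: the logs agree up to ts 2 and then diverge.
const localOplog = [{ ts: 1, h: "a" }, { ts: 2, h: "b" }, { ts: 3, h: "x" }, { ts: 4, h: "y" }];
const remoteOplog = [{ ts: 1, h: "a" }, { ts: 2, h: "b" }, { ts: 3, h: "c" }, { ts: 5, h: "d" }];
const result = findCommonPoint(localOplog, remoteOplog);
console.log(result.rolledBack.length + " local op(s) would be rolled back");
```

In this model the entries in `rolledBack` are the ones you would need to review and possibly re-insert after the member returns to SECONDARY.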


In your specific case, the rollback period was 15302 seconds (or about 4.25 hours):

Fri Sep  7 17:24:45 [rsBackgroundSync] replSet info rollback our last optime:   Sep  7 13:00:22:1c
Fri Sep  7 17:24:45 [rsBackgroundSync] replSet info rollback their last optime: Sep  7 17:15:24:25
Fri Sep  7 17:24:45 [rsBackgroundSync] replSet info rollback diff in end of log times: -15302 seconds 
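
The 15302 second figure can be reproduced from the two optimes in those log lines. The JavaScript sketch below is only an illustration: `parseLogOptime` is a hypothetical helper (not part of MongoDB), the year is supplied by hand because the log omits it, and both times are treated as UTC, which is safe here only because we take a difference between two times in the same zone.

```javascript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Hypothetical helper: parse "Mon D HH:MM:SS:inc" as printed in the log lines above.
// The trailing increment is hexadecimal (1c = 28, matching "i" : 28 in the status output).
function parseLogOptime(text, year) {
  const m = /^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2}):([0-9a-f]+)$/.exec(text.trim());
  if (!m) throw new Error("unrecognized optime: " + text);
  const month = MONTHS.indexOf(m[1]);
  if (month < 0) throw new Error("unrecognized month: " + m[1]);
  const seconds = Date.UTC(year, month, Number(m[2]), Number(m[3]), Number(m[4]), Number(m[5])) / 1000;
  return { seconds: seconds, increment: parseInt(m[6], 16) };
}

const ours = parseLogOptime("Sep  7 13:00:22:1c", 2012);
const theirs = parseLogOptime("Sep  7 17:15:24:25", 2012);

// Same sign convention as the log line: our last optime minus their last optime.
const diffSeconds = ours.seconds - theirs.seconds;
console.log(diffSeconds + " seconds, about " + (Math.abs(diffSeconds) / 3600).toFixed(2) + " hours");
```

The negative sign simply reflects that this member's last optime is earlier than the sync source's last optime.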

The rollback process goes through several internal steps depending on what is required. The current step is noted in the 'errmsg' field while the member is in ROLLBACK state, e.g.:

             "errmsg" : "replSet rollback 3 fixup"

When the rollback process finishes, you will see the log message "rollback done" and the state should change to SECONDARY.
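
If you would rather watch the status output than the log, a small JavaScript helper can summarize each member. This is a sketch only: `describeMember` is a hypothetical name, and it relies solely on the `name`, `stateStr` and `errmsg` fields shown in the status output earlier in this thread. It runs as-is in Node.js; in an older mongo shell you may need `var` instead of `const`.

```javascript
// Hypothetical helper: summarize one entry of the "members" array from rs.status().
function describeMember(member) {
  if (member.stateStr === "ROLLBACK") {
    // While rolling back, errmsg carries the current internal step.
    return member.name + " is rolling back (" + (member.errmsg || "no step reported") + ")";
  }
  if (member.stateStr === "SECONDARY") {
    return member.name + " is SECONDARY";
  }
  return member.name + " is in state " + member.stateStr;
}

// Sample values copied from the status output posted above.
const sample = {
  name: "vm-sv5-mongo02.1:27017",
  state: 9,
  stateStr: "ROLLBACK",
  errmsg: "replSet rollback 3 fixup"
};
console.log(describeMember(sample));

// In the mongo shell, the same function could be applied to live data:
//   rs.status().members.forEach(function (m) { print(describeMember(m)); });
```

Re-running the shell line periodically should show the member move from ROLLBACK to SECONDARY once the rollback completes.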
 
Cheers,
Stephen 