Automatically Mongo Primary moved to Secondary and Secondary moved to Primary

Avise Sudhakar Rao

Jul 12, 2011, 2:48:27 AM
to mongod...@googlegroups.com
Hi,

I don't know the reason, but our Mongo primary server automatically became a secondary and the secondary became the primary.
Can anyone tell me what might have caused this?

General information:
I have three production servers. The original configuration was:
Primary - 172.25.122.181
Secondary - 172.25.122.182
Arbiter - 172.25.122.20

Below is the latest output from:
rs.status()
db.serverStatus()
free -lmt
mongostat --discovered

Note: 181 should be the primary, but it automatically moved to secondary and 182 became primary. I therefore stopped 182, and 181 became primary again; when I brought 182 back up, it joined as a secondary. The information below was taken after 181 became primary again.

monit:PRIMARY> rs.status()
{
        "set" : "monit",
        "date" : ISODate("2011-07-12T06:33:19Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "172.25.122.181:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "optime" : {
                                "t" : 1310452399000,
                                "i" : 27
                        },
                        "optimeDate" : ISODate("2011-07-12T06:33:19Z"),
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "172.25.122.182:27017",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 1184,
                        "optime" : {
                                "t" : 1310452399000,
                                "i" : 22
                        },
                        "optimeDate" : ISODate("2011-07-12T06:33:19Z"),
                        "lastHeartbeat" : ISODate("2011-07-12T06:33:19Z")
                },
                {
                        "_id" : 2,
                        "name" : "172.25.122.20:27017",
                        "health" : 1,
                        "state" : 7,
                        "stateStr" : "ARBITER",
                        "uptime" : 1812648,
                        "optime" : {
                                "t" : 0,
                                "i" : 0
                        },
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2011-07-12T06:33:18Z")
                }
        ],
        "ok" : 1
}
------------------------------------------------------------------------------------------------------------------------------------

monit:PRIMARY> db.serverStatus()
{
        "host" : "mdb51",
        "version" : "1.8.2",
        "process" : "mongod",
        "uptime" : 1812703,
        "uptimeEstimate" : 1637605,
        "localTime" : ISODate("2011-07-12T06:34:13.954Z"),
        "globalLock" : {
                "totalTime" : 1812703102775,
                "lockTime" : 320393747383,
                "ratio" : 0.17674915814538028,
                "currentQueue" : {
                        "total" : 0,
                        "readers" : 0,
                        "writers" : 0
                },
                "activeClients" : {
                        "total" : 1,
                        "readers" : 1,
                        "writers" : 0
                }
        },
        "mem" : {
                "bits" : 64,
                "resident" : 6204,
                "virtual" : 33422,
                "supported" : true,
                "mapped" : 10476
        },
        "connections" : {
                "current" : 1195,
                "available" : 14805
        },
        "extra_info" : {
                "note" : "fields vary by platform",
                "heap_usage_bytes" : 5119888,
                "page_faults" : 600540
        },
        "indexCounters" : {
                "btree" : {
                        "accesses" : 32287,
                        "hits" : 32283,
                        "misses" : 4,
                        "resets" : 0,
                        "missRatio" : 0.00012388887168210115
                }
        },
        "backgroundFlushing" : {
                "flushes" : 30208,
                "total_ms" : 69500836,
                "average_ms" : 2300.742717161017,
                "last_ms" : 1068,
                "last_finished" : ISODate("2011-07-12T06:33:38.881Z")
        },
        "cursors" : {
                "totalOpen" : 1,
                "clientCursors_size" : 1,
                "timedOut" : 24
        },
        "network" : {
                "bytesIn" : NumberLong("2480964464628"),
                "bytesOut" : NumberLong("10175864533493"),
                "numRequests" : 573022072
        },
        "repl" : {
                "setName" : "monit",
                "ismaster" : true,
                "secondary" : false,
                "hosts" : [
                        "172.25.122.181:27017",
                        "172.25.122.182:27017"
                ],
                "arbiters" : [
                        "172.25.122.20:27017"
                ]
        },
        "opcounters" : {
                "insert" : 93375,
                "query" : 354365104,
                "update" : 124710081,
                "delete" : 35050,
                "getmore" : 89418478,
                "command" : 4399952
        },
        "asserts" : {
                "regular" : 0,
                "warning" : 7,
                "msg" : 0,
                "user" : 1940076,
                "rollovers" : 0
        },
        "writeBacksQueued" : false,
        "dur" : {
                "commits" : 23,
                "journaledMB" : 30.5152,
                "writeToDataFilesMB" : 0,
                "commitsInWriteLock" : 0,
                "earlyCommits" : 0,
                "timeMs" : {
                        "dt" : 3014,
                        "prepLogBuffer" : 134,
                        "writeToJournal" : 481,
                        "writeToDataFiles" : 46,
                        "remapPrivateView" : 4
                }
        },
        "ok" : 1
}
monit:PRIMARY>

-----------------------------------------------

root@mdb51:/home/neadmin# free -lmt
             total       used       free     shared    buffers     cached
Mem:          7908       7635        273          0        135       7059
Low:          7908       7635        273
High:            0          0          0
-/+ buffers/cache:        540       7368
Swap:         1447          9       1438
Total:        9356       7644       1712

------------------------------------------------
mongostat --discovered

                        insert  query update delete getmore command flushes mapped  vsize    res faults locked % idx miss %     qr|qw   ar|aw  netIn netOut  conn   set repl       time
172.25.122.181:27017         0     75     27      0      20       8       0  10.2g  32.6g  6.05g      0      0.5          0       0|0     1|0   318k     1m  1197 monit    M   06:36:16
172.25.122.182:27017        *0     *0    *27     *0       0     5|0       0  10.2g  20.8g  5.31g      0      0.3          0       0|0     0|0   294b     2k    10 monit  SEC   06:36:16
     localhost:27017         0     75     27      0      20       8       0  10.2g  32.6g  6.05g      0      0.5          0       0|0     1|0   318k     1m  1197 monit    M   06:36:16

172.25.122.181:27017         0     20      8      0       5       3       0  10.2g  32.6g  6.05g      0      0.1          0       0|0     1|0    81k   316k  1197 monit    M   06:36:17
172.25.122.182:27017        *0     *0     *8     *0       0     3|0       0  10.2g  20.8g  5.31g      0      0.1          0       0|0     0|0   323b     1k    10 monit  SEC   06:36:17
     localhost:27017         0     20      8      0       5       3       0  10.2g  32.6g  6.05g      0      0.1          0       0|0     1|0    81k   316k  1197 monit    M   06:36:17

172.25.122.181:27017         0     55     21      0      14       3       0  10.2g  32.6g  6.05g      0      0.3          0       0|0     1|0   260k     1m  1197 monit    M   06:36:18
172.25.122.182:27017        *2     *0    *22     *0       0     3|0       0  10.2g  20.8g  5.31g      1        2          0       0|0     0|0   323b     1k    10 monit  SEC   06:36:18
     localhost:27017         0     55     21      0      14       3       0  10.2g  32.6g  6.05g      0      0.3          0       0|0     1|0   260k     1m  1197 monit    M   06:36:18

172.25.122.181:27017         2     72     23      0      17       3       0  10.2g  32.6g  6.05g      1      5.9          0       2|1     2|1   301k     1m  1197 monit    M   06:36:19
172.25.122.182:27017        *1     *0    *74     *0       0     6|0       0  10.2g  20.8g  5.31g      0      1.1          0       0|0     0|0   352b     2k    10 monit  SEC   06:36:19
     localhost:27017         2     72     23      0      17       3       0  10.2g  32.6g  6.05g      1      5.9          0       2|1     2|1   301k     1m  1197 monit    M   06:36:19

172.25.122.181:27017         1    202     75      0      61      10       0  10.2g  32.6g  6.05g      1      7.9          0       0|0     1|0     1m     3m  1197 monit    M   06:36:20
172.25.122.182:27017        *0     *0    *20     *0       0     3|0       0  10.2g  20.8g  5.31g      0      0.4          0       0|0     0|0   323b     1k    10 monit  SEC   06:36:20
     localhost:27017         1    202     75      0      61       9       0  10.2g  32.6g  6.05g      1      7.9          0       0|0     1|0     1m     3m  1197 monit    M   06:36:20

                        insert  query update delete getmore command flushes mapped  vsize    res faults locked % idx miss %     qr|qw   ar|aw  netIn netOut  conn   set repl       time
172.25.122.181:27017         0     78     30      0      24       3       0  10.2g  32.6g  6.05g      0      0.6          0       0|0     1|0   347k     1m  1197 monit    M   06:36:21
172.25.122.182:27017        *0     *0    *43     *0       0     1|0       0  10.2g  20.8g  5.31g      0      0.6          0       0|0     0|0    62b     1k    10 monit  SEC   06:36:21
     localhost:27017         0     78     30      0      24       3       0  10.2g  32.6g  6.05g      0      0.6          0       0|0     1|0   347k     1m  1197 monit    M   06:36:21

172.25.122.181:27017         0     89     34      0      26       3       0  10.2g  32.6g  6.05g      0      0.6          0       0|0     1|0   414k     1m  1197 monit    M   06:36:22
172.25.122.182:27017        *0     *0     *5     *0       0     3|0       0  10.2g  20.8g  5.31g      0      0.1          0       0|0     0|0   323b     1k    10 monit  SEC   06:36:22
     localhost:27017         0     89     34      0      26       3       0  10.2g  32.6g  6.05g      0      0.6          0       0|0     1|0   414k     1m  1197 monit    M   06:36:22

172.25.122.181:27017         0     27      9      0       6       3       0  10.2g  32.6g  6.05g      0      0.1          0       0|0     1|0   101k   452k  1197 monit    M   06:36:23
172.25.122.182:27017        *0     *0    *94     *0       0     1|0       0  10.2g  20.8g  5.31g      0      1.3          0       0|0     0|0    62b     1k    10 monit  SEC   06:36:23
     localhost:27017         0     27      9      0       6       3       0  10.2g  32.6g  6.05g      0      0.1          0       0|0     1|0   101k   452k  1197 monit    M   06:36:23

172.25.122.181:27017         0    241     92      0      76       2       0  10.2g  32.6g  6.05g      1      3.4          0       0|0     1|0     1m     4m  1197 monit    M   06:36:24
172.25.122.182:27017        *0     *0    *20     *0       0     7|0       0  10.2g  20.8g  5.31g      0      0.4          0       0|0     0|0   555b     2k    10 monit  SEC   06:36:24
     localhost:27017         0    241     92      0      76       4       0  10.2g  32.6g  6.05g      1      3.3          0       0|0     1|0     1m     4m  1197 monit    M   06:36:24

172.25.122.181:27017         0     59     20      0       8      11       0  10.2g  32.6g  6.05g      0      3.9          0       0|0     0|0   241k   944k  1197 monit    M   06:36:25
172.25.122.182:27017        *0     *0    *18     *0       0     2|0       0  10.2g  20.8g  5.31g      0      0.2          0       0|0     0|0   120b     1k    10 monit  SEC   06:36:25
     localhost:27017         0     59     20      0       8      10       0  10.2g  32.6g  6.05g      0      3.8          0       0|0     0|0   241k   943k  1197 monit    M   06:36:25

                        insert  query update delete getmore command flushes mapped  vsize    res faults locked % idx miss %     qr|qw   ar|aw  netIn netOut  conn   set repl       time
172.25.122.181:27017         0     47     18      0      14       3       0  10.2g  32.6g  6.05g      0      0.3          0       0|0     1|0   223k   876k  1197 monit    M   06:36:26
172.25.122.182:27017        *0     *0    *25     *0       0     3|0       0  10.2g  20.8g  5.31g      0      0.4          0       0|0     0|0   323b     1k    10 monit  SEC   06:36:26
     localhost:27017         0     47     18      0      14       3       0  10.2g  32.6g  6.05g      0      0.3          0       0|0     1|0   223k   876k  1197 monit    M   06:36:26

172.25.122.181:27017         0     52     20      0      16       3       0  10.2g  32.6g  6.05g      0      0.3          0       0|0     1|0   241k   942k  1197 monit    M   06:36:27
172.25.122.182:27017        *0     *0    *28     *0       0     1|0       0  10.2g  20.8g  5.31g      0      0.4          0       0|0     0|0    62b     1k    10 monit  SEC   06:36:27
     localhost:27017         0     52     20      0      16       3       0  10.2g  32.6g  6.05g      0      0.3          0       0|0     1|0   241k   942k  1197 monit    M   06:36:27

172.25.122.181:27017         0     75     28      0      23       3       0  10.2g  32.6g  6.05g      0      0.5          0       0|0     1|0   360k     1m  1197 monit    M   06:36:28
172.25.122.182:27017        *0     *0    *74     *0       0     3|0       0  10.2g  20.8g  5.31g      0        1          0       0|0     0|0   323b     1k    10 monit  SEC   06:36:28
     localhost:27017         0     75     28      0      23       3       0  10.2g  32.6g  6.05g      0      0.5          0       0|0     1|0   360k     1m  1197 monit    M   06:36:28

172.25.122.181:27017         2    200     77      0      57       3       0  10.2g  32.6g  6.05g      1      2.4          0       0|0     1|0   866k     3m  1197 monit    M   06:36:29
172.25.122.182:27017        *2     *0    *40     *0       0     5|0       0  10.2g  20.8g  5.31g      0      0.6          0       0|0     0|0   294b     2k    10 monit  SEC   06:36:29
     localhost:27017         2    200     77      0      57       3       0  10.2g  32.6g  6.05g      1      2.4          0       0|0     1|0   866k     3m  1197 monit    M   06:36:29

172.25.122.181:27017         0    133     48      0      38      10       0  10.2g  32.6g  6.05g      0      0.9          0       0|0     1|0   602k     2m  1197 monit    M   06:36:30
172.25.122.182:27017        *0     *0    *44     *0       0     4|0       0  10.2g  20.8g  5.31g      0      0.7          0       0|0     0|0   381b     2k    10 monit  SEC   06:36:30
     localhost:27017         0    133     48      0      38      10       0  10.2g  32.6g  6.05g      0      0.9          0       0|0     1|0   602k     2m  1197 monit    M   06:36:30

                        insert  query update delete getmore command flushes mapped  vsize    res faults locked % idx miss %     qr|qw   ar|aw  netIn netOut  conn   set repl       time
172.25.122.181:27017         0     93     36      0      26       3       0  10.2g  32.6g  6.05g      0      0.6          0       0|0     1|0   428k     1m  1197 monit    M   06:36:31
172.25.122.182:27017        *0     *0    *37     *0       0     1|0       0  10.2g  20.8g  5.31g      0      0.7          0       0|0     0|0    62b     1k    10 monit  SEC   06:36:31
     localhost:27017         0     93     36      0      26       3       0  10.2g  32.6g  6.05g      0      0.6          0       0|0     1|0   428k     1m  1197 monit    M   06:36:31

172.25.122.181:27017         0    101     39      0      32       3       0  10.2g  32.6g  6.05g      0      0.7          0       0|0     1|0   441k     1m  1197 monit    M   06:36:32
172.25.122.182:27017        *0     *0    *37     *0       0     3|0       0  10.2g  20.8g  5.31g      0      0.5          0       0|0     0|0   323b     1k    10 monit  SEC   06:36:32
     localhost:27017         0    101     39      0      32       3       0  10.2g  32.6g  6.05g      0      0.7          0       0|0     1|0   441k     1m  1197 monit    M   06:36:32

172.25.122.181:27017         0     87     32      0      27       3       0  10.2g  32.6g  6.05g      1      0.5          0       0|0     1|0   462k     1m  1197 monit    M   06:36:33
172.25.122.182:27017        *0     *0    *46     *0       0     1|0       0  10.2g  20.8g  5.32g      0      0.5          0       0|0     0|0    62b     1k    10 monit  SEC   06:36:33
     localhost:27017         0     87     32      0      27       3       0  10.2g  32.6g  6.05g      1      0.5          0       0|0     1|0   462k     1m  1197 monit    M   06:36:33

172.25.122.181:27017         0    122     48      0      36       3       0  10.2g  32.6g  6.05g      0      0.8          0       0|0     1|0   524k     2m  1197 monit    M   06:36:34
172.25.122.182:27017        *0     *0     *8     *0       0     8|0       0  10.2g  20.8g  5.32g      0      0.1          0       0|0     0|0   613b     3k    10 monit  SEC   06:36:34
     localhost:27017         0    122     48      0      36       3       0  10.2g  32.6g  6.05g      0      0.8          0       0|0     1|0   524k     2m  1197 monit    M   06:36:34

172.25.122.181:27017         0     27      6      0       5      10       0  10.2g  32.6g  6.05g      0      0.1          0       0|0     1|0    81k   377k  1197 monit    M   06:36:35
172.25.122.182:27017        *0     *0    *23     *0       0     1|0       1  10.2g  20.8g  5.32g      0      0.4          0       0|0     0|0    62b     1k    10 monit  SEC   06:36:35
     localhost:27017         0     27      6      0       5      10       0  10.2g  32.6g  6.05g      0      0.1          0       0|0     1|0    81k   376k  1197 monit    M   06:36:35

                        insert  query update delete getmore command flushes mapped  vsize    res faults locked % idx miss %     qr|qw   ar|aw  netIn netOut  conn   set repl       time
172.25.122.181:27017         0     63     26      0      17       3       0  10.2g  32.6g  6.05g      0      0.5          0       0|0     1|0   301k     1m  1197 monit    M   06:36:36
172.25.122.182:27017        *0     *0    *21     *0       0     3|0       0  10.2g  20.8g  5.32g      0      0.3          0       0|0     0|0   323b     1k    10 monit  SEC   06:36:36
     localhost:27017         0     63     26      0      17       3       0  10.2g  32.6g  6.05g      0      0.5          0       0|0     1|0   301k     1m  1197 monit    M   06:36:36

172.25.122.181:27017         0     55     21      0      17       3       0  10.2g  32.6g  6.05g      0      0.3          0       0|0     1|0   247k   963k  1197 monit    M   06:36:37
172.25.122.182:27017        *0     *0    *42     *0       0     1|0       0  10.2g  20.8g  5.32g      0      0.6          0       0|0     0|0    62b     1k    10 monit  SEC   06:36:37
     localhost:27017         0     55     21      0      17       3       0  10.2g  32.6g  6.05g      0      0.3          0       0|0     1|0   247k   965k  1197 monit    M   06:36:37

172.25.122.181:27017         0    103     39      0      29       3       0  10.2g  32.6g  6.05g      0      0.7          0       0|0     1|0   503k     1m  1197 monit    M   06:36:38
172.25.122.182:27017        *0     *0    *36     *0       0     3|0       0  10.2g  20.8g  5.32g      0      0.6          0       0|0     0|0   323b     1k    10 monit  SEC   06:36:38
     localhost:27017         0    103     39      0      29       3       0  10.2g  32.6g  6.05g      0      0.7          0       0|0     1|0   503k     1m  1197 monit    M   06:36:38

172.25.122.181:27017         0    102     38      0      30       3       0  10.2g  32.6g  6.05g      1      5.6          0       0|0     1|0   490k     1m  1197 monit    M   06:36:39
172.25.122.182:27017        *0     *0    *36     *0       0     3|0       0  10.2g  20.8g  5.32g      0      0.6          0       0|0     0|0   323b     1k    10 monit  SEC   06:36:39
     localhost:27017         0    102     38      0      30       3       0  10.2g  32.6g  6.05g      1      5.6          0       0|0     1|0   490k     1m  1197 monit    M   06:36:39



Avise Sudhakar Rao

Jul 12, 2011, 3:56:39 AM
to mongod...@googlegroups.com
Hi,

Can I get any feedback or suggestions on how to overcome this problem?

I'm sure this problem may happen again, and I want to take precautions before it does.
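
One possible precaution is to check periodically which member is primary. What follows is only a minimal sketch, assuming the document has the same shape as the rs.status() output posted above (members, name, state, health). The helper names findPrimary and unhealthyMembers and the trimmed sample are made up for illustration:

```javascript
// Trimmed copy of the rs.status() output posted earlier in this thread.
var sampleStatus = {
  set: "monit",
  members: [
    { _id: 0, name: "172.25.122.181:27017", health: 1, state: 1, stateStr: "PRIMARY" },
    { _id: 1, name: "172.25.122.182:27017", health: 1, state: 2, stateStr: "SECONDARY" },
    { _id: 2, name: "172.25.122.20:27017", health: 1, state: 7, stateStr: "ARBITER" }
  ]
};

// Hypothetical helper: name of the member reporting state 1 (PRIMARY), or null.
function findPrimary(status) {
  for (var i = 0; i < status.members.length; i++) {
    if (status.members[i].state === 1) {
      return status.members[i].name;
    }
  }
  return null;
}

// Hypothetical helper: names of all members whose health field is not 1.
function unhealthyMembers(status) {
  var names = [];
  for (var i = 0; i < status.members.length; i++) {
    if (status.members[i].health !== 1) {
      names.push(status.members[i].name);
    }
  }
  return names;
}

var currentPrimary = findPrimary(sampleStatus); // "172.25.122.181:27017"
```

In the mongo shell the same helpers could be called as findPrimary(rs.status()) and the result compared with the host that is expected to be primary.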

Thanks,
Sudhakar

Grégoire Seux

Jul 12, 2011, 4:16:23 AM
to mongod...@googlegroups.com
Hi,

Elections are usually triggered when the primary is no longer seen by the secondaries.

That doesn't automatically mean that your primary is down, only that it did not respond within a given time span.
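
As a rough illustration of why a set can fail over without the primary being down: a member can only be (or stay) primary while it can reach a strict majority of the votes in the set. The helper below is a hypothetical sketch of that arithmetic for a three-member set like the one in this thread (primary, secondary, arbiter); it is not MongoDB code.

```javascript
// Hypothetical helper (not part of MongoDB): true when the votes a member
// can reach form a strict majority of all votes in the set.
function hasMajority(totalVotes, reachableVotes) {
  return reachableVotes > Math.floor(totalVotes / 2);
}

// Three voters: 181, 182 and the arbiter on .20.
// If 182 and the arbiter can see each other but not 181, they hold 2 of 3 votes.
var secondaryPlusArbiter = hasMajority(3, 2); // true
// 181 on its own holds only 1 of 3 votes.
var isolatedPrimary = hasMajority(3, 1);      // false
```

So a primary that stops answering heartbeats for long enough, for whatever reason, can be replaced even though the process never exited.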

Gregoire

Avise Sudhakar Rao

Jul 12, 2011, 4:45:35 AM
to mongod...@googlegroups.com
Yes, I saw this article, but none of those network issues occurred: the internet connection was not down, there was no DNS issue, and the primary was not shut down.

I'm just curious whether there is a problem with MongoDB's memory usage (db.serverStatus().mem, data given above), with RAM on the host (free -lmt, data given above), or something else.
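
The figures already posted can be compared directly. This is a rough sketch, assuming the mem fields from db.serverStatus() are in MB, like the total reported by free -lmt; the helper name mappedToRamRatio is made up:

```javascript
// Values copied from the db.serverStatus() and free -lmt output in this thread (MB).
var mem = { resident: 6204, virtual: 33422, mapped: 10476 };
var totalRamMB = 7908;

// Hypothetical helper: how many times larger the mapped data files are than RAM.
function mappedToRamRatio(mappedMB, ramMB) {
  return mappedMB / ramMB;
}

var ratio = mappedToRamRatio(mem.mapped, totalRamMB); // about 1.32
var fitsInRam = ratio <= 1;                            // false
```

A ratio above 1 means the mapped data files cannot all be resident at once, so some page faults are to be expected. On its own that does not explain an election.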

For information: the mongod process is using 100-150% CPU (2 CPUs) and 70-90% of memory.

Please let me know your feedback.

Thanks,
Sudhakar


Alvin Richards

Jul 12, 2011, 6:59:35 AM
to mongodb-user
If you check the logs of the primary and secondary, there should be an
indication of the events preceding the election (e.g. a dropped
connection).

-alvin

On Jul 12, 9:45 am, Avise Sudhakar Rao <avisesudhakar...@gmail.com>
wrote:
> Yes I saw this article but known of the network issue was taken place,
> neither internet was down, nor there is any DNS issue and nor primary was
> shutdown.
>
> I'm just curious to know if there is any problem with Memory
> (db.serverStatus().mem(data given above))  or because of RAM (free -lmt
> (data given above)) or etc...
>
> Just for the information:- Mongod process using 100-150% CPU (2 CPU's) and
> 70%-90% Memory
>
> Please let me know your feedback.
>
> Thanks,
> Sudhakar
>
> On Tue, Jul 12, 2011 at 1:46 PM, Grégoire Seux <kamaradclim...@gmail.com>wrote:
>
>
>
> > Hi,
>
> > elections are usually triggered when the primary is not seen anymore by the
> > secondaries.
>
> >http://www.mongodb.org/display/DOCS/Replica+Sets+-+Voting#ReplicaSets...

Scott Hernandez

Jul 12, 2011, 6:59:47 AM
to mongod...@googlegroups.com
Is there some reason you want one to be primary, specifically?

You can look in the server logs to see if there is anything there to
indicate why/when the election was held.

That sounds like a lot of CPU usage. In most cases that is a
side effect of not having the correct indexes. Are you doing any
operations which would be CPU-intensive?
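
A minimal sketch of how a missing index might be spotted, assuming the explain() output of MongoDB versions of this era (1.8.x), where an unindexed query reports a cursor of "BasicCursor" together with nscanned and n. The collection name, query, sample numbers and the helper looksUnindexed are all hypothetical:

```javascript
// Hypothetical explain() result for a query such as
//   db.events.find({ host: "mdb51" }).explain()
// (collection, field and numbers are invented for this example).
var sampleExplain = { cursor: "BasicCursor", nscanned: 250000, n: 12, millis: 840 };

// Hypothetical helper: flags a full collection scan, or a query that examines
// far more documents than it returns.
function looksUnindexed(explainDoc, maxScanRatio) {
  if (explainDoc.cursor === "BasicCursor") {
    return true;
  }
  var returned = Math.max(explainDoc.n, 1); // avoid dividing by zero
  return explainDoc.nscanned / returned > maxScanRatio;
}

var flagged = looksUnindexed(sampleExplain, 100); // true
```

The threshold of 100 is arbitrary. Running the frequent queries through explain() and checking them this way is one way to see whether the CPU load comes from scans.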
