Index creation and nodes going down on a Replica Set


Giacomo Benvenuti

Jul 3, 2013, 9:55:22 AM
to mongo...@googlegroups.com
Hi, I'm seeing a behavior that I can't understand and that is causing me some trouble.

I have 3 servers in a Replica Set.

I run
> db.test.Collection.ensureIndex({date:-1})
on a collection of 6M documents.
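
As an aside: a foreground build like the one above holds the primary's write lock for the whole build. In the 2.x shell, ensureIndex also accepts an options document, so a background build would look like the line below (note the docs warn that before 2.6 the build still runs in the foreground when it is replicated to the secondaries):

> db.test.Collection.ensureIndex({date: -1}, {background: true})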

The index gets created, but I can see from the logs that for a few seconds the nodes lost sight of each other and the election of a new master kicked in.
At the end of it all, master.host.it, which has the highest priority, becomes master again.
This behavior, however, makes me think there is some issue I'm not taking into account.
What is the correct procedure for creating an index on a collection of millions of documents in a Replica Set?

Thanks a lot for the support.

I'm attaching below the log from the master and from one of the slaves.

Giacomo



Wed Jul  3 15:20:18.755 [conn475488] build index DBTest.test.Collection { date: -1.0 }
Wed Jul  3 15:20:28.852 [conn475488] Index: (1/3) External Sort Progress: 4000000/6719789 59%
Wed Jul  3 15:20:35.307 [conn475488] external sort used : 7 files  in 16 secs
Wed Jul  3 15:20:45.000 [conn475488] Index: (2/3) BTree Bottom Up Progress: 4871000/6719789 72%
Wed Jul  3 15:20:48.093 [conn475488] done building bottom layer, going to commit
Wed Jul  3 15:20:49.178 [conn475488] build index done.  scanned 6719789 total records. 30.423 secs
Wed Jul  3 15:20:49.179 [conn475488] insert DBTest.system.indexes ninserted:1 keyUpdates:0 locks(micros) w:30424231 30424ms
Wed Jul  3 15:20:49.179 [conn475463] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after cursors: 0, after dur: 0, after extra_info: 0, after globalLock: 0, after indexCounters: 0, after locks: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after recordStats: 29518, after repl: 29518, at end: 29518 }
Wed Jul  3 15:20:49.179 [conn475463] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:58 reslen:4305 29910ms
Wed Jul  3 15:20:49.181 [conn475488] command denied: { replSetGetStatus: 1.0, forShell: 1.0 }
Wed Jul  3 15:21:13.857 [rsHealthPoll] DBClientCursor::init call() failed
Wed Jul  3 15:21:13.858 [rsHealthPoll] replSet info slave2.host.it:8090 is down (or slow to respond): DBClientBase::findN: transport error: slave2.host.it:8090 ns: local.$cmd query: { authenticate: 1, nonce: "b2803984f98d75f3", user: "__system", key: "6c542fea3417906b302d5b145e581d55" }
Wed Jul  3 15:21:13.858 [rsHealthPoll] replSet member slave2.host.it:8090 is now in state DOWN
Wed Jul  3 15:21:14.908 [rsHealthPoll] DBClientCursor::init call() failed
Wed Jul  3 15:21:14.908 [rsHealthPoll] replSet info slave1.host.it:8090 is down (or slow to respond): DBClientBase::findN: transport error: slave1.host.it:8090 ns: local.$cmd query: { authenticate: 1, nonce: "be6e4e067976acc5", user: "__system", key: "cddf4ed257c141d4ea09394e4fca5ec0" }
Wed Jul  3 15:21:14.908 [rsHealthPoll] replSet member slave1.host.it:8090 is now in state DOWN
Wed Jul  3 15:21:14.908 [rsMgr] can't see a majority of the set, relinquishing primary
Wed Jul  3 15:21:14.908 [rsMgr] replSet relinquishing primary state
Wed Jul  3 15:21:14.908 [rsMgr] replSet SECONDARY
Wed Jul  3 15:21:14.908 [rsMgr] replSet closing client sockets after relinquishing primary
Wed Jul  3 15:21:15.994 [conn475464] SocketException handling request, closing client connection: 9001 socket exception [2] server [IP:46077] 
Wed Jul  3 15:21:15.994 [conn475465] SocketException handling request, closing client connection: 9001 socket exception [2] server [IP:34250] 
Wed Jul  3 15:21:19.376 [rsHealthPoll] replSet member slave2.host.it:8090 is up
Wed Jul  3 15:21:19.376 [rsHealthPoll] replSet member slave2.host.it:8090 is now in state SECONDARY
Wed Jul  3 15:21:19.376 [rsMgr] not electing self, slave2.host.it:8090 would veto with 'I don't think master.host.it:8090 is electable'
Wed Jul  3 15:21:22.258 [rsHealthPoll] replSet member slave1.host.it:8090 is up
Wed Jul  3 15:21:22.258 [rsHealthPoll] replSet member slave1.host.it:8090 is now in state SECONDARY
Wed Jul  3 15:21:22.557 [rsMgr] replSet info electSelf 1
Wed Jul  3 15:21:22.779 [rsMgr] replSet PRIMARY


Wed Jul  3 15:20:49.180 [repl writer worker 2] build index DBTest.test.Collection { date: -1.0 }
Wed Jul  3 15:21:00.020 [repl writer worker 2] Index: (1/3) External Sort Progress: 4000000/6719789 59%
Wed Jul  3 15:21:06.835 [repl writer worker 2] external sort used : 7 files  in 17 secs
Wed Jul  3 15:21:13.865 [rsHealthPoll] DBClientCursor::init call() failed
Wed Jul  3 15:21:13.865 [rsHealthPoll] replSet info slave2.host.it:8090 is down (or slow to respond): DBClientBase::findN: transport error: slave2.host.it:8090 ns: local.$cmd query: { authenticate: 1, nonce: "3cc89674f89a6120", user: "__system", key: "9d2069f0e7e77a4737446c3dd8a87677" }
Wed Jul  3 15:21:13.865 [rsHealthPoll] replSet member slave2.host.it:8090 is now in state DOWN
Wed Jul  3 15:21:14.908 [rsBackgroundSync] replSet db exception in producer: 10278 dbclient error communicating with server: master.host.it:8090
Wed Jul  3 15:21:16.000 [repl writer worker 2] Index: (2/3) BTree Bottom Up Progress: 3665900/6719789 54%
Wed Jul  3 15:21:16.434 [rsHealthPoll] replset info master.host.it:8090 thinks that we are down
Wed Jul  3 15:21:16.434 [rsHealthPoll] replSet member master.host.it:8090 is now in state SECONDARY
Wed Jul  3 15:21:18.436 [rsHealthPoll] replset info master.host.it:8090 thinks that we are down
Wed Jul  3 15:21:19.376 [rsHealthPoll] replset info slave2.host.it:8090 thinks that we are down
Wed Jul  3 15:21:19.376 [rsHealthPoll] replSet member slave2.host.it:8090 is up
Wed Jul  3 15:21:19.376 [rsHealthPoll] replSet member slave2.host.it:8090 is now in state SECONDARY
Wed Jul  3 15:21:20.437 [rsHealthPoll] replset info master.host.it:8090 thinks that we are down
Wed Jul  3 15:21:21.115 [repl writer worker 2] done building bottom layer, going to commit
Wed Jul  3 15:21:21.376 [rsHealthPoll] replset info slave2.host.it:8090 thinks that we are down
Wed Jul  3 15:21:22.257 [repl writer worker 2] build index done.  scanned 6719789 total records. 33.076 secs
Wed Jul  3 15:21:22.258 [conn133556] command local.$cmd command: { authenticate: 1, nonce: "d9a62e7b61be1fe7", user: "__system", key: "08f313036a8fbfbf55ab59b2d090e849" } ntoreturn:1 keyUpdates:0 locks(micros) r:178 reslen:74 5346ms
Wed Jul  3 15:21:22.258 [conn133547] command local.$cmd command: { authenticate: 1, nonce: "be6e4e067976acc5", user: "__system", key: "cddf4ed257c141d4ea09394e4fca5ec0" } ntoreturn:1 keyUpdates:0 locks(micros) r:177 reslen:74 17350ms
Wed Jul  3 15:21:22.258 [conn133546] command local.$cmd command: { authenticate: 1, nonce: "64e3be73b8f0881f", user: "__system", key: "91afcabbf435d88439f585bd44257bab" } ntoreturn:1 keyUpdates:0 locks(micros) r:206 reslen:74 17350ms
Wed Jul  3 15:21:22.258 [conn133557] command local.$cmd command: { authenticate: 1, nonce: "9a2373f52fe6724e", user: "__system", key: "a93a90bc79f292109f0f6fe4e7947b31" } ntoreturn:1 keyUpdates:0 locks(micros) r:185 reslen:74 5346ms
Wed Jul  3 15:21:22.258 [conn133517] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after cursors: 0, after dur: 0, after extra_info: 0, after globalLock: 0, after indexCounters: 0, after locks: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after recordStats: 24622, after repl: 24622, at end: 24622 }
Wed Jul  3 15:21:22.258 [conn133517] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:46 reslen:3992 25393ms
Wed Jul  3 15:21:22.557 [conn133557] replSet info voting yea for master.host.it:8090 (1)
Wed Jul  3 15:21:22.619 [rsSyncNotifier] Socket recv() errno:104 Connection reset by peer IP:8090
Wed Jul  3 15:21:22.619 [rsSyncNotifier] SocketException: remote: IP:8090 error: 9001 socket exception [1] server [IP:8090] 
Wed Jul  3 15:21:22.619 [rsSyncNotifier] replset tracking exception: exception: 10278 dbclient error communicating with server: master.host.it:8090
Wed Jul  3 15:21:23.619 [rsSyncNotifier] replset setting oplog notifier to master.host.it:8090
Wed Jul  3 15:21:23.619 [rsSyncNotifier] Socket flush send() errno:32 Broken pipe IP:8090
Wed Jul  3 15:21:23.619 [rsSyncNotifier]   caught exception (socket exception [SEND_ERROR] for IP:8090) in destructor (~PiggyBackData)
Wed Jul  3 15:21:24.438 [rsHealthPoll] replSet member master.host.it:8090 is now in state PRIMARY
Wed Jul  3 15:21:24.909 [rsBackgroundSync] replSet syncing to: master.host.it:8090

Giacomo Benvenuti

Jul 3, 2013, 12:14:30 PM
to mongo...@googlegroups.com
Answering my own question.

It was enough to read Build Indexes on Replica Sets [0].

It's still a rather cumbersome procedure.
Any tips for streamlining it?
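
For the record, the rolling procedure from [0] boils down to the following per node (the port numbers and paths below are placeholders, not this deployment's actual config):

1. Restart the secondary as a standalone, without --replSet and on a different port:
   mongod --port 47017 --dbpath /data/db
2. Build the index on the standalone:
   > db.getSiblingDB("DBTest").getCollection("test.Collection").ensureIndex({date: -1})
3. Restart it with its original replica set options and wait for it to catch up.
4. When only the primary is left, step it down and repeat the same steps on it:
   > rs.stepDown()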

Marco Berri

Jul 4, 2013, 4:57:27 AM
to mongo...@googlegroups.com
From what I understand, it's only needed to avoid locking the replica.

If you don't mind having the server unavailable for a short period, you can run it on the primary.

Or you can schedule it to run at night with an sh script driving the mongo shell.
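
A minimal sketch of the scheduled approach, assuming a crontab entry and a mongo binary on the path (the 3 AM slot and the --eval form are illustrative; the host, port, and names come from the thread):

0 3 * * * mongo master.host.it:8090/DBTest --eval 'db.getCollection("test.Collection").ensureIndex({date: -1}, {background: true})'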

I hope I understood your problem.

Bye.

Giacomo Benvenuti

Jul 10, 2013, 10:23:59 AM
to mongo...@googlegroups.com
Yes, but it's still an awkward operation, and it could cause errors on any clients connected, even read-only, to the slaves.

If I find a better solution I'll post it. Thanks.

Giacomo Benvenuti


