Index creation and nodes going down on a Replica Set


Giacomo Benvenuti

Jul 3, 2013, 9:55:22 AM
to mongo...@googlegroups.com
Hi, I'm seeing a behavior that I can't understand and that is causing me some trouble.

I have 3 servers in a Replica Set.

I run
> db.test.Collection.ensureIndex({date:-1})
on a collection of 6M documents.
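
As an aside: a foreground build like the one above holds the primary's write lock for the whole build. In the 2.x shell, ensureIndex also accepts an options document, so a background build would look like the line below (note the docs warn that before 2.6 the build still runs in the foreground when it is replicated to the secondaries):

> db.test.Collection.ensureIndex({date: -1}, {background: true})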

The index gets created, but I can see from the logs that for a few seconds the nodes lost sight of each other and the election of a new master kicked in.
At the end of it all, master.host.it, which has the highest priority, becomes master again.
This behavior, however, makes me think there is some issue I'm not taking into account.
What is the correct procedure for creating an index on a collection of millions of documents in a Replica Set?

Thanks a lot for the support.

I'm attaching below the log from the master and from one of the slaves.

Giacomo



Wed Jul  3 15:20:18.755 [conn475488] build index DBTest.test.Collection { date: -1.0 }
Wed Jul  3 15:20:28.852 [conn475488] Index: (1/3) External Sort Progress: 4000000/6719789 59%
Wed Jul  3 15:20:35.307 [conn475488] external sort used : 7 files  in 16 secs
Wed Jul  3 15:20:45.000 [conn475488] Index: (2/3) BTree Bottom Up Progress: 4871000/6719789 72%
Wed Jul  3 15:20:48.093 [conn475488] done building bottom layer, going to commit
Wed Jul  3 15:20:49.178 [conn475488] build index done.  scanned 6719789 total records. 30.423 secs
Wed Jul  3 15:20:49.179 [conn475488] insert DBTest.system.indexes ninserted:1 keyUpdates:0 locks(micros) w:30424231 30424ms
Wed Jul  3 15:20:49.179 [conn475463] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after cursors: 0, after dur: 0, after extra_info: 0, after globalLock: 0, after indexCounters: 0, after locks: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after recordStats: 29518, after repl: 29518, at end: 29518 }
Wed Jul  3 15:20:49.179 [conn475463] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:58 reslen:4305 29910ms
Wed Jul  3 15:20:49.181 [conn475488] command denied: { replSetGetStatus: 1.0, forShell: 1.0 }
Wed Jul  3 15:21:13.857 [rsHealthPoll] DBClientCursor::init call() failed
Wed Jul  3 15:21:13.858 [rsHealthPoll] replSet info slave2.host.it:8090 is down (or slow to respond): DBClientBase::findN: transport error: slave2.host.it:8090 ns: local.$cmd query: { authenticate: 1, nonce: "b2803984f98d75f3", user: "__system", key: "6c542fea3417906b302d5b145e581d55" }
Wed Jul  3 15:21:13.858 [rsHealthPoll] replSet member slave2.host.it:8090 is now in state DOWN
Wed Jul  3 15:21:14.908 [rsHealthPoll] DBClientCursor::init call() failed
Wed Jul  3 15:21:14.908 [rsHealthPoll] replSet info slave1.host.it:8090 is down (or slow to respond): DBClientBase::findN: transport error: slave1.host.it:8090 ns: local.$cmd query: { authenticate: 1, nonce: "be6e4e067976acc5", user: "__system", key: "cddf4ed257c141d4ea09394e4fca5ec0" }
Wed Jul  3 15:21:14.908 [rsHealthPoll] replSet member slave1.host.it:8090 is now in state DOWN
Wed Jul  3 15:21:14.908 [rsMgr] can't see a majority of the set, relinquishing primary
Wed Jul  3 15:21:14.908 [rsMgr] replSet relinquishing primary state
Wed Jul  3 15:21:14.908 [rsMgr] replSet SECONDARY
Wed Jul  3 15:21:14.908 [rsMgr] replSet closing client sockets after relinquishing primary
Wed Jul  3 15:21:15.994 [conn475464] SocketException handling request, closing client connection: 9001 socket exception [2] server [IP:46077] 
Wed Jul  3 15:21:15.994 [conn475465] SocketException handling request, closing client connection: 9001 socket exception [2] server [IP:34250] 
Wed Jul  3 15:21:19.376 [rsHealthPoll] replSet member slave2.host.it:8090 is up
Wed Jul  3 15:21:19.376 [rsHealthPoll] replSet member slave2.host.it:8090 is now in state SECONDARY
Wed Jul  3 15:21:19.376 [rsMgr] not electing self, slave2.host.it:8090 would veto with 'I don't think master.host.it:8090 is electable'
Wed Jul  3 15:21:22.258 [rsHealthPoll] replSet member slave1.host.it:8090 is up
Wed Jul  3 15:21:22.258 [rsHealthPoll] replSet member slave1.host.it:8090 is now in state SECONDARY
Wed Jul  3 15:21:22.557 [rsMgr] replSet info electSelf 1
Wed Jul  3 15:21:22.779 [rsMgr] replSet PRIMARY


Wed Jul  3 15:20:49.180 [repl writer worker 2] build index DBTest.test.Collection { date: -1.0 }
Wed Jul  3 15:21:00.020 [repl writer worker 2] Index: (1/3) External Sort Progress: 4000000/6719789 59%
Wed Jul  3 15:21:06.835 [repl writer worker 2] external sort used : 7 files  in 17 secs
Wed Jul  3 15:21:13.865 [rsHealthPoll] DBClientCursor::init call() failed
Wed Jul  3 15:21:13.865 [rsHealthPoll] replSet info slave2.host.it:8090 is down (or slow to respond): DBClientBase::findN: transport error: slave2.host.it:8090 ns: local.$cmd query: { authenticate: 1, nonce: "3cc89674f89a6120", user: "__system", key: "9d2069f0e7e77a4737446c3dd8a87677" }
Wed Jul  3 15:21:13.865 [rsHealthPoll] replSet member slave2.host.it:8090 is now in state DOWN
Wed Jul  3 15:21:14.908 [rsBackgroundSync] replSet db exception in producer: 10278 dbclient error communicating with server: master.host.it:8090
Wed Jul  3 15:21:16.000 [repl writer worker 2] Index: (2/3) BTree Bottom Up Progress: 3665900/6719789 54%
Wed Jul  3 15:21:16.434 [rsHealthPoll] replset info master.host.it:8090 thinks that we are down
Wed Jul  3 15:21:16.434 [rsHealthPoll] replSet member master.host.it:8090 is now in state SECONDARY
Wed Jul  3 15:21:18.436 [rsHealthPoll] replset info master.host.it:8090 thinks that we are down
Wed Jul  3 15:21:19.376 [rsHealthPoll] replset info slave2.host.it:8090 thinks that we are down
Wed Jul  3 15:21:19.376 [rsHealthPoll] replSet member slave2.host.it:8090 is up
Wed Jul  3 15:21:19.376 [rsHealthPoll] replSet member slave2.host.it:8090 is now in state SECONDARY
Wed Jul  3 15:21:20.437 [rsHealthPoll] replset info master.host.it:8090 thinks that we are down
Wed Jul  3 15:21:21.115 [repl writer worker 2] done building bottom layer, going to commit
Wed Jul  3 15:21:21.376 [rsHealthPoll] replset info slave2.host.it:8090 thinks that we are down
Wed Jul  3 15:21:22.257 [repl writer worker 2] build index done.  scanned 6719789 total records. 33.076 secs
Wed Jul  3 15:21:22.258 [conn133556] command local.$cmd command: { authenticate: 1, nonce: "d9a62e7b61be1fe7", user: "__system", key: "08f313036a8fbfbf55ab59b2d090e849" } ntoreturn:1 keyUpdates:0 locks(micros) r:178 reslen:74 5346ms
Wed Jul  3 15:21:22.258 [conn133547] command local.$cmd command: { authenticate: 1, nonce: "be6e4e067976acc5", user: "__system", key: "cddf4ed257c141d4ea09394e4fca5ec0" } ntoreturn:1 keyUpdates:0 locks(micros) r:177 reslen:74 17350ms
Wed Jul  3 15:21:22.258 [conn133546] command local.$cmd command: { authenticate: 1, nonce: "64e3be73b8f0881f", user: "__system", key: "91afcabbf435d88439f585bd44257bab" } ntoreturn:1 keyUpdates:0 locks(micros) r:206 reslen:74 17350ms
Wed Jul  3 15:21:22.258 [conn133557] command local.$cmd command: { authenticate: 1, nonce: "9a2373f52fe6724e", user: "__system", key: "a93a90bc79f292109f0f6fe4e7947b31" } ntoreturn:1 keyUpdates:0 locks(micros) r:185 reslen:74 5346ms
Wed Jul  3 15:21:22.258 [conn133517] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after cursors: 0, after dur: 0, after extra_info: 0, after globalLock: 0, after indexCounters: 0, after locks: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after recordStats: 24622, after repl: 24622, at end: 24622 }
Wed Jul  3 15:21:22.258 [conn133517] command admin.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:46 reslen:3992 25393ms
Wed Jul  3 15:21:22.557 [conn133557] replSet info voting yea for master.host.it:8090 (1)
Wed Jul  3 15:21:22.619 [rsSyncNotifier] Socket recv() errno:104 Connection reset by peer IP:8090
Wed Jul  3 15:21:22.619 [rsSyncNotifier] SocketException: remote: IP:8090 error: 9001 socket exception [1] server [IP:8090] 
Wed Jul  3 15:21:22.619 [rsSyncNotifier] replset tracking exception: exception: 10278 dbclient error communicating with server: master.host.it:8090
Wed Jul  3 15:21:23.619 [rsSyncNotifier] replset setting oplog notifier to master.host.it:8090
Wed Jul  3 15:21:23.619 [rsSyncNotifier] Socket flush send() errno:32 Broken pipe IP:8090
Wed Jul  3 15:21:23.619 [rsSyncNotifier]   caught exception (socket exception [SEND_ERROR] for IP:8090) in destructor (~PiggyBackData)
Wed Jul  3 15:21:24.438 [rsHealthPoll] replSet member master.host.it:8090 is now in state PRIMARY
Wed Jul  3 15:21:24.909 [rsBackgroundSync] replSet syncing to: master.host.it:8090

Giacomo Benvenuti

Jul 3, 2013, 12:14:30 PM
to mongo...@googlegroups.com
Answering my own question.

It was enough to read Build Indexes on Replica Sets [0].

It's still a rather cumbersome procedure.
Any tips for streamlining it?
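
For the record, the rolling procedure from [0] boils down to the following per node (the port numbers and paths below are placeholders, not this deployment's actual config):

1. Restart the secondary as a standalone, without --replSet and on a different port:
   mongod --port 47017 --dbpath /data/db
2. Build the index on the standalone:
   > db.getSiblingDB("DBTest").getCollection("test.Collection").ensureIndex({date: -1})
3. Restart it with its original replica set options and wait for it to catch up.
4. When only the primary is left, step it down and repeat the same steps on it:
   > rs.stepDown()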

Marco Berri

Jul 4, 2013, 4:57:27 AM
to mongo...@googlegroups.com
From what I understand, it's only needed to avoid locking the replica.

If you don't mind having the server unavailable for a short period, you can run it on the primary.

Or you can schedule it to run at night with an sh script driving the mongo shell.
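
A minimal sketch of the scheduled approach, assuming a crontab entry and a mongo binary on the path (the 3 AM slot and the --eval form are illustrative; the host, port, and names come from the thread):

0 3 * * * mongo master.host.it:8090/DBTest --eval 'db.getCollection("test.Collection").ensureIndex({date: -1}, {background: true})'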

I hope I understood your problem.

Bye.

Giacomo Benvenuti

Jul 10, 2013, 10:23:59 AM
to mongo...@googlegroups.com
Yes, but it's still an awkward operation, and it could cause errors on any clients connected, even read-only, to the slaves.

If I find a better solution I'll post it. Thanks.

Giacomo Benvenuti


