Problems upgrading config database to new format v4


Human Nayebzadeh

Mar 28, 2013, 8:48:08 AM
to mongod...@googlegroups.com
Hi everyone,

I'm stuck during the upgrade process from 2.2.3 to 2.4.

All MongoDB instances in our sharded environment have been updated to 2.4 following the documented upgrade procedure.
When I restarted the mongos process, it told me that I have to start it with --upgrade to upgrade the config metadata to v4.

I have now tried this three times, and every time it gets stuck at the same place.

I also tried the "Resync after an Interruption of the Critical Section" procedure from this doc: http://docs.mongodb.org/manual/release-notes/2.4-upgrade/ even though I never saw the message that, according to the doc, indicates this step is needed.

Judging from the source code and from logs that other people have posted online, the line

"copying collection and chunk metadata to working and backup collections..."

should be followed immediately by lines saying "checking epoch for collection XY ...".

But nothing happens in my case. Please see the full log from the mongos trying to upgrade:

Thu Mar 28 12:59:34.304 [mongosMain] MongoS version 2.4.1 starting: pid=9218 port=30000 64-bit host=host15.somedomain.com (--help for usage)
Thu Mar 28 12:59:34.304 [mongosMain] git version: 1560959e9ce11a693be8b4d0d160d633eee75110
Thu Mar 28 12:59:34.304 [mongosMain] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
Thu Mar 28 12:59:34.304 [mongosMain] options: { config: "/etc/mongodb.conf", configdb: "cfg1.somedomain.com:20000,cfg2.somedomain.com:20000,cfg3.somedomain.com:20000", logpath: "/var/log/mongodb/mongos.log", port: 30000, upgrade: true }
Thu Mar 28 12:59:34.449 [mongosMain] SyncClusterConnection connecting to [cfg1.somedomain.com:20000]
Thu Mar 28 12:59:34.450 [mongosMain] SyncClusterConnection connecting to [cfg2.somedomain.com:20000]
Thu Mar 28 12:59:34.450 [mongosMain] SyncClusterConnection connecting to [cfg3.somedomain.com:20000]
Thu Mar 28 12:59:34.754 [mongosMain] scoped connection to cfg1.somedomain.com:20000,cfg2.somedomain.com:20000,cfg3.somedomain.com:20000 not being returned to the pool
Thu Mar 28 12:59:34.754 [mongosMain] SyncClusterConnection connecting to [cfg1.somedomain.com:20000]
Thu Mar 28 12:59:34.755 [mongosMain] SyncClusterConnection connecting to [cfg2.somedomain.com:20000]
Thu Mar 28 12:59:34.756 [mongosMain] SyncClusterConnection connecting to [cfg3.somedomain.com:20000]
Thu Mar 28 12:59:34.762 [LockPinger] creating distributed lock ping thread for cfg1.somedomain.com:20000,cfg2.somedomain.com:20000,cfg3.somedomain.com:20000 and process host15.somedomain.com:30000:1364471974:1804289383 (sleeping for 30000ms)
Thu Mar 28 12:59:34.762 [LockPinger] SyncClusterConnection connecting to [cfg1.somedomain.com:20000]
Thu Mar 28 12:59:34.762 [LockPinger] SyncClusterConnection connecting to [cfg2.somedomain.com:20000]
Thu Mar 28 12:59:34.763 [LockPinger] SyncClusterConnection connecting to [cfg3.somedomain.com:20000]
Thu Mar 28 12:59:34.763 [mongosMain] SyncClusterConnection connecting to [cfg1.somedomain.com:20000]
Thu Mar 28 12:59:34.764 [mongosMain] SyncClusterConnection connecting to [cfg2.somedomain.com:20000]
Thu Mar 28 12:59:34.765 [mongosMain] SyncClusterConnection connecting to [cfg3.somedomain.com:20000]
Thu Mar 28 12:59:45.807 [mongosMain] waited 11s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 12:59:56.846 [mongosMain] waited 22s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:00:07.890 [mongosMain] waited 33s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:00:18.931 [mongosMain] waited 44s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:00:29.970 [mongosMain] waited 55s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:00:41.008 [mongosMain] waited 66s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:00:52.060 [mongosMain] waited 77s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:01:03.101 [mongosMain] waited 88s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:01:14.141 [mongosMain] waited 99s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:01:25.207 [mongosMain] waited 110s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:01:36.246 [mongosMain] waited 121s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:01:47.293 [mongosMain] waited 132s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:01:58.333 [mongosMain] waited 143s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:02:09.372 [mongosMain] waited 154s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:02:20.465 [mongosMain] waited 165s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:02:31.505 [mongosMain] waited 176s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:02:42.544 [mongosMain] waited 187s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:02:53.583 [mongosMain] waited 198s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:03:04.621 [mongosMain] waited 209s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:03:15.658 [mongosMain] waited 220s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:03:26.698 [mongosMain] waited 231s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:03:37.738 [mongosMain] waited 242s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:03:48.776 [mongosMain] waited 254s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:03:59.814 [mongosMain] waited 265s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:04:10.854 [mongosMain] waited 276s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:04:13.353 [LockPinger] cluster cfg1.somedomain.com:20000,cfg2.somedomain.com:20000,cfg3.somedomain.com:20000 pinged successfully at Thu Mar 28 13:04:12 2013 by distributed lock pinger 'cfg1.somedomain.com:20000,mo
Thu Mar 28 13:04:21.893 [mongosMain] waited 287s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:04:32.931 [mongosMain] waited 298s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:04:43.969 [mongosMain] waited 309s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:04:55.009 [mongosMain] waited 320s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:05:06.049 [mongosMain] waited 331s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:05:17.087 [mongosMain] waited 342s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:05:28.125 [mongosMain] waited 353s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:05:39.164 [mongosMain] waited 364s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:05:50.206 [mongosMain] waited 375s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:06:01.244 [mongosMain] waited 386s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:06:12.282 [mongosMain] waited 397s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:06:23.321 [mongosMain] waited 408s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:06:34.360 [mongosMain] waited 419s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:06:45.401 [mongosMain] waited 430s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:06:56.440 [mongosMain] waited 441s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:07:07.480 [mongosMain] waited 452s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:07:18.520 [mongosMain] waited 463s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:07:29.560 [mongosMain] waited 474s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:07:40.600 [mongosMain] waited 485s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:07:51.639 [mongosMain] waited 496s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:08:02.677 [mongosMain] waited 507s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:08:13.716 [mongosMain] waited 518s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:08:24.755 [mongosMain] waited 529s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:08:35.794 [mongosMain] waited 541s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:08:46.834 [mongosMain] waited 552s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:08:57.872 [mongosMain] waited 563s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:09:08.912 [mongosMain] waited 574s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:09:19.956 [mongosMain] waited 585s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:09:21.921 [LockPinger] cluster cfg1.somedomain.com:20000,cfg2.somedomain.com:20000,cfg3.somedomain.com:20000 pinged successfully at Thu Mar 28 13:09:21 2013 by distributed lock pinger 'cfg1.somedomain.com:20000,cfg2.somedomain.com:20000,cfg3.somedomain.com:20000/host15.somedomain.com:30000:1364471974:1804289383', sleeping for 30000ms
Thu Mar 28 13:09:30.996 [mongosMain] waited 596s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:09:42.037 [mongosMain] waited 607s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:09:53.076 [mongosMain] waited 618s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:10:04.117 [mongosMain] waited 629s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:10:15.157 [mongosMain] waited 640s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:10:26.197 [mongosMain] waited 651s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:10:37.237 [mongosMain] waited 662s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:10:48.275 [mongosMain] waited 673s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:10:59.315 [mongosMain] waited 684s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:11:10.355 [mongosMain] waited 695s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:11:21.395 [mongosMain] waited 706s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:11:32.434 [mongosMain] waited 717s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:11:43.473 [mongosMain] waited 728s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:11:54.514 [mongosMain] waited 739s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:12:05.554 [mongosMain] waited 750s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:12:16.592 [mongosMain] waited 761s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:12:27.631 [mongosMain] waited 772s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:12:38.671 [mongosMain] waited 783s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:12:49.711 [mongosMain] waited 794s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:13:00.753 [mongosMain] waited 805s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:13:11.791 [mongosMain] waited 817s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:13:22.831 [mongosMain] waited 828s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:13:33.869 [mongosMain] waited 839s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:13:44.910 [mongosMain] waited 850s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:13:55.949 [mongosMain] waited 861s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:14:06.991 [mongosMain] waited 872s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:14:18.030 [mongosMain] waited 883s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:14:29.070 [mongosMain] waited 894s for distributed lock configUpgrade for upgrading config database to new format v4
Thu Mar 28 13:14:35.092 [mongosMain] forcing lock 'configUpgrade/host15.somedomain.com:30000:1364468909:1804289383' because elapsed time 900327 > takeover time 900000
Thu Mar 28 13:14:35.529 [LockPinger] cluster cfg1.somedomain.com:20000,cfg2.somedomain.com:20000,cfg3.somedomain.com:20000 pinged successfully at Thu Mar 28 13:14:34 2013 by distributed lock pinger 'cfg1.somedomain.com:20000,cfg2.somedomain.com:20000,cfg3.somedomain.com:20000/host15.somedomain.com:30000:1364471974:1804289383', sleeping for 30000ms
Thu Mar 28 13:14:35.529 [mongosMain] lock 'configUpgrade/host15.somedomain.com:30000:1364468909:1804289383' successfully forced
Thu Mar 28 13:14:37.092 [mongosMain] distributed lock 'configUpgrade/host15.somedomain.com:30000:1364471974:1804289383' acquired, ts : 5154342bf6e04e7181edb6da
Thu Mar 28 13:14:37.095 [mongosMain] starting upgrade of config server from v3 to v4
Thu Mar 28 13:14:37.095 [mongosMain] starting next upgrade step from v3 to v4
Thu Mar 28 13:14:37.096 [mongosMain] about to log new metadata event: { _id: "host15.somedomain.com-2013-03-28T12:14:37-5154342df6e04e7181edb6db", server: "host15.somedomain.com", clientAddr: "N/A", time: new Date(1364472877096), what: "starting upgrade of config database", ns: "config.version", details: { from: 3, to: 4 } }
Thu Mar 28 13:14:39.265 [mongosMain] forcing upgrade locks of previous failed upgrade with id 51542836a70d009c157f195b
Thu Mar 28 13:14:39.742 [mongosMain] stale mongos detected 345437 minutes ago, network location is host10.somedomain.com:30000, not checking version
Thu Mar 28 13:14:39.742 [mongosMain] stale mongos detected 335023 minutes ago, network location is host12.somedomain.com:30000, not checking version
Thu Mar 28 13:14:39.742 [mongosMain] stale mongos detected 277985 minutes ago, network location is host16.somedomain.com:30000, not checking version
Thu Mar 28 13:14:39.742 [mongosMain] stale mongos detected 159343 minutes ago, network location is host9:30000, not checking version
Thu Mar 28 13:14:39.742 [mongosMain] stale mongos detected 257 minutes ago, network location is host15.somedomain.com:30000, not checking version
Thu Mar 28 13:14:39.742 [mongosMain] stale mongos detected 258 minutes ago, network location is host9.somedomain.com:30000, not checking version
Thu Mar 28 13:14:39.742 [mongosMain] stale mongos detected 278 minutes ago, network location is host2.somedomain.com:30000, not checking version
Thu Mar 28 13:14:39.743 [mongosMain] checking that version of host host6.somedomain.com:27017 is compatible with 2.2
Thu Mar 28 13:14:39.744 [mongosMain] checking that version of host host7.somedomain.com:27017 is compatible with 2.2
Thu Mar 28 13:14:39.746 [mongosMain] checking that version of host host11.somedomain.com:27017 is compatible with 2.2
Thu Mar 28 13:14:39.747 [mongosMain] checking that version of host host12.somedomain.com:27017 is compatible with 2.2
Thu Mar 28 13:14:39.749 [mongosMain] checking that version of host host10.somedomain.com:25000 is compatible with 2.2
Thu Mar 28 13:14:39.751 [mongosMain] checking that version of host host8.somedomain.com:25000 is compatible with 2.2
Thu Mar 28 13:14:39.752 [mongosMain] checking that version of host host19.somedomain.com:25000 is compatible with 2.2
Thu Mar 28 13:14:39.754 [mongosMain] checking that version of host host20.somedomain.com:25000 is compatible with 2.2
Thu Mar 28 13:14:39.756 [mongosMain] checking that version of host host21.somedomain.com:25000 is compatible with 2.2
Thu Mar 28 13:14:39.757 [mongosMain] warning: could not run server status command on host21.somedomain.com:25000 :: caused by :: 11002 socket exception [6] server [host21.somedomain.com:25000] mongos connectionpool error: couldn't connect to server host21.somedomain.com:25000, you must manually verify this mongo server is offline (for at least 5 minutes) or of a version >= 2.2
Thu Mar 28 13:14:39.757 [mongosMain] checking that version of host host22.somedomain.com:25000 is compatible with 2.2
Thu Mar 28 13:14:40.183 [mongosMain] acquiring locks for 14 sharded collections...
Thu Mar 28 13:14:40.959 [mongosMain] distributed lock 'backend.cash_transactions/host15.somedomain.com:30000:1364471974:1804289383' acquired, ts : 51543430f6e04e7181edb6e8
Thu Mar 28 13:14:41.768 [mongosMain] distributed lock 'backend.clickables/host15.somedomain.com:30000:1364471974:1804289383' acquired, ts : 51543430f6e04e7181edb6e9
Thu Mar 28 13:14:42.751 [mongosMain] distributed lock 'backend.f2011_friendly_matches/host15.somedomain.com:30000:1364471974:1804289383' acquired, ts : 51543431f6e04e7181edb6ea
Thu Mar 28 13:14:43.560 [mongosMain] distributed lock 'backend.f2011_gko_tournaments/host15.somedomain.com:30000:1364471974:1804289383' acquired, ts : 51543432f6e04e7181edb6eb
Thu Mar 28 13:14:44.386 [mongosMain] distributed lock 'backend.f2011_players/host15.somedomain.com:30000:1364471974:1804289383' acquired, ts : 51543433f6e04e7181edb6ec
Thu Mar 28 13:14:45.210 [mongosMain] distributed lock 'backend.f2011_playerteams/host15.somedomain.com:30000:1364471974:1804289383' acquired, ts : 51543434f6e04e7181edb6ed
Thu Mar 28 13:14:46.021 [mongosMain] distributed lock 'backend.f2011_rbcl_matches/host15.somedomain.com:30000:1364471974:1804289383' acquired, ts : 51543435f6e04e7181edb6ee
Thu Mar 28 13:14:47.000 [mongosMain] distributed lock 'backend.f2011_rbcl_playerteams/host15.somedomain.com:30000:1364471974:1804289383' acquired, ts : 51543436f6e04e7181edb6ef
Thu Mar 28 13:14:47.881 [mongosMain] distributed lock 'backend.f2011_rbcl_standings_history/host15.somedomain.com:30000:1364471974:1804289383' acquired, ts : 51543437f6e04e7181edb6f0
Thu Mar 28 13:14:48.658 [mongosMain] distributed lock 'backend.notifications/host15.somedomain.com:30000:1364471974:1804289383' acquired, ts : 51543437f6e04e7181edb6f1
Thu Mar 28 13:14:48.658 [mongosMain] acquired 10 locks out of 14 for config upgrade
Thu Mar 28 13:14:49.326 [mongosMain] distributed lock 'backend.purchases/host15.somedomain.com:30000:1364471974:1804289383' acquired, ts : 51543438f6e04e7181edb6f2
Thu Mar 28 13:14:50.237 [mongosMain] distributed lock 'backend.rewards/host15.somedomain.com:30000:1364471974:1804289383' acquired, ts : 51543439f6e04e7181edb6f3
Thu Mar 28 13:14:51.146 [mongosMain] distributed lock 'backend.user_game_summaries/host15.somedomain.com:30000:1364471974:1804289383' acquired, ts : 5154343af6e04e7181edb6f4
Thu Mar 28 13:14:51.993 [mongosMain] distributed lock 'backend.users/host15.somedomain.com:30000:1364471974:1804289383' acquired, ts : 5154343bf6e04e7181edb6f5
Thu Mar 28 13:14:51.993 [mongosMain] copying collection and chunk metadata to working and backup collections...
Thu Mar 28 13:19:52.689 [LockPinger] cluster cfg1.somedomain.com:20000,cfg2.somedomain.com:20000,cfg3.somedomain.com:20000 pinged successfully at Thu Mar 28 13:19:50 2013 by distributed lock pinger 'cfg1.somedomain.com:20000,cfg2.somedomain.com:20000,cfg3.somedomain.com:20000/host15.somedomain.com:30000:1364471974:1804289383', sleeping for 30000ms
Thu Mar 28 13:25:10.039 [LockPinger] cluster cfg1.somedomain.com:20000,cfg2.somedomain.com:20000,cfg3.somedomain.com:20000 pinged successfully at Thu Mar 28 13:25:08 2013 by distributed lock pinger 'cfg1.somedomain.com:20000,cfg2.somedomain.com:20000,cfg3.somedomain.com:20000/host15.somedomain.com:30000:1364471974:1804289383', sleeping for 30000ms
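
For readers following the log above: the roughly 15 minutes of "waited ...s for distributed lock" messages end at the "forcing lock" line, whose takeover time of 900000 ms is exactly 15 minutes. The sketch below pulls those numbers out of that line. It is plain JavaScript for Node, not part of any MongoDB tooling, and the function names are made up for illustration. Reading the third field of the lock holder ID as a start time in epoch seconds is an assumption; it is consistent with 1364471974 matching this mongos's 12:59:34 (CET) start time.

```javascript
// Illustrative helpers (hypothetical names, not part of MongoDB).

// Parse a "forcing lock" line from a mongos log; returns null if the line does not match.
function parseForcedLock(line) {
  var m = /forcing lock '([^']+)' because elapsed time (\d+) > takeover time (\d+)/.exec(line);
  if (!m) { return null; }
  return {
    lock: m[1],
    elapsedMs: parseInt(m[2], 10),
    takeoverMs: parseInt(m[3], 10),
    takeoverMinutes: parseInt(m[3], 10) / 60000
  };
}

// Split "lockName/host:port:startEpoch:random" into its parts.
// ASSUMPTION: the third field is the holder's start time in epoch seconds.
function parseLockHolder(lockId) {
  var slash = lockId.indexOf("/");
  var parts = lockId.slice(slash + 1).split(":");
  return {
    name: lockId.slice(0, slash),
    host: parts[0],
    port: parseInt(parts[1], 10),
    startedAt: new Date(parseInt(parts[2], 10) * 1000).toISOString()
  };
}

// The line is copied from the log above.
var forcedLine = "Thu Mar 28 13:14:35.092 [mongosMain] forcing lock 'configUpgrade/host15.somedomain.com:30000:1364468909:1804289383' because elapsed time 900327 > takeover time 900000";
var forced = parseForcedLock(forcedLine);
var holder = parseLockHolder(forced.lock);
console.log(forced.takeoverMinutes, holder.name, holder.startedAt);
```

Note that the forced lock carries 1364468909 in its holder ID, not the 1364471974 of the running process. Under the assumption above, the lock was presumably left behind by an earlier upgrade attempt on the same host, which fits the later "forcing upgrade locks of previous failed upgrade" message.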


I have downgraded the mongos to 2.2.3 for now to get back online, but I would like to complete the upgrade as soon as possible.

Thanks in advance for your help!

Taylor Fort

Mar 29, 2013, 5:41:28 PM
to mongod...@googlegroups.com
Go into the config db and look at the locks on record: db.locks.find().pretty();

"_id" : "configUpgrade"

Check what state the configUpgrade lock is in. IIRC, it should be in state 2 while the upgrade is running (meaning the lock is held and something is actually working). If it's in state 0, nothing is running. Also check which other locks are currently in state 2 to see whether something is blocking you.
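
The check described above can be sketched as a small helper that groups lock documents by state. This is a minimal illustration, not MongoDB code: the function name and the sample documents are hypothetical, and the state meanings (0 = unlocked, 1 = being acquired, 2 = held) are an assumption about the 2.x distributed lock format. In the shell, the input array would presumably come from something like db.getSiblingDB("config").locks.find().toArray().

```javascript
// ASSUMED meaning of the "state" field in config.locks documents.
var STATE_NAMES = { 0: "unlocked", 1: "acquiring", 2: "held" };

// Group lock documents by state so that held locks stand out.
function summarizeLocks(lockDocs) {
  var summary = { unlocked: [], acquiring: [], held: [], unknown: [] };
  for (var i = 0; i < lockDocs.length; i++) {
    var doc = lockDocs[i];
    var name = STATE_NAMES[doc.state] || "unknown";
    summary[name].push({ id: doc._id, process: doc.process, why: doc.why });
  }
  return summary;
}

// Hypothetical sample documents, shaped like entries in config.locks.
var sampleLocks = [
  { _id: "configUpgrade", state: 2, process: "host15.somedomain.com:30000:1364471974:1804289383", why: "upgrading config database to new format v4" },
  { _id: "backend.users", state: 2, process: "host15.somedomain.com:30000:1364471974:1804289383", why: "ensuring epochs for config upgrade" },
  { _id: "balancer", state: 0, process: "example-process-id", why: "example reason" }
];

var lockSummary = summarizeLocks(sampleLocks);
console.log("held:", lockSummary.held.map(function (l) { return l.id; }));
```

If every held lock belongs to the mongos that is currently upgrading, nothing else is blocking it; a held lock from a different process would be the one to investigate.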

Human Nayebzadeh

Mar 30, 2013, 8:29:41 AM
to mongod...@googlegroups.com
Hi Stefanita,

Thank you for your response. I tried to stop the balancer directly on one of the config servers, as you suggested. I think this might actually be the core problem, because I get the following strange output when trying to stop the balancer (the same happens from a mongos console, not only directly on a config server):

MongoDB shell version: 2.4.1
connecting to: host16.somedomain.com:20000/test
configsvr> use config
switched to db config
configsvr> sh.stopBalancer()
Waiting for active hosts...
Waiting for active host host10.somedomain.com:30000 to recognize new settings... (ping : Tue Jul 31 2012 16:57:07 GMT+0200 (CEST))
Waited for active ping to change for host host10.somedomain.com:30000, a migration may be in progress or the host may be down.
Waiting for active host host12.somedomain.com:30000 to recognize new settings... (ping : Tue Aug 07 2012 22:30:59 GMT+0200 (CEST))
Waited for active ping to change for host host12.somedomain.com:30000, a migration may be in progress or the host may be down.
Waiting for active host host16.somedomain.com:30000 to recognize new settings... (ping : Sun Sep 16 2012 13:08:55 GMT+0200 (CEST))
Waited for active ping to change for host host16.somedomain.com:30000, a migration may be in progress or the host may be down.
Waiting for active host host15.somedomain.com:30000 to recognize new settings... (ping : Thu Mar 28 2013 08:56:41 GMT+0100 (CET))
Waited for active ping to change for host host15.somedomain.com:30000, a migration may be in progress or the host may be down.
Waiting for active host host9.somedomain.com:30000 to recognize new settings... (ping : Fri Mar 29 2013 21:13:47 GMT+0100 (CET))
Waited for active ping to change for host host9.somedomain.com:30000, a migration may be in progress or the host may be down.
Waiting for the balancer lock...

Several things about this output are strange:

Host10 has no mongod instance running on port 30000! I use port 30000 only for my mongos instances, and they run only on host9 and host15. Host10 and host12 are ordinary mongod instances belonging to two different shards; each is the primary node of its shard. Host16 is the first config server, and all config servers run on port 20000, not 30000. So why is it waiting for a ping from these nonexistent hosts?
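
One possible explanation, offered as an assumption rather than something confirmed in this thread: the shell helper may take its list of "active hosts" from the config.mongos collection, which keeps a document for each mongos that has pinged the config servers, so a mongos that once ran on host10, host12, or host16 would still be listed. The same three hosts appear as "stale mongos detected" in the upgrade log. The sketch below flags such entries, given documents shaped like { _id: "host:port", ping: Date }. The function name and the 5-minute threshold are illustrative; the ping values are taken from the output above (converted to UTC), and the "now" value is made up.

```javascript
// Return entries whose last ping is older than maxAgeMinutes.
function findStaleMongos(mongosDocs, now, maxAgeMinutes) {
  var stale = [];
  for (var i = 0; i < mongosDocs.length; i++) {
    var doc = mongosDocs[i];
    var ageMinutes = Math.floor((now.getTime() - doc.ping.getTime()) / 60000);
    if (ageMinutes > maxAgeMinutes) {
      stale.push({ id: doc._id, ageMinutes: ageMinutes });
    }
  }
  return stale;
}

// Ping times copied from the sh.stopBalancer() output above, in UTC.
var sampleMongos = [
  { _id: "host10.somedomain.com:30000", ping: new Date("2012-07-31T14:57:07Z") },
  { _id: "host15.somedomain.com:30000", ping: new Date("2013-03-28T07:56:41Z") },
  { _id: "host9.somedomain.com:30000", ping: new Date("2013-03-29T20:13:47Z") }
];

// Hypothetical "current time", a few seconds after host9's last ping.
var assumedNow = new Date("2013-03-29T20:14:00Z");

var staleList = findStaleMongos(sampleMongos, assumedNow, 5);
console.log(staleList.map(function (s) { return s.id; }));
```

With these inputs only host9 counts as live. Under the same assumption, host15 shows up as stale simply because its mongos was not pinging at the assumed time.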

Best regards,
Human

On Friday, March 29, 2013 at 20:52:52 UTC+1, Stefanita Rares Dumitrescu wrote:
Hi,

Try connecting to the config server on port 27019 (or to your custom port, if you set one up), then run: sh.stopBalancer()

Once that is done, start the mongos with --upgrade and it should work. At least it worked for me.
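
To confirm that the balancer is actually flagged as stopped before retrying the upgrade, sh.getBalancerState() in the shell should return false. The sketch below mirrors that check on plain documents, assuming the flag lives in config.settings as a document with _id "balancer" and stopped: true when disabled. The function name and sample documents are illustrative only.

```javascript
// Return true if the settings documents mark the balancer as stopped.
// ASSUMPTION: the flag is { _id: "balancer", stopped: true } in config.settings.
function isBalancerStopped(settingsDocs) {
  for (var i = 0; i < settingsDocs.length; i++) {
    if (settingsDocs[i]._id === "balancer") {
      return settingsDocs[i].stopped === true;
    }
  }
  // No balancer document at all: treated here as "not stopped".
  return false;
}

// Hypothetical sample documents.
var stoppedSettings = [{ _id: "chunksize", value: 64 }, { _id: "balancer", stopped: true }];
var defaultSettings = [{ _id: "chunksize", value: 64 }];

console.log(isBalancerStopped(stoppedSettings), isBalancerStopped(defaultSettings));
```

Even with the flag set, a balancing round that is already in progress may still hold the balancer lock for a while, which is what the "Waiting for the balancer lock..." message refers to.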

Human Nayebzadeh

Mar 30, 2013, 8:45:44 AM
to mongod...@googlegroups.com
Hi Taylor,

I had a look at the locks collection and found a couple of entries with state = 2 and several more with state = 0. All of the state = 2 entries were related to the config upgrade:

"why" : "ensuring epochs for config upgrade (51560b3b0f8b516e9020f66d)"

I manually set their states back to 0 and tried once more to upgrade the config data, but that didn't work either. It hangs again at:

Sat Mar 30 13:40:29.026 [mongosMain] copying collection and chunk metadata to working and backup collections...

I really have no clue how to solve this problem. :-(

Best regards,
Human