Newbie Sharding - couldn't connect to new shard mongos connectionpool


Jake

Oct 11, 2011, 11:38:21 AM
to mongodb-user
This is a super basic question that has been driving me up a wall. I
have been setting up a sharded and replicated environment using the
instructions at http://www.mongodb.org/display/DOCS/Simple+Initial+Sharding+Architecture.
I did this once before and it worked fine. I started a new deployment
and am now running into the following error whenever I try to add the
shard:

mongos> db.adminCommand( { addShard : "shard1/server1:27018,server2:27018,server3:27018" })
{
"ok" : 0,
"errmsg" : "couldn't connect to new shard mongos connectionpool: connect failed shard1/server1:27018,server2:27018,server3:27018 : connect failed to set shard1/server1:27018,server2:27018,server3:27018"
}

My setup:

- 3 large EC2 machines (server1, server2, server3)
- mongod running as --shardsvr in --replSet "shard1" on port 27018 on each
- config server running on port 27019 on each

ps aux | grep mongo: (same for each of the 3 servers)
root 2503 0.0 0.0 23184 1336 ? Ss 15:04 0:00 sudo /usr/bin/mongod --shardsvr --dbpath /mnt/data/mongo/db --logpath /mnt/data/mongo/mongodb.log --logappend --journal --rest --replSet shard1
root 2522 0.0 0.0 23184 1328 ? Ss 15:04 0:00 sudo /usr/bin/mongod --configsvr --journal --dbpath /mnt/data/mongo/configdb --logpath /mnt/data/mongo/mongodb.log --logappend

- 1 small EC2 machine as appserver running mongos
ps aux | grep mongo:
root 2711 0.0 0.0 2400 1200 ? Ss 15:13 0:00 sudo /usr/bin/mongos --configdb server1:27019,server2:27019,server3:27019

I've initialized the replica set across the machines:

PRIMARY> rs.conf()
{
"_id" : "shard1",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "server1:27018",
"priority" : 2
},
{
"_id" : 1,
"host" : "server2:27018"
},
{
"_id" : 2,
"host" : "server3:27018",
"priority" : 0,
"hidden" : true
}
]
}


PRIMARY> rs.status()
{
"set" : "shard1",
"date" : ISODate("2011-10-11T15:31:39Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "server1:27018",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"optime" : {
"t" : 1318345607000,
"i" : 1
},
"optimeDate" : ISODate("2011-10-11T15:06:47Z"),
"self" : true
},
{
"_id" : 1,
"name" : "server2:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1481,
"optime" : {
"t" : 1318345607000,
"i" : 1
},
"optimeDate" : ISODate("2011-10-11T15:06:47Z"),
"lastHeartbeat" : ISODate("2011-10-11T15:31:37Z"),
"pingMs" : 0
},
{
"_id" : 2,
"name" : "server3:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1485,
"optime" : {
"t" : 1318345607000,
"i" : 1
},
"optimeDate" : ISODate("2011-10-11T15:06:47Z"),
"lastHeartbeat" : ISODate("2011-10-11T15:31:38Z"),
"pingMs" : 0
}
],
"ok" : 1
}

And ports 27017-27019 are open on the machines. On the appserver
(where mongos is running) I can do the following:

jake@appserver:~$ mongo server1:27018
MongoDB shell version: 2.0.0
connecting to: server1:27018/test
PRIMARY> exit
bye
jake@appserver:~$ mongo server1:27019
MongoDB shell version: 2.0.0
connecting to: server1:27019/test
> exit
bye

So I know I can connect to the machines. Why the heck can't I add
this shard??

I'm sure this is a super basic mistake somewhere but any help is
hugely appreciated,
J

Jake

Oct 11, 2011, 11:42:30 AM
to mongodb-user
Quick addition: All the IPs I'm using for server1, server2, server3
are EC2 Private IP addresses (to avoid data transfer costs). I don't
think that's the issue, but I should mention just in case.


Jake

Oct 11, 2011, 12:30:08 PM
to mongodb-user
UPDATE: Restarted mongos and the first shard added correctly. Tried
to add a second shard and got the same error ("couldn't connect to new
shard mongos connectionpool"). Restarted mongos and then that shard
added. I'm glad I've got my shards added, but I know that's not the
correct behavior - any ideas on why I have to keep restarting my
mongos between adding new shards?

Greg Studer

Oct 11, 2011, 2:09:46 PM
to mongodb-user
I wonder if this has to do with the actual command line:

mongos> db.adminCommand( { addShard : "shard1/
server1:27018,server2:27018,server3:27018" })
{
"ok" : 0,
"errmsg" : "couldn't connect to new shard mongos
connectionpool:
connect failed shard1/server1:27018,server2:27018,server3:27018 :
connect failed to set shard1/
server1:27018,server2:27018,server3:27018"

}

Looks like there's a newline in the replica set string in the error
when mongos tries to connect, or is that just the way this got pasted?

Jake

Nov 14, 2011, 8:23:31 PM
to mongodb-user
Nah, that was just the way it got pasted. Having this problem again
and no amount of restarting mongos is helping. Very frustrating...

Eliot Horowitz

Nov 14, 2011, 11:01:14 PM
to mongod...@googlegroups.com
Can you try connecting from the shell on the same box as the mongos?
Does that work?

> --
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
>
>

Jake

Nov 15, 2011, 10:29:59 AM
to mongodb-user
Hey Eliot,

Thanks for looking into this. I can connect to each machine from
mongos, and each of these has been started as "shard1". Same applies
for the config servers. I must be missing something brutally obvious
but I can't for the life of me figure out what it is...?

---------

jake@appserver:~$ mongo
MongoDB shell version: 2.0.0
connecting to: test
mongos> conn = new Mongo("10.90.147.146:27018")
connection to 10.90.147.146:27018
PRIMARY>

mongos> conn = new Mongo("10.128.53.76:27018")
connection to 10.128.53.76:27018
SECONDARY>

mongos> conn = new Mongo("10.86.137.159:27018")
connection to 10.86.137.159:27018
SECONDARY>

mongos> db.adminCommand({ addshard:"shard1/10.90.147.146,10.128.53.76,10.86.137.159"})
{
"ok" : 0,
"errmsg" : "couldn't connect to new shard mongos connectionpool: connect failed shard1/10.90.147.146,10.128.53.76,10.86.137.159 : connect failed to set shard1/10.90.147.146,10.128.53.76,10.86.137.159"

Jake

Nov 15, 2011, 10:53:20 AM
to mongodb-user
Additional info:

Running mongos with -vvvvv shows that it can't actually connect for
some reason when using addshard, despite the fact that I can connect
from within mongos by hand:


[conn1] single query: admin.$cmd { addshard: "shard1/10.90.147.146,10.128.53.76,10.86.137.159" } ntoreturn: -1 options : 0
Tue Nov 15 15:43:45 BackgroundJob starting: ConnectBG
Tue Nov 15 15:43:45 [conn1] error connecting to seed 10.90.147.146: couldn't connect to server 10.90.147.146
Tue Nov 15 15:43:45 BackgroundJob starting: ConnectBG
Tue Nov 15 15:43:45 [conn1] error connecting to seed 10.128.53.76: couldn't connect to server 10.128.53.76
Tue Nov 15 15:43:45 BackgroundJob starting: ConnectBG
Tue Nov 15 15:43:45 [conn1] error connecting to seed 10.86.137.159: couldn't connect to server 10.86.137.159
Tue Nov 15 15:43:45 [conn1] _check : shard1/
Tue Nov 15 15:43:45 BackgroundJob starting: ReplicaSetMonitorWatcher
Tue Nov 15 15:43:45 [ReplicaSetMonitorWatcher] starting
Tue Nov 15 15:43:47 [conn1] User Assertion: 10009:ReplicaSetMonitor no master found for set: shard1
Tue Nov 15 15:43:47 [conn1] User Assertion: 13328:mongos connectionpool: connect failed shard1/10.90.147.146,10.128.53.76,10.86.137.159 : connect failed to set shard1/10.90.147.146,10.128.53.76,10.86.137.159
Tue Nov 15 15:43:47 [conn1] addshard request { addshard: "shard1/10.90.147.146,10.128.53.76,10.86.137.159" } failed: couldn't connect to new shard mongos connectionpool: connect failed shard1/10.90.147.146,10.128.53.76,10.86.137.159 : connect failed to set shard1/10.90.147.146,10.128.53.76,10.86.137.159

Specifying ports seems to have no effect:

db.adminCommand({ addshard:"shard1/10.90.147.146:27018, 10.128.53.76:27018, 10.86.137.159:27018"})
{
"ok" : 0,
"errmsg" : "couldn't connect to new shard mongos connectionpool: connect failed shard1/10.90.147.186:27018,10.124.53.76:27018,10.86.177.159:27018 : connect failed to set shard1/10.90.147.146:27018,10.128.53.76:27018,10.86.137.159:27018"
}

So what's different about the way mongos connects when I call addshard
vs. when I do conn = new Mongo()?

Jake

Nov 15, 2011, 12:07:09 PM
to mongodb-user
Here's a bizarre fix that "worked" (?) this time:

- Logged in to each machine in the replica set
- Called db.isMaster(), rs.config(), rs.status() on the primary
machine.
- Called db.isMaster() on the two secondary machines
- Restarted mongos
- Added the shard with
db.adminCommand({ addshard:"shard1/10.90.147.146:27018,
10.128.53.76:27018, 10.86.137.159:27018"})
- Got an error:
{
"ok" : 0,
"errmsg" : "in seed list shard1/10.90.147.146:27018,
10.128.53.76:27018, 10.86.137.159:27018, host 10.86.137.159:27018 does
not belong to replica set shard1"
}
- A different error! Hurrah! That server is a hidden node, so I
removed it from the list and tried again and it worked.
- Had to restart mongos again to get the same thing to work for the
second shard.

Again, while this worked, I'd love to know what I'm doing wrong,
because this just feels so hacky. I'm hoping this info helps someone
else narrow down the problem.

Thanks!
J
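Jake's workaround hit "host 10.86.137.159:27018 does not belong to replica set shard1" for the hidden member. Hidden members are not listed in isMaster output, which is likely why mongos rejected that seed. As a sketch, the seed list can be derived from a document shaped like rs.conf() output, skipping members marked hidden: true. The helper name seedFromConfig is hypothetical, not a MongoDB API, and since rs.conf() runs on a replica set member rather than on mongos, the resulting string would be copied over to the mongos shell. The example config is the one posted at the top of this thread.

```javascript
// Hypothetical helper, not a MongoDB API. Builds an addShard seed string
// from a replica set config document shaped like rs.conf() output,
// leaving out hidden members (they do not appear in isMaster output).
function seedFromConfig(conf) {
  var hosts = [];
  for (var i = 0; i < conf.members.length; i++) {
    var m = conf.members[i];
    if (m.hidden === true) {
      continue; // skip hidden members such as server3 below
    }
    hosts.push(m.host);
  }
  // The replica set name is the config's _id, e.g. "shard1".
  return conf._id + "/" + hosts.join(",");
}

// The rs.conf() output from the first message in this thread.
var exampleConf = {
  _id: "shard1",
  version: 1,
  members: [
    { _id: 0, host: "server1:27018", priority: 2 },
    { _id: 1, host: "server2:27018" },
    { _id: 2, host: "server3:27018", priority: 0, hidden: true }
  ]
};

// seedFromConfig(exampleConf) returns "shard1/server1:27018,server2:27018"
```

The hidden member can still be part of the shard; it just should not be offered to mongos as a seed.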

Greg Studer

Nov 15, 2011, 10:42:51 PM
to mongodb-user
Looked a bit at the source. This line:
Tue Nov 15 15:43:45 [conn1] error connecting to seed 10.90.147.146: couldn't connect to server 10.90.147.146
seems to be because the default port is 27017, so you'll need to specify
ports, which you did in

db.adminCommand({ addshard:"shard1/10.90.147.146:27018, 10.128.53.76:27018, 10.86.137.159:27018"})

There may be strange behavior with the spaces here, though. It would be
good to see the -vvvvv output from the version without spaces and with
ports, to see exactly what's being rejected by mongos.
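Greg's two points (mongos falls back to port 27017 when a seed has no port, and spaces in the seed list may confuse it) suggest building the seed string rather than typing it by hand. The sketch below is a hypothetical helper, buildShardSeed, not part of MongoDB: it strips whitespace and appends an explicit port to any host that lacks one. It assumes hostnames or IPv4 addresses (a bare IPv6 address would fool the colon check) and is written in plain ES5-style JavaScript on the assumption that it will also paste into an old mongo shell.

```javascript
// Hypothetical helper, not part of MongoDB. Builds a replica set seed
// string for addShard with no spaces and an explicit port on every host.
// Assumes hostnames or IPv4 addresses; IPv6 literals are not handled.
function buildShardSeed(setName, hosts, port) {
  var cleaned = [];
  for (var i = 0; i < hosts.length; i++) {
    // Remove any whitespace that a copy/paste may have introduced.
    var h = String(hosts[i]).replace(/\s+/g, "");
    if (h.length === 0) {
      continue; // ignore empty entries
    }
    // mongos assumes 27017 for a seed without a port, so append the
    // shard's real port (27018 here, the --shardsvr port in this thread).
    if (h.indexOf(":") === -1) {
      h = h + ":" + port;
    }
    cleaned.push(h);
  }
  return setName + "/" + cleaned.join(",");
}
```

From the shell connected to mongos, the result would then be passed as db.adminCommand({ addShard: buildShardSeed("shard1", ["10.90.147.146", "10.128.53.76"], 27018) }).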

Surya

Dec 29, 2011, 3:53:32 AM
to mongod...@googlegroups.com
Hi Jake, did you get any solution for this? I am getting the same problem.

Thanks,
S

Eliot Horowitz

Dec 31, 2011, 6:01:06 PM
to mongod...@googlegroups.com
Surya - can you start a new thread with your logs and exactly what's going on?


James Elwood

Jan 3, 2012, 4:18:44 PM
to mongodb-user
Eliot, we can confirm the same behavior. We needed to restart mongos
in order to configure a shard. Here's a link to the log => http://cl.ly/D1VB.

Let us know if there is anything else we can do to help troubleshoot.

Greg Studer

Jan 3, 2012, 5:54:11 PM
to mongodb-user
At this point I think it'd be easier to track down and kill this issue
if we use a server ticket - opened one here: SERVER-4606 - can you
fill in the environment, and the exact steps you took when you saw the
problem?

Spencer T Brody

Jan 19, 2012, 1:21:54 PM
to mongod...@googlegroups.com
Is this still happening for you guys?  There has been no activity on SERVER-4606.  Can you please fill in the ticket with the exact steps to reproduce and attach the verbose mongos logs?

Peter

Jan 27, 2012, 4:06:12 PM
to mongodb-user
just happened to us as well.

mongos refused addShard with a replica set consisting of 2 nodes and
an arbiter.

what we did to resolve this issue:
1. remove arbiter
2. restart mongos
3. add shard
4. add arbiter

could be related to:
https://jira.mongodb.org/browse/SERVER-3610



Greg Studer

Jan 29, 2012, 4:09:15 PM
to mongodb-user
Added your info to ticket SERVER-4606. Is this 2.0.2 as well?

Peter

Jan 30, 2012, 10:04:28 AM
to mongodb-user
yes, 2.0.2

John Feibusch

Apr 23, 2012, 9:30:05 PM
to mongod...@googlegroups.com
I just got this, and I think it was caused by https://jira.mongodb.org/browse/SERVER-4447.

If you enter an incorrect addShard command (for example, you forget the port number) then a subsequent correct command will not work until you restart mongos. It seems to keep some information from the incorrect command.

The JIRA ticket says it's fixed in version 2.1.1, but the latest stable release is 2.0.4. So I guess for now, just restart mongos if you happen to enter an incorrect addShard command.

-john
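If the SERVER-4447 reading above is right, a failed addShard can leave mongos needing a restart, so it may pay to check the seed string before sending it at all. The function below, checkShardSeed, is a hypothetical pre-flight check, not a MongoDB API. It assumes the shard is a replica set (so the string must start with a set name and "/") and flags the two problems raised earlier in the thread: whitespace and hosts without an explicit port. An empty result only means none of those problems were found, not that addShard will succeed.

```javascript
// Hypothetical pre-flight check, not a MongoDB API. Returns a list of
// problems found in a replica set seed string such as
// "shard1/10.90.147.146:27018,10.128.53.76:27018"; empty means none found.
function checkShardSeed(seed) {
  var problems = [];
  var slash = seed.indexOf("/");
  // Assumes the shard is a replica set, so a set name must precede "/".
  if (slash <= 0) {
    problems.push("missing replica set name before '/'");
    return problems;
  }
  // Spaces were suspected of confusing mongos earlier in this thread.
  if (/\s/.test(seed)) {
    problems.push("contains whitespace");
  }
  var hosts = seed.substring(slash + 1).split(",");
  for (var i = 0; i < hosts.length; i++) {
    var h = hosts[i].replace(/\s+/g, "");
    // Without a port, mongos falls back to 27017.
    if (!/^[^:]+:\d+$/.test(h)) {
      problems.push("no explicit port: " + h);
    }
  }
  return problems;
}
```

For example, the seed "shard1/10.90.147.146,10.128.53.76,10.86.137.159" used earlier would be reported as having no explicit port on all three hosts.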