Wrong shard name distributed to config servers

Sebastian Dahlgren

Apr 18, 2012, 2:38:08 AM
to mongod...@googlegroups.com
Hi!

We have a few MongoDB clusters in production, and something strange happened yesterday while I was upgrading our mongod binaries from 2.0.1 to 2.0.4.

Our setup
- 3 config servers
- 1 replica set with three members:
  - 1 PRIMARY
  - 1 SECONDARY
  - 1 ARBITER
- This replica set is the only shard in this cluster
- A bunch of mongos

Upgrade process
Because of how our servers are managed in the AWS cloud, we do not just stop mongod and replace the binary. Instead, we replace the whole server and switch the AWS Elastic IP over to the new node.

1. rs.add("<non-elastic-dns-name>:27017")
2. Wait until the new node is synced
3. rs.remove("<non-elastic-dns-name>:27017")
4. Stop the mongod process on the node we are replacing, elastic-dns-name (otherwise it takes quite a long time until the other nodes notice the change of server in the background)
5. Change the new node's DNS name from non-elastic-dns-name to elastic-dns-name
6. Restart all mongos routers, as we have seen that non-elastic-dns-name sticks in some routers' view of the world
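
A rough sketch of how step 2 could be checked rather than eyeballed. It assumes that reaching the SECONDARY state is a good enough signal that the new node is synced, which is a simplification. The helper name `isMemberSecondary` is hypothetical, and the sample document is a hand-written stand-in that uses only the `name` and `stateStr` fields of rs.status() output.

```javascript
// Hypothetical helper: report whether the named member has reached the
// SECONDARY state in an rs.status()-style document.
// Assumption: "SECONDARY" is treated as "synced" for the purpose of step 2.
function isMemberSecondary(status, hostName) {
  return status.members.some(function (m) {
    return m.name === hostName && m.stateStr === "SECONDARY";
  });
}

// Hand-written stand-in for rs.status() output (not captured from a server).
var sampleStatus = {
  set: "primarySet1",
  members: [
    { name: "elastic-dns-name1:27017", stateStr: "PRIMARY" },
    { name: "non-elastic-dns-name:27017", stateStr: "RECOVERING" }
  ]
};

// The new node is still recovering here, so step 3 should wait.
console.log(isMemberSecondary(sampleStatus, "non-elastic-dns-name:27017"));
```

In the mongo shell the same function could be called as isMemberSecondary(rs.status(), "<non-elastic-dns-name>:27017") before moving on to step 3.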

The new node is now part of the cluster, running the new mongod binary.

The problem
The problem this time was that when I started a new mongos router after completing step 6 above, it tried to connect to the mongod at non-elastic-dns-name. The reason was that the config servers were distributing a new shard configuration that contained only non-elastic-dns-name.

So, what was previously in the config servers:

> use config
> db.shards.find()
{ "_id" : "primarySet1", "host" : "primarySet1/elastic-dns-name1:27017,elastic-dns-name2:27017" }

Had become:

> use config
> db.shards.find()
{ "_id" : "primarySet1", "host" : "primarySet1/non-elastic-dns-name:27017" }

We have not seen this behavior before. Our fix was to run db.shards.update() and restore the old configuration.
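
A minimal sketch of how the expected `host` string could be rebuilt from the replica set configuration before running such an update. The helper name `expectedShardHost` is hypothetical, the host names are placeholders, and leaving the arbiter out of the list is an assumption based on the two-host entry shown above.

```javascript
// Hypothetical helper: build the "setName/host1,host2" string that the
// config.shards entry is expected to hold, from an rs.conf()-style document.
// Assumption: arbiters are left out of the host list.
function expectedShardHost(rsConf) {
  var hosts = rsConf.members
    .filter(function (m) { return !m.arbiterOnly; })
    .map(function (m) { return m.host; });
  return rsConf._id + "/" + hosts.join(",");
}

// Placeholder configuration mirroring the setup described in this thread.
var sampleConf = {
  _id: "primarySet1",
  members: [
    { _id: 0, host: "elastic-dns-name1:27017" },
    { _id: 1, host: "elastic-dns-name2:27017" },
    { _id: 2, host: "elastic-dns-name3:27017", arbiterOnly: true }
  ]
};

var expected = expectedShardHost(sampleConf);
console.log(expected);

// In the mongo shell, connected through a mongos, the repair would then be:
//   use config
//   db.shards.update({ _id: "primarySet1" }, { $set: { host: expected } })
```

Comparing this string with the current entry from db.shards.find() shows whether the entry has drifted before anything is written.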

Is this expected behavior, are we doing something stupid in how we perform this upgrade, or are we running into a bug?

Best regards
Sebastian Dahlgren

Kyle Banker

Apr 18, 2012, 5:02:01 PM
to mongod...@googlegroups.com
This is the expected behavior. If you make a change to the configuration of a replica set that's part of a sharded cluster, then that change is going to be communicated to the config servers. That's clearly what happened in this case.

In your case, you don't want entries in the 'shards' collection to change at all. We probably need a way to distinguish between users who rely on elastic DNS and the like and users who don't. I've created a ticket here:

Sebastian Dahlgren

Apr 19, 2012, 2:49:02 AM
to mongod...@googlegroups.com
Thanks for the feedback, Kyle!

Just one question, though. It's fine with me that the temporary node is propagated to the 'shards' collection while I sync it (even though your proposed solution seems even better). However, I don't really get why it is still in the 'shards' collection after I have run rs.remove() on the replica set. In fact, it is the only node represented in the 'shards' collection.

Eliot Horowitz

Apr 19, 2012, 2:50:46 AM
to mongod...@googlegroups.com
Can you send the output of rs.conf()?
The shards collection should update to the new config once it's done.

Sebastian Dahlgren

Apr 19, 2012, 2:53:12 AM
to mongod...@googlegroups.com
rs.conf()
{
    "_id" : "archiveSet1",
    "version" : 15,
    "members" : [
        {
            "_id" : 0,
            "host" : "elastic-ip1:27017"
        },
        {
            "_id" : 1,
            "host" : "elastic-ip2:27017"
        },
        {
            "_id" : 2,
            "host" : "elastic-ip3:27317",
            "arbiterOnly" : true
        }
    ]
}

Marc

Apr 26, 2012, 11:12:50 AM
to mongodb-user
Hello. I just wanted to follow up on this thread.

We believe that you have experienced this issue:
https://jira.mongodb.org/browse/SERVER-5058 ("mongos shouldn't update
config seed unless primary indicates it should"), which is related to
https://jira.mongodb.org/browse/SERVER-4731 ("Removed replica set nodes
should not appear as members of the replica set").

The issue arises when isMaster() is called on a removed member of a
replica set. The removed member responds to isMaster(), and the
sharding configuration gets updated erroneously with the removed
member.
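
The kind of guard that the SERVER-5058 title asks for can be sketched
roughly as follows. This is an illustration only, not the mongos
implementation: the function name `shouldAcceptSeed` is hypothetical, and
the reply objects are hand-written stand-ins that use the `ismaster`,
`setName`, and `hosts` fields of an isMaster response.

```javascript
// Hypothetical guard: accept a new host list only when the reply comes from
// the primary of the expected replica set and actually lists some hosts.
function shouldAcceptSeed(reply, expectedSetName) {
  return reply.ismaster === true &&
         reply.setName === expectedSetName &&
         Array.isArray(reply.hosts) &&
         reply.hosts.length > 0;
}

// Hand-written stand-ins for isMaster replies (not captured from a server).
var fromPrimary = {
  ismaster: true,
  setName: "primarySet1",
  hosts: ["elastic-dns-name1:27017", "elastic-dns-name2:27017"]
};
var fromRemovedMember = {
  ismaster: false,
  setName: "primarySet1",
  hosts: ["non-elastic-dns-name:27017"]
};

console.log(shouldAcceptSeed(fromPrimary, "primarySet1"));       // true
console.log(shouldAcceptSeed(fromRemovedMember, "primarySet1")); // false
```

With a check like this, the reply from the removed member would be ignored
and the existing entry in the shards collection would be left alone.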

The workaround is to manually update the shards collection in the
config db (which it appears you have already done) and to restart
each member in the cluster.

We apologize for the inconvenience. This issue is slated to be fixed
in version 2.1.2.

Sebastian Dahlgren

May 2, 2012, 2:16:05 AM
to mongod...@googlegroups.com
Thank you for the update, Marc!

(I read your answer on my phone some days ago but forgot to thank you for getting back to me on this.)