Error: write with bad shard config and no server id!

Flaszter

Sep 19, 2011, 6:58:54 AM
to mongodb-user
Hi!

-------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------
ENVIRONMENT (MongoDB 2.0)

We have a sharded MongoDB environment with 3 replica sets, where each
replica set has:
1 primary
2 secondaries
2 arbiters

Our environment uses 6 servers (Windows Server 2008) with the
following configuration:

1st server:
1st replica set primary
3rd replica set secondary
2nd replica set arbiter
+ 1 config server

2nd server:
2nd replica set primary
1st replica set secondary
3rd replica set arbiter
+ 1 config server

3rd server:
3rd replica set primary
2nd replica set secondary
1st replica set arbiter
+ 1 config server

4th server:
1st replica set secondary
3rd replica set secondary
2nd replica set arbiter

5th server:
2nd replica set secondary
1st replica set arbiter
3rd replica set arbiter

6th server (application server hosting a web site):
1 mongos instance (routing server)
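
The layout above can be double-checked mechanically. The sketch below is only an illustration: the set names rs1, rs2 and rs3 are placeholders (the post does not give the real names), and the table is transcribed by hand from the list of servers 1 to 5. It tallies the roles per replica set and looks for any server that hosts two members of the same set.

```python
from collections import Counter

# Hand-transcribed from the post: server number -> (replica set, role).
# The names rs1..rs3 are placeholders, not the real replica set names.
LAYOUT = {
    1: [("rs1", "primary"), ("rs3", "secondary"), ("rs2", "arbiter")],
    2: [("rs2", "primary"), ("rs1", "secondary"), ("rs3", "arbiter")],
    3: [("rs3", "primary"), ("rs2", "secondary"), ("rs1", "arbiter")],
    4: [("rs1", "secondary"), ("rs3", "secondary"), ("rs2", "arbiter")],
    5: [("rs2", "secondary"), ("rs1", "arbiter"), ("rs3", "arbiter")],
}


def roles_per_set(layout):
    """Count how many members of each role every replica set has."""
    counts = {}
    for members in layout.values():
        for rs, role in members:
            counts.setdefault(rs, Counter())[role] += 1
    return counts


def co_located(layout):
    """Return the servers that host two members of the same replica set."""
    return [
        server
        for server, members in layout.items()
        if len({rs for rs, _ in members}) < len(members)
    ]


for rs, counts in sorted(roles_per_set(LAYOUT).items()):
    print(rs, dict(counts))
print("servers with doubled-up sets:", co_located(LAYOUT))
```

With the layout as posted, every set comes out as 1 primary, 2 secondaries and 2 arbiters, and no server carries two members of the same set.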

-------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------

PROBLEM

We put a write load on the servers, and after a short time we get many
errors like this one from the C# driver:

Safemode detected an error 'write with bad shard config and no server
id!'
{ "err" : "write with bad shard config and no server id!", "code" :
10422, "n" : 0, "lastOp" : NumberLong("5652597361524867121"),
"connectionId" : 1779, "ok" : 1.0 }

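Note that the document above carries "ok" : 1.0 even though "err" is set: the error check itself succeeded, and it is the preceding write that failed. A minimal sketch of how an application could recognise this specific failure follows. The function name is made up for illustration, and NumberLong(...) is written as a plain integer because it is shell syntax rather than JSON.

```python
BAD_SHARD_CONFIG = 10422  # the "code" value in the error document above


def is_bad_shard_config(last_error):
    """Return True if a getLastError-style document reports code 10422.

    "ok" is deliberately not consulted: it describes the error check itself,
    not the write that the check reports on.
    """
    return bool(last_error.get("err")) and last_error.get("code") == BAD_SHARD_CONFIG


# The document from the post, with NumberLong(...) as a plain int.
sample = {
    "err": "write with bad shard config and no server id!",
    "code": 10422,
    "n": 0,
    "lastOp": 5652597361524867121,
    "connectionId": 1779,
    "ok": 1.0,
}

print(is_bad_shard_config(sample))
```

A caller could use such a check to log these failures separately from ordinary write errors.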
We checked the routing server as well and found the following
message:

StorageTest.lastKnownTransactionId could not initialize cursor across
all shards because : ns: StorageTest.lastKnownTransactionId
ClusteredCursor::query

Note that StorageTest is our database and lastKnownTransactionId is a
collection sharded on the key _id.

Do you have any idea why we get this error?
So far we can only clear it by restarting the routing server.

Thanks,
Flaszter


Greg Studer

Sep 20, 2011, 2:19:44 PM
to mongod...@googlegroups.com
If this is repeatable, would it be possible to run the load test with the latest 1.8 nightly mongos at high verbosity (log level 2?) and post the mongos logs? Downgrade just the single mongos and keep everything else the same. My guess for now is that the issue has to do with the shard version getting reset, but it's hard to say what the cause is. Are migrations or replica set failovers occurring during this test?

> StorageTest.lastKnownTransactionId could not initialize cursor across
> all shards because : ns: StorageTest.lastKnownTransactionId
> ClusteredCursor::query

This error is just telling you that the query can't proceed b/c of the connection issue above. 

Running:

mongo <mongos>
> use admin
> db.runCommand({ flushRouterConfig : 1 })

may be a workaround for this state that doesn't require a mongos reboot. 
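
The same command can also be sent from application code instead of the shell. The sketch below assumes a pymongo-style client that is connected to the mongos (not to a shard directly); the helper name and the host in the comment are placeholders.

```python
def flush_router_config(client):
    """Ask a mongos to drop its cached cluster metadata and reload it.

    `client` is assumed to be a pymongo-style client connected to the mongos,
    i.e. an object on which client.admin.command(name) runs an admin command.
    """
    return client.admin.command("flushRouterConfig")


# Hypothetical usage (the host name is a placeholder):
#
#   from pymongo import MongoClient
#   reply = flush_router_config(MongoClient("mongodb://mongos-host:27017"))
#   print(reply)
```

As with the shell version, this only clears the router's cached view. It does not address whatever caused the shard version to be reset.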

Flaszter

Sep 21, 2011, 11:32:59 AM
to mongodb-user
Hi Greg!

I downloaded the latest 1.8.3 nightly build
(mongodb-win32-x86_64-1.8.3) and created a new mongos instance with it.
All mongod instances still use the 2.0 release
(mongodb-win32-x86_64-2.0.0).

With this setup I no longer get the exception.
How did you know?
What's wrong with 2.0? Any idea?

Thanks

Greg Studer

Sep 28, 2011, 7:38:32 PM
to mongodb-user
Sorry for the delay, I somehow lost this thread earlier and was
traveling.

I don't know if this is a 2.0 issue in particular, but we added more
messaging in 1.8.4, hence the nightly. If you grep your logs for
"resetting shard version", there may be more info there. I think this
is a symptom of an issue we may have fixed recently.
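
On Windows, where grep may not be at hand, the same search can be done with a few lines of Python. This is a minimal sketch: the function name and the log path in the comment are placeholders, and only the search phrase comes from this thread.

```python
def find_resets(lines, needle="resetting shard version"):
    """Yield (line_number, text) for every log line containing the phrase."""
    for number, line in enumerate(lines, start=1):
        if needle in line:
            yield number, line.rstrip("\n")


# Hypothetical usage (the path is a placeholder):
#
#   with open("mongos.log", encoding="utf-8", errors="replace") as log:
#       for number, text in find_resets(log):
#           print(number, text)
```

Reading the file line by line keeps memory use flat even for large logs written at high verbosity.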

Flaszter

Oct 4, 2011, 1:25:24 PM
to mongodb-user
Hi,

I ran the load test in two environments.
1.8.3:
There was no error. Everything is fine.

2.0 nightly build:
Got the same exception.
I went through all the logs (config servers, routing server,
mongod servers) and found another interesting message: "warning: bad
serverID set in setShardVersion and none in info: EOO".

Another constraint is that we have to use MongoDB 2.0, because it
allows us to use w="majority".

Thanks

Greg Studer

Oct 5, 2011, 10:51:37 AM
to mongodb-user
Ah, this is a known issue; there are fixes in place for 2.0.1. I think
the 2.0 nightly includes them now.

rusan

Oct 27, 2011, 1:38:38 AM
to mongodb-user


On Oct 5, 18:51, Greg Studer <g...@10gen.com> wrote:
> ah, this is a known issue, there are fixes in place for 2.0.1.  Think
> the 2.0 nightly includes them now.

We use 2.0.1 and see the same warning in the logs:

Thu Oct 27 05:36:41 [initandlisten] connection accepted from 10.20.65.42:6876 #31801
Thu Oct 27 05:36:41 [conn31801] warning: bad serverID set in setShardVersion and none in info: EOO
Thu Oct 27 05:36:41 [initandlisten] connection accepted from 10.20.65.42:6881 #31802
Thu Oct 27 05:36:41 [conn31802] warning: bad serverID set in setShardVersion and none in info: EOO
Thu Oct 27 05:36:41 [initandlisten] connection accepted from 10.20.65.45:25004 #31803
Thu Oct 27 05:36:41 [conn31803] warning: bad serverID set in setShardVersion and none in info: EOO
Thu Oct 27 05:36:41 [initandlisten] connection accepted from 10.20.65.42:6886 #31804
Thu Oct 27 05:36:41 [conn31804] warning: bad serverID set in setShardVersion and none in info: EOO
Thu Oct 27 05:36:41 [initandlisten] connection accepted from 10.20.65.42:6891 #31805
Thu Oct 27 05:36:41 [conn31805] warning: bad serverID set in setShardVersion and none in info: EOO
Thu Oct 27 05:36:41 [initandlisten] connection accepted from 10.20.65.42:6896 #31806
Thu Oct 27 05:36:41 [conn31806] warning: bad serverID set in setShardVersion and none in info: EOO
Thu Oct 27 05:36:41 [initandlisten] connection accepted from 10.20.65.44:25585 #31807
Thu Oct 27 05:36:41 [conn31807] warning: bad serverID set in setShardVersion and none in info: EOO
Thu Oct 27 05:36:42 [initandlisten] connection accepted from 10.20.65.43:29607 #31808
Thu Oct 27 05:36:42 [conn31808] warning: bad serverID set in setShardVersion and none in info: EOO
Thu Oct 27 05:36:42 [initandlisten] connection accepted from 10.20.65.44:25590 #31809
Thu Oct 27 05:36:42 [conn31809] warning: bad serverID set in setShardVersion and none in info: EOO
Thu Oct 27 05:36:42 [initandlisten] connection accepted from 10.20.65.41:29083 #31810
Thu Oct 27 05:36:42 [conn31810] warning: bad serverID set in setShardVersion and none in info: EOO
Thu Oct 27 05:36:42 [initandlisten] connection accepted from 10.20.65.42:6901 #31811
Thu Oct 27 05:36:42 [conn31811] warning: bad serverID set in setShardVersion and none in info: EOO
Thu Oct 27 05:36:42 [initandlisten] connection accepted from 10.20.65.44:25595 #31812
Thu Oct 27 05:36:42 [conn31812] warning: bad serverID set in setShardVersion and none in info: EOO

Is this normal?
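
To see which clients trigger the warning most often, the log can be tallied per source address. The sketch below assumes that each log entry sits on a single line (mail clients may wrap them) and that the format matches the excerpt above. The function name is made up for illustration.

```python
import re
from collections import Counter

ACCEPTED = re.compile(r"connection accepted from (\d+\.\d+\.\d+\.\d+):\d+ #(\d+)")
WARNING = re.compile(r"\[conn(\d+)\] warning: bad serverID set in setShardVersion")


def warnings_by_client(lines):
    """Count 'bad serverID' warnings per client IP.

    Connection ids are mapped to IPs via the 'connection accepted' lines.
    """
    ip_of = {}
    counts = Counter()
    for line in lines:
        accepted = ACCEPTED.search(line)
        if accepted:
            ip_of[accepted.group(2)] = accepted.group(1)
            continue
        warning = WARNING.search(line)
        if warning:
            counts[ip_of.get(warning.group(1), "unknown")] += 1
    return counts


# Four entries from the excerpt above, each on a single line.
sample = [
    "Thu Oct 27 05:36:41 [initandlisten] connection accepted from 10.20.65.42:6876 #31801",
    "Thu Oct 27 05:36:41 [conn31801] warning: bad serverID set in setShardVersion and none in info: EOO",
    "Thu Oct 27 05:36:41 [initandlisten] connection accepted from 10.20.65.45:25004 #31803",
    "Thu Oct 27 05:36:41 [conn31803] warning: bad serverID set in setShardVersion and none in info: EOO",
]

print(warnings_by_client(sample))
```

A warning whose connection was accepted before the start of the log is counted under "unknown".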

Eliot Horowitz

Oct 28, 2011, 1:30:35 AM
to mongod...@googlegroups.com
That's a warning that can be safely ignored.
We're going to be fixing the source of it for 2.0.2

Craig Murray

Nov 10, 2011, 1:48:17 PM
to mongod...@googlegroups.com
We're seeing the original exception using 2.0.1.

Nov 10 03:51:59 www02/www02 [www02]: [Thu Nov 10 03:51:59 2011] [error] [client 85.101.115.44] MongoCursorException: 10 write with bad shard config and no server id!,


Greg Studer

Nov 10, 2011, 7:41:30 PM
to mongodb-user
Do you have the mongos/mongod logs from when this started?
I opened a ticket for 2.0.1: SERVER-4255.

Craig Murray

Nov 10, 2011, 8:01:17 PM
to mongod...@googlegroups.com
It looks like we already have an issue filed for this (sorry, just catching up with our operations guy who's been handling this issue thus far)

    [ https://jira.mongodb.org/browse/SUPPORT-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Studer

Nov 11, 2011, 10:43:09 AM
to mongodb-user
Ah right, am tracking there too.

On Nov 10, 8:01 pm, Craig Murray <craig.mur...@rockyou.com> wrote:
> It looks like we already have an issue filed for this (sorry, just catching
> up with our operations guy who's been handling this issue thus far)
>
>     [https://jira.mongodb.org/browse/SUPPORT-171?page=com.atlassian.jira.p...]

Itai Shemesh

Dec 11, 2011, 10:39:10 AM
to mongod...@googlegroups.com
I was going crazy for the last few days after upgrading to
the "so-called" stable 2.0.1 release:

- A huge number of connections that were never closed.
- TONS of errors/exceptions in the mongod log files:

"bad serverID set in setShardVersion and none in info: EOO"

- Many ::insert() calls that were dropped because of
"timeout()" / "socket error"


I think I found the solution for this!
All my shards (mongod) are on 2.0.1, and I have downgraded all
my mongos instances to 1.8.4.

Works like a charm now!

Greg Studer

Dec 12, 2011, 10:00:27 AM
to mongodb-user
Sorry you had so much trouble with this. These are all symptoms of a
single issue that is fixed in 2.0.2. If you're stable on 1.8.4,
that's fine, but I would definitely check that you're not still getting
periodic socket timeouts: these are unrelated, but they make the stale
version issues above worse.