Hi,
We are having an issue with a sharded set-up we currently run in production. Once the error below occurs, we can no longer write to the database until we restart the mongos process. Here is our set-up:
10.0.1.189 Mongo1
10.0.1.186 Mongo2
10.0.1.175 Mongo3
10.0.1.18 Mongo4
10.0.1.19 Mongo5
10.0.1.20 Mongo6
10.0.1.163 MongoConfig1
10.0.1.45 MongoConfig2
10.0.1.46 MongoConfig3
Mongos on every application server
Here is the error message we are getting from the C# driver:
MongoDB.Driver.MongoSafeModeException: Safemode detected an error 'writeback'. (Response was { "shards" : ["10.0.1.186:27018", "10.0.1.189:27018", "10.0.1.18:27018", "10.0.1.19:27018", "10.0.1.20:27018", "10.0.1.75:27018"], "shardRawGLE" : { "10.0.1.186:27018" : { "n" : 0, "connectionId" : 20430, "err" : null, "ok" : 1.0 }, "10.0.1.189:27018" : { "n" : 0, "connectionId" : 19981, "err" : null, "ok" : 1.0 }, "10.0.1.18:27018" : { "n" : 0, "connectionId" : 14907, "err" : null, "ok" : 1.0 }, "10.0.1.19:27018" : { "n" : 0, "connectionId" : 14861, "err" : null, "ok" : 1.0 }, "10.0.1.20:27018" : { "n" : 0, "connectionId" : 14827, "err" : null, "ok" : 1.0 }, "10.0.1.75:27018" : { "err" : "writeback", "code" : 9517, "n" : 0, "connectionId" : 18417, "ok" : 1.0 } }, "n" : 0, "err" : "writeback", "errs" : ["writeback"], "errObjects" : [{ "err" : "writeback", "code" : 9517, "n" : 0, "connectionId" : 18417, "ok" : 1.0 }], "connectionId" : 14933, "ok" : 1.0, "writebackGLE" : { "shards" : ["10.0.1.186:27018", "10.0.1.189:27018", "10.0.1.18:27018", "10.0.1.19:27018", "10.0.1.20:27018", "10.0.1.75:27018"], "shardRawGLE" : { "10.0.1.186:27018" : { "n" : 0, "connectionId" : 20430, "err" : null, "ok" : 1.0 }, "10.0.1.189:27018" : { "n" : 0, "connectionId" : 19981, "err" : null, "ok" : 1.0 }, "10.0.1.18:27018" : { "n" : 0, "connectionId" : 14907, "err" : null, "ok" : 1.0 }, "10.0.1.19:27018" : { "n" : 0, "connectionId" : 14861, "err" : null, "ok" : 1.0 }, "10.0.1.20:27018" : { "n" : 0, "connectionId" : 14827, "err" : null, "ok" : 1.0 }, "10.0.1.75:27018" : { "err" : "writeback", "code" : 9517, "n" : 0, "connectionId" : 18417, "ok" : 1.0 } }, "n" : 0, "err" : "writeback", "errs" : ["writeback"], "errObjects" : [{ "err" : "writeback", "code" : 9517, "n" : 0, "connectionId" : 18417, "ok" : 1.0 }] }, "initialGLEHost" : "10.0.1.20:27018" }).
at MongoDB.Driver.Internal.MongoConnection.SendMessage(MongoRequestMessage message, SafeMode safeMode, String databaseName)
at MongoDB.Driver.MongoCollection.InsertBatch(Type nominalType, IEnumerable documents, MongoInsertOptions options)
at MongoDB.Driver.MongoCollection.Insert(Type nominalType, Object document, MongoInsertOptions options)
at Core.TCS.Mongo.MongoWriter.WriteTcsDataset(Dataset dataset)
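Since the response above is hard to read in one line, here is a minimal Python sketch of how the shard reporting the error can be picked out of it. It assumes the response parses as JSON and uses a trimmed excerpt of it (two of the six shards). The names `RESPONSE` and `failing_shards` are made up for illustration and are not part of any driver.

```python
import json

# Trimmed excerpt of the response above: only two of the six shards are kept.
RESPONSE = """
{
  "shardRawGLE": {
    "10.0.1.20:27018": {"n": 0, "connectionId": 14827, "err": null, "ok": 1.0},
    "10.0.1.75:27018": {"err": "writeback", "code": 9517, "n": 0,
                        "connectionId": 18417, "ok": 1.0}
  },
  "n": 0,
  "err": "writeback",
  "initialGLEHost": "10.0.1.20:27018"
}
"""


def failing_shards(gle):
    """Return {host: (err, code)} for every shard whose 'err' is not null.

    Hypothetical helper: it only walks the 'shardRawGLE' sub-document.
    """
    return {
        host: (reply["err"], reply.get("code"))
        for host, reply in gle.get("shardRawGLE", {}).items()
        if reply.get("err") is not None
    }


if __name__ == "__main__":
    for host, (err, code) in failing_shards(json.loads(RESPONSE)).items():
        print(host, err, code)
```

On the excerpt, the only shard reported is 10.0.1.75:27018, with err "writeback" and code 9517.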
Here is what we see in the mongos log at the time of the error:
Thu Nov 15 10:04:46 [WriteBackListener-10.0.1.20:27018] GLE is { shards: [ "10.0.1.186:27018", "10.0.1.189:27018", "10.0.1.18:27018", "10.0.1.19:27018", "10.0.1.20:27018", "10.0.1.75:27018" ], shardRawGLE: { 10.0.1.186:27018: { n: 0, connectionId: 19198, err: null, ok: 1.0 }, 10.0.1.189:27018: { n: 0, connectionId: 18756, err: null, ok: 1.0 }, 10.0.1.18:27018: { n: 0, connectionId: 13685, err: null, ok: 1.0 }, 10.0.1.19:27018: { n: 0, connectionId: 13649, err: null, ok: 1.0 }, 10.0.1.20:27018: { err: "writeback", code: 9517, n: 0, connectionId: 13601, ok: 1.0 }, 10.0.1.75:27018: { n: 0, connectionId: 17188, err: null, ok: 1.0 } }, n: 0, err: "writeback", errs: [ "writeback" ], errObjects: [ { err: "writeback", code: 9517, n: 0, connectionId: 13601, ok: 1.0 } ] }
Thu Nov 15 10:04:46 [WriteBackListener-10.0.1.20:27018] GLE is { singleShard: "10.0.1.20:27018", err: "writeback", code: 9517, n: 0, connectionId: 13601, ok: 1.0 }
Thu Nov 15 10:04:46 [WriteBackListener-10.0.1.20:27018] new version change detected, 2 writebacks processed previously
Thu Nov 15 10:04:46 [WriteBackListener-10.0.1.20:27018] writeback failed because of stale config, retrying attempts: 1
Thu Nov 15 10:04:46 [WriteBackListener-10.0.1.20:27018] ChunkManager: time to load chunks for TCS.Bev5: 1ms sequenceNumber: 42 version: 12|3||50a484f2791194e675b2ab16 based on: 12|1||50a484f2791194e675b2ab16
Thu Nov 15 10:04:46 [WriteBackListener-10.0.1.20:27018] GLE is { singleShard: "10.0.1.20:27018", n: 0, connectionId: 13601, err: null, ok: 1.0 }
Thu Nov 15 10:04:49 [mongosMain] connection accepted from 127.0.0.1:50420 #9641 (16 connections now open)
Thu Nov 15 10:04:49 [conn9641] end connection 127.0.0.1:50420 (15 connections now open)
We are currently running MongoDB 2.2.1. The shards and config servers are on Linux and the mongos processes run on Windows (not sure if this makes a difference).
Also, not sure if this makes a difference, but we insert data into the database all day. Then, in the early morning, a scripted job drops the database, adds the indexes and enables sharding again.
Thanks for your help.