Mongo 3.0.12: aggregate() fails on mongos with 'secondary' as read preference in sharded environment

Taneli Riitaoja

Dec 11, 2017, 3:31:49 PM
to mongodb-user
Hello,

We encountered a problem on our MongoDB sharded cluster when running aggregate() operations through a mongos with 'secondary' as the read preference. We hadn't previously used aggregate() with 'secondary', so this came as a bit of a surprise.

Does anybody happen to know whether this is related to a known issue?

Apologies for the long message, but I wanted to include all the information I found relevant (I can provide more if it helps).

MongoDB version used: 3.0.12
Mongos server OS: Ubuntu 16.04.1 LTS
mongo-shard1-m9 server OS: Ubuntu 14.04.2 LTS

The mongos server is separate from the MongoDB instances.

Here's how the problem manifests:

(Some field names, hostnames, and field values have been obfuscated)

mongos> db.getMongo().setReadPref('secondary')
mongos> db.example_collection.aggregate([{$limit: 100}, {$group: {_id: {exampleField: '$exampleField', uid: '$uid', exampleField2: '$exampleField2'}, count: {$sum: 1}}}, {$sort: {count: -1}}])
assert: command failed: {
    "errmsg" : "exception: connection pool: connect failed mongo-shard1-m9:27017 : couldn't initialize connection to host mongo-shard1-m9, address is invalid",
    "code" : 13328,
    "ok" : 0
} : aggregate failed
Error: command failed: {
    "errmsg" : "exception: connection pool: connect failed mongo-shard1-m9:27017 : couldn't initialize connection to host mongo-shard1-m9, address is invalid",
    "code" : 13328,
    "ok" : 0
} : aggregate failed
    at Error (<anonymous>)
    at doassert (src/mongo/shell/assert.js:11:14)
    at Function.assert.commandWorked (src/mongo/shell/assert.js:254:5)
    at DBCollection.aggregate (src/mongo/shell/collection.js:1278:12)
    at (shell):1:25
2017-12-07T15:04:06.893+0000 E QUERY    Error: command failed: {
    "errmsg" : "exception: connection pool: connect failed mongo-shard1-m9:27017 : couldn't initialize connection to host mongo-shard1-m9, address is invalid",
    "code" : 13328,
    "ok" : 0
} : aggregate failed
    at Error (<anonymous>)
    at doassert (src/mongo/shell/assert.js:11:14)
    at Function.assert.commandWorked (src/mongo/shell/assert.js:254:5)
    at DBCollection.aggregate (src/mongo/shell/collection.js:1278:12)
    at (shell):1:25 at src/mongo/shell/assert.js:13

I'll show some relevant information below, and then demonstrate that there shouldn't be anything wrong with the connectivity.

A document in the example collection typically has the following content:

mongos> db.example_collection.findOne()
{
    "_id" : ObjectId("5794ffe15f82d20a2204661a"),
    "exampleField" : ObjectId("576108fce8dec363dcb08fa1"),
    "exampleField3" : 106400,
    "exampleField4" : {
        "1" : 23050,
        "0" : 39000,
        "3" : 24000,
        "2" : 20350
    },
    "uid" : ObjectId("569c45ea87e53b2515aa41d3"),
    "exampleField2" : "exampleStringValue",
    "exampleField5" : 5
}



The database has sharding enabled, and the example collection is sharded and well balanced (no problems with the instance mentioned in the error message):

mongos> db.collections.find({"_id": "exampledb.example_collection"})
{ "_id" : "exampledb.example_collection", "lastmod" : ISODate("2015-09-14T13:17:10.628Z"), "dropped" : false, "key" : { "exampleField" : 1, "uid" : 1 }, "unique" : false, "lastmodEpoch" : ObjectId("55f6c8d6fb798cbe0bd99fae") }
mongos>


mongos> db.example_collection.getShardDistribution()

Shard mongo_shard0 at mongo_shard0/mongo-shard0-m4:27017,mongo-shard0-m7:27017
data : 4.17GiB docs : 19100611 chunks : 118
estimated data per chunk : 36.23MiB
estimated docs per chunk : 161869

Shard mongo_shard1 at mongo_shard1/mongo-shard1-m7:27017,mongo-shard1-m9:27017
data : 4.46GiB docs : 20027531 chunks : 118
estimated data per chunk : 38.79MiB
estimated docs per chunk : 169724

Shard mongo_shard2 at mongo_shard2/mongo-shard2-m5:27017,mongo-shard2-m7:27017
data : 4.4GiB docs : 19751106 chunks : 118
estimated data per chunk : 38.22MiB
estimated docs per chunk : 167382

Shard mongo_shard3 at mongo_shard3/mongo-shard3-m2:27017,mongo-shard3-m3:27017
data : 4.47GiB docs : 20028826 chunks : 118
estimated data per chunk : 38.8MiB
estimated docs per chunk : 169735

Shard mongo_shard4 at mongo_shard4/mongo-shard4-m0:27017,mongo-shard4-m3:27017
data : 3.99GiB docs : 17998046 chunks : 118
estimated data per chunk : 34.64MiB
estimated docs per chunk : 152525

Shard mongo_shard5 at mongo_shard5/mongo-shard5-m2:27017,mongo-shard5-m3:27017
data : 3.95GiB docs : 17918781 chunks : 119
estimated data per chunk : 34.03MiB
estimated docs per chunk : 150577

Totals
data : 25.47GiB docs : 114824901 chunks : 709
Shard mongo_shard0 contains 16.39% data, 16.63% docs in cluster, avg obj size on shard : 234B
Shard mongo_shard1 contains 17.55% data, 17.44% docs in cluster, avg obj size on shard : 239B
Shard mongo_shard2 contains 17.29% data, 17.2% docs in cluster, avg obj size on shard : 239B
Shard mongo_shard3 contains 17.55% data, 17.44% docs in cluster, avg obj size on shard : 239B
Shard mongo_shard4 contains 15.67% data, 15.67% docs in cluster, avg obj size on shard : 238B
Shard mongo_shard5 contains 15.52% data, 15.6% docs in cluster, avg obj size on shard : 237B


The sharding for the collection seems to be properly in effect:

mongos> sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("55e596811502e512710a14be")
  }
  shards:
    {  "_id" : "mongo_shard0",  "host" : "mongo_shard0/mongo-shard0-m4:27017,mongo-shard0-m7:27017" }
    {  "_id" : "mongo_shard1",  "host" : "mongo_shard1/mongo-shard1-m7:27017,mongo-shard1-m9:27017" }
    {  "_id" : "mongo_shard2",  "host" : "mongo_shard2/mongo-shard2-m5:27017,mongo-shard2-m7:27017" }
    {  "_id" : "mongo_shard3",  "host" : "mongo_shard3/mongo-shard3-m2:27017,mongo-shard3-m3:27017" }
    {  "_id" : "mongo_shard4",  "host" : "mongo_shard4/mongo-shard4-m0:27017,mongo-shard4-m3:27017" }
    {  "_id" : "mongo_shard5",  "host" : "mongo_shard5/mongo-shard5-m2:27017,mongo-shard5-m3:27017" }
  balancer:
    Currently enabled:  yes
    Currently running:  no
    Balancer active window is set between 8:00 and 11:00 server local time
    Failed balancer rounds in last 5 attempts:  0
    Migration Results for the last 24 hours:
        No recent migrations
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
    {  "_id" : "exampledb",  "partitioned" : true,  "primary" : "mongo_shard0" }
        exampledb.example_collection
            shard key: { "exampleField" : 1, "uid" : 1 }
            chunks:
                mongo_shard0    118
                mongo_shard1    118
                mongo_shard2    118
                mongo_shard3    118
                mongo_shard4    118
                mongo_shard5    119
            too many chunks to print, use verbose if you want to force print
...
...
...



The error message complains about a connectivity problem (specifically, it hints that the address isn't resolvable?), but there don't seem to be any connectivity-related problems, and the failure happens solely with aggregate():

(commands executed from the mongos server shell)

$ telnet mongo-shard1-m9 27017
Trying 10.0.12.92...
Connected to mongo-shard1-m9.
Escape character is '^]'.
^C
^C
^C
Connection closed by foreign host.


$ grep mongo-shard1-m9 /etc/hosts
10.0.12.92 mongo-shard1-m9
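
For completeness, name resolution could also be checked through the same NSS lookup path that libc-based programs (mongos included) use, rather than via telnet alone; given the /etc/hosts entry above, this should print the 10.0.12.92 address:

$ getent hosts mongo-shard1-m9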


We can connect from the mongos server directly to the mongo-shard1-m9 instance, and there, setting the read preference and running the aggregate() poses no problems:

$ mongo mongo-shard1-m9:27017/exampledb
mongo_shard1:SECONDARY> db.getMongo().setReadPref('secondary')
mongo_shard1:SECONDARY> db.example_collection.aggregate([{$limit: 100}, {$group: {_id: {exampleField: '$exampleField', uid: '$uid', exampleField2: '$exampleField2'}, count: {$sum: 1}}}, {$sort: {count: -1}}])
{ "_id" : { "exampleField" : ObjectId("56e89ebde8dec33ab458df3f"), "uid" : ObjectId("5598d8c914faf629ab75c813"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("56fc05aee8dec30c99ccf1df"), "uid" : ObjectId("576ba761dba20613d69b9060"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("5710aa45e8dec34fc4522f4f"), "uid" : ObjectId("581f81c8b1915e1abc773289"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("56e8a045e8dec33ab458df49"), "uid" : ObjectId("56dcb18206f7fb78277a909d"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("5714d0d1e8dec36099f854d1"), "uid" : ObjectId("58276fc19c54b74a858f215b"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("5706540ae8dec3323c521a07"), "uid" : ObjectId("580981af2cbb6176909d7bad"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("566ffcd4e8dec33d11fd2ce2"), "uid" : ObjectId("57dc6bab7ae52d17a0e0f483"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("568f216ce8dec34ab05e688f"), "uid" : ObjectId("57a1d843f7c9aa1471e96dc4"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("56543e2fe8dec32c5d106a8a"), "uid" : ObjectId("58305aae9105ea4efc426a70"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("570ce2eee8dec3414d4b23d6"), "uid" : ObjectId("567ee2c0613df62d8c90625b"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("57600064e8dec363dcb08f69"), "uid" : ObjectId("585c0876f46f7e0fd8563a3b"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("573d4ae3e8dec304420c40e4"), "uid" : ObjectId("568aeea1bf21d036447efd95"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("57075cd3e8dec3361fa8802c"), "uid" : ObjectId("57fbafd4f7c9aa7031a6faad"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("56650925e8dec3207331b676"), "uid" : ObjectId("5723953450d065292e2ece87"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("56650925e8dec3207331b676"), "uid" : ObjectId("565949427e570510791060c6"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("57076756e8dec3361fa88044"), "uid" : ObjectId("582780e453ea98262fec2132"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("5665096ae8dec3207331b678"), "uid" : ObjectId("57de38c43288d718f38509fd"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("570ce2eee8dec3414d4b23d6"), "uid" : ObjectId("57be04167a6581430b7304a9"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("56fa9009e8dec37f8d7887ae"), "uid" : ObjectId("57e1c2be14c6923e548470aa"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
{ "_id" : { "exampleField" : ObjectId("57065a1de8dec3323c521a21"), "uid" : ObjectId("54f931faa5dd225edd98737a"), "exampleField2" : "exampleStringValue" }, "count" : 1 }
Type "it" for more
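

Since the error message points at the mongos connection pool, one thing we could still inspect is the pool state on the mongos itself. A diagnostic sketch (connPoolStats lists the hosts the mongos currently pools connections to, so we could check how mongo-shard1-m9 appears there):

mongos> db.adminCommand({connPoolStats: 1})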



The replica set configuration is as follows:

mongo_shard1:SECONDARY> rs.conf()
{
    "_id" : "mongo_shard1",
    "version" : 57,
    "members" : [
        {
            "_id" : 2,
            "host" : "MongoTools:30002",
            "arbiterOnly" : true,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : 0,
            "votes" : 1
        },
        {
            "_id" : 8,
            "host" : "mongo-shard1-m7:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 10,
            "tags" : {
                "use" : "production"
            },
            "slaveDelay" : 0,
            "votes" : 1
        },
        {
            "_id" : 10,
            "host" : "mongo-shard1-m9:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {
                "use" : "production"
            },
            "slaveDelay" : 0,
            "votes" : 1
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatTimeoutSecs" : 10,
        "getLastErrorModes" : {

        },
        "getLastErrorDefaults" : {
            "w" : 1,
            "wtimeout" : 0
        }
    }
}


And the status is as follows:

mongo_shard1:SECONDARY> rs.status()
{
    "set" : "mongo_shard1",
    "date" : ISODate("2017-12-07T15:08:11.707Z"),
    "myState" : 2,
    "syncingTo" : "mongo-shard1-m7:27017",
    "members" : [
        {
            "_id" : 2,
            "name" : "MongoTools:30002",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 1511203,
            "lastHeartbeat" : ISODate("2017-12-07T15:08:09.917Z"),
            "lastHeartbeatRecv" : ISODate("2017-12-07T15:08:10.003Z"),
            "pingMs" : 0,
            "configVersion" : 57
        },
        {
            "_id" : 8,
            "name" : "mongo-shard1-m7:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 1511203,
            "optime" : Timestamp(1512659290, 4),
            "optimeDate" : ISODate("2017-12-07T15:08:10Z"),
            "lastHeartbeat" : ISODate("2017-12-07T15:08:10.081Z"),
            "lastHeartbeatRecv" : ISODate("2017-12-07T15:08:11.046Z"),
            "pingMs" : 0,
            "electionTime" : Timestamp(1509665979, 1),
            "electionDate" : ISODate("2017-11-02T23:39:39Z"),
            "configVersion" : 57
        },
        {
            "_id" : 10,
            "name" : "mongo-shard1-m9:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 1511203,
            "optime" : Timestamp(1512659291, 10),
            "optimeDate" : ISODate("2017-12-07T15:08:11Z"),
            "syncingTo" : "mongo-shard1-m7:27017",
            "configVersion" : 57,
            "self" : true
        }
    ],
    "ok" : 1
}


Any thoughts on what might be causing this, or how to approach the problem?

Googling the error message unsurprisingly turns up connectivity problems, but in our case only aggregate() seems to be broken.

We haven't encountered any other difficulties with the secondary read preference in our sharded environment, nor with the MongoDB instance that aggregate() complains about.
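
In the meantime, a possible stopgap based on the direct-connection test above would be to run the $group stage against each shard's secondary directly and merge the partial results client-side. This is a rough, untested sketch: the hostnames are illustrative (each shard's current secondary would have to be looked up first), and the per-shard $limit is dropped because it would no longer bound the global document count:

// Rough sketch of a client-side merge, run from a plain mongo shell.
var secondaries = ['mongo-shard0-m7:27017', 'mongo-shard1-m9:27017',
                   'mongo-shard2-m7:27017', 'mongo-shard3-m3:27017',
                   'mongo-shard4-m3:27017', 'mongo-shard5-m3:27017'];  // illustrative
var counts = {};
secondaries.forEach(function (host) {
    var conn = new Mongo(host);
    conn.setSlaveOk();                              // allow reads on a secondary
    conn.getDB('exampledb').example_collection.aggregate([
        {$group: {_id: {exampleField: '$exampleField', uid: '$uid',
                        exampleField2: '$exampleField2'},
                  count: {$sum: 1}}}
    ]).forEach(function (doc) {
        var key = tojson(doc._id);                  // group key as a plain map key
        counts[key] = (counts[key] || 0) + doc.count;
    });
});
// 'counts' now holds the merged totals per group key; sort client-side as needed.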


Best regards,

Taneli Riitaoja
Yousician





王周全

Dec 16, 2019, 4:26:12 PM
to mongodb-user
Have you solved this problem? How did you solve it? I had the same problem.


Kevin Adistambha

Dec 17, 2019, 12:25:47 AM
to mongodb-user

Hi,

Please note that you’re replying to a thread from back in 2017 that concerned MongoDB 3.0.12. The current MongoDB version is 4.2.2, which may no longer have the same issues or limitations as older versions.

Instead, please open a new thread describing:

  • What you need to do
  • What you have tried so far
  • What error message (if any) you saw
  • A description of your deployment (topology, e.g. sharded, plus MongoDB version, OS version, driver version, etc.)

Best regards,
Kevin
