Operation timed out while trying to shard a collection

Flash Gorman

Apr 8, 2016, 11:00:53 AM
to mongodb-user
Have a vanilla install of MongoDB.  Added 100 million documents to my instance.  This takes up 54 GB.  Created a compound index on "publisher" and "reported" (a date).  This appears to take up around 2 GB on disk.

So far, so good.

Now I enable sharding and try to shard this collection (though I haven't yet added the other mongod shard):

mongos> sh.shardCollection("test.events", { "publisher": 1, "reported": 1})
{ "code" : 50, "ok" : 0, "errmsg" : "Operation timed out" }

but no dice.  Operation times out (after a minute or so).  On the mongod side, I can see evidence that it's trying to do something:
 
2016-04-08T10:45:11.461-0400 I NETWORK  [initandlisten] connection accepted from 9.80.193.174:64048 #2 (2 connections now open)
2016-04-08T10:46:23.478-0400 I COMMAND  [conn2] command admin.$cmd command: checkShardingIndex { checkShardingIndex: "test.events", keyPattern: { publisher: 1.0, reported: 1.0 } } keyUpdates:0 writeConflicts:0 numYields:784390 reslen:74 locks:{ Global: { acquireCount: { r: 1568782 } }, Database: { acquireCount: { r: 784391 } }, Collection: { acquireCount: { r: 784391 } } } protocol:op_command 72013ms
2016-04-08T10:46:23.506-0400 I SHARDING [conn1] request split points lookup for chunk test.events { : MinKey, : MinKey } -->> { : MaxKey, : MaxKey }
2016-04-08T10:46:33.459-0400 I NETWORK  [initandlisten] connection accepted from 9.80.193.174:64063 #3 (3 connections now open)
2016-04-08T10:47:12.843-0400 W SHARDING [conn1] Finding the split vector for test.events over { publisher: 1.0, reported: 1.0 } keyCount: 27639 numSplits: 3632 lookedAt: 13520 took 49336ms
2016-04-08T10:47:12.846-0400 I COMMAND  [conn1] command admin.$cmd command: splitVector { splitVector: "test.events", keyPattern: { publisher: 1.0, reported: 1.0 }, min: { publisher: MinKey, reported: MinKey }, max: { publisher: MaxKey, reported: MaxKey }, maxChunkSizeBytes: 67108864, maxSplitPoints: 0, maxChunkObjects: 0 } keyUpdates:0 writeConflicts:0 numYields:784390 reslen:236463 locks:{ Global: { acquireCount: { r: 1568782 } }, Database: { acquireCount: { r: 784391 } }, Collection: { acquireCount: { r: 784391 } } } protocol:op_command 49339ms
2016-04-08T10:47:12.846-0400 I NETWORK  [conn1] end connection 9.80.193.174:64033 (2 connections now open)

Just to make sure I'm not doing anything wrong, I created a second collection (events2) that only contains 1,000 documents and it shards just fine:

mongos> sh.shardCollection("test.events2", { "publisher": 1, "reported": 1})
{ "collectionsharded" : "test.events2", "ok" : 1 }

I'm assuming the problem here is just that I've got 100 million documents and it's having a hard time with them, but isn't that exactly what MongoDB is supposed to be good at?

I was also assuming that the timeout would only affect my mongos client and that the actual operation would continue on the server, but sh.status() doesn't show my "events" collection as a sharded collection (it only shows my small "events2" collection):

mongos> sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("5707b69d7dd2b2c5d33f8b9e")
}
  shards:
    {  "_id" : "shard0000",  "host" : "9.80.193.174:27017" }
  active mongoses:
    "3.2.4" : 1
  balancer:
    Currently enabled:  yes
    Currently running:  no
    Failed balancer rounds in last 5 attempts:  0
    Migration Results for the last 24 hours:
        No recent migrations
  databases:
    {  "_id" : "test",  "primary" : "shard0000",  "partitioned" : true }
        test.events2
            shard key: { "publisher" : 1, "reported" : 1 }
            unique: false
            balancing: true
            chunks:
                shard0000    1
            { "publisher" : { "$minKey" : 1 }, "reported" : { "$minKey" : 1 } } -->> { "publisher" : { "$maxKey" : 1 }, "reported" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 0)

Lastly, I am confused why this operation should take any time at all.  I already have the proper index in place, and there is only one mongod instance; I have not yet added the other mongod instance(s), so it seems to me there isn't any *real* work to do yet.  There's no other shard to move the data to, so I don't know why it's timing out in the first place.

Hoping for help because, at this point, I am STUCK!
 

Flash Gorman

Apr 11, 2016, 9:48:22 AM
to mongodb-user
Anyone have any ideas?

Kevin Adistambha

Apr 13, 2016, 7:01:20 PM
to mongodb-user

Hi,

{ "code" : 50, "ok" : 0, "errmsg" : "Operation timed out" }

Error code 50 is a generic error indicating that the operation exceeded a time limit. In this case:

2016-04-08T10:46:23.478-0400 I COMMAND [conn2] command admin.$cmd command: checkShardingIndex { checkShardingIndex: "test.events", keyPattern: { publisher: 1.0, reported: 1.0 } } keyUpdates:0 writeConflicts:0 numYields:784390 reslen:74 locks:{ Global: { acquireCount: { r: 1568782 } }, Database: { acquireCount: { r: 784391 } }, Collection: { acquireCount: { r: 784391 } } } protocol:op_command 72013ms

this checkShardingIndex operation took 72 seconds, and

2016-04-08T10:47:12.846-0400 I COMMAND [conn1] command admin.$cmd command: splitVector { splitVector: "test.events", keyPattern: { publisher: 1.0, reported: 1.0 }, min: { publisher: MinKey, reported: MinKey }, max: { publisher: MaxKey, reported: MaxKey }, maxChunkSizeBytes: 67108864, maxSplitPoints: 0, maxChunkObjects: 0 } keyUpdates:0 writeConflicts:0 numYields:784390 reslen:236463 locks:{ Global: { acquireCount: { r: 1568782 } }, Database: { acquireCount: { r: 784391 } }, Collection: { acquireCount: { r: 784391 } } } protocol:op_command 49339ms

this splitVector operation took 49 seconds. Combined, the two operations took 121 seconds, so the mongos reported that the sh.shardCollection() command took too long to finish, in the form of a timeout error.

The checkShardingIndex command is an internal command that performs a collection scan to ensure that every document in the collection contains the complete shard key. For example, if the shard key is {a:1,b:1}, it will check every document for the existence of fields a and b.
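
As an illustration, a query along these lines would surface any documents that are missing either shard key field (just a sketch using your collection and key; checkShardingIndex performs this kind of scan for you internally):

mongos> db.events.find({ $or: [
...       { publisher: { $exists: false } },
...       { reported: { $exists: false } }
...     ] }).count()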

The splitVector command is also an internal command. It uses the average object size and number of objects in the collection to split the collection into chunks, based on the shard key and the maximum chunk size (which defaults to 64 MB).
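
If you are curious, you can also run splitVector by hand to see the proposed split points and how long the scan takes. This is a sketch of the same invocation that appears in your log (run it while connected directly to the shard's mongod, since it is an internal command executed on the shard):

> db.adminCommand({
...   splitVector: "test.events",
...   keyPattern: { publisher: 1, reported: 1 },
...   min: { publisher: MinKey, reported: MinKey },
...   max: { publisher: MaxKey, reported: MaxKey },
...   maxChunkSizeBytes: 64 * 1024 * 1024
... })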

The issue that leads to your timeout error is not only the number of documents, but also the size of the collection itself. Instead of importing the whole collection and sharding it afterward, a better solution is to pre-split the collection before importing. This will allow MongoDB to import documents directly into the chunks as efficiently as possible.

To pre-split a collection (a shell sketch of these steps follows the list):

  1. Deploy the sharded cluster.
  2. Create an empty collection using db.createCollection() and also create the necessary indexes using db.collection.createIndex(), including the index that corresponds to the shard key.
  3. Shard the collection using sh.shardCollection().
  4. Pre-split the destination (empty) collection as described in Create Chunks in a Sharded Cluster. You should see the balancer distribute the empty chunks across the shards. Note: this pre-split step is done automatically if you are using a hashed shard key.
  5. Import your data via mongos.
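
Here is a minimal shell sketch of steps 2 through 5. The split point values ("acme", "nadir") are made-up placeholders; choose values spread evenly across your actual publisher/reported ranges:

mongos> use test
mongos> db.createCollection("events")
mongos> db.events.createIndex({ publisher: 1, reported: 1 })
mongos> sh.shardCollection("test.events", { publisher: 1, reported: 1 })
mongos> // hypothetical split points; substitute values drawn from your data
mongos> sh.splitAt("test.events", { publisher: "acme", reported: MinKey })
mongos> sh.splitAt("test.events", { publisher: "nadir", reported: MinKey })

and then import through the mongos, for example (hypothetical host and file name):

mongoimport --host mongos.example.net --port 27017 --db test --collection events --file events.json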

Since the empty chunks are already distributed and balanced, the imported data will go straight into the proper shard.

Please note that you should only pre-split an empty collection.

a compound index on "publisher" and "reported" (a date)

Choosing the correct shard key is extremely important for a sharded cluster deployment. If the shard key you are using involves a date element, the key will be monotonically increasing. This may result in a "hot shard", i.e. one shard that is more active than the others. This could limit your insert rate, and the cluster will constantly need to split chunks and rebalance, since all inserts could go into a single chunk. Please see Sharding Pitfalls, Shard Keys, and Considerations for Selecting Shard Keys for more information regarding choosing a shard key.
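
If your query patterns allow it, one common alternative is a hashed shard key on a high-cardinality field, which avoids the monotonically increasing pattern altogether. A sketch (whether publisher has enough cardinality for this in your data is an assumption on my part):

mongos> db.events.createIndex({ publisher: "hashed" })
mongos> sh.shardCollection("test.events", { publisher: "hashed" })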

Best regards,
Kevin

Flash Gorman

Apr 14, 2016, 5:59:21 AM
to mongodb-user
Thank you very much.