Shard tags not working as I expected


Felipe Albrecht

Jun 27, 2014, 11:19:26 AM
to mongod...@googlegroups.com
Hello,

I am using shard tags to improve data insertion time, but I believe it is not working as I expected: the data is being inserted on the primary shard and only migrated to the other shards afterwards, without respecting the shard tag ranges.

Am I doing something wrong or missing some point?

Thanks,
Felipe Albrecht


//---- Showing ----//

Currently I have 4 servers:
infao6940:27027
infao3702:27027
deep31:27027
deep32:27027

executing db.shards.find() on the config database:
{
  "_id": "shard0000",
  "host": "infao6940:27027",
  "tags": [
    "tag_0",
    "shard0000"
  ]
}
{
  "_id": "shard0001",
  "host": "infao3702:27027",
  "tags": [
    "tag_1",
    "shard0001"
  ]
}
{
  "_id": "shard0002",
  "host": "deep31:27027",
  "tags": [
    "shard0002"
  ]
}
{
  "_id": "shard0003",
  "host": "deep32:27027",
  "tags": [
    "shard0003"
  ]
}

--

For instance, I have the collection "epidb_2.regions.hg19.e99.chrY", where I defined 4 tag ranges (one for each shard):

infao6940(mongos-2.6.1)[mongos] config> db.tags.find({ns: "epidb_2.regions.hg19.e99.chrY"} )
{
  "_id": {
    "ns": "epidb_2.regions.hg19.e99.chrY",
    "min": {
      "S": 0
    }
  },
  "ns": "epidb_2.regions.hg19.e99.chrY",
  "min": {
    "S": 0
  },
  "max": {
    "S": 14843391
  },
  "tag": "shard0000"
}
{
  "_id": {
    "ns": "epidb_2.regions.hg19.e99.chrY",
    "min": {
      "S": 14843392
    }
  },
  "ns": "epidb_2.regions.hg19.e99.chrY",
  "min": {
    "S": 14843392
  },
  "max": {
    "S": 29686783
  },
  "tag": "shard0001"
}
{
  "_id": {
    "ns": "epidb_2.regions.hg19.e99.chrY",
    "min": {
      "S": 29686784
    }
  },
  "ns": "epidb_2.regions.hg19.e99.chrY",
  "min": {
    "S": 29686784
  },
  "max": {
    "S": 44530175
  },
  "tag": "shard0002"
}
{
  "_id": {
    "ns": "epidb_2.regions.hg19.e99.chrY",
    "min": {
      "S": 44530176
    }
  },
  "ns": "epidb_2.regions.hg19.e99.chrY",
  "min": {
    "S": 44530176
  },
  "max": {
    "S": 59373567
  },
  "tag": "shard0003"
}
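
(For reference, tag ranges like the ones above are normally created with the sh.addShardTag() and sh.addTagRange() helpers from a mongos shell. A minimal sketch using the values shown above; note that the max bound of a tag range is exclusive:)

sh.addShardTag("shard0000", "shard0000")
sh.addShardTag("shard0001", "shard0001")
sh.addShardTag("shard0002", "shard0002")
sh.addShardTag("shard0003", "shard0003")

var ns = "epidb_2.regions.hg19.e99.chrY"
sh.addTagRange(ns, { S: 0 },        { S: 14843391 }, "shard0000")
sh.addTagRange(ns, { S: 14843392 }, { S: 29686783 }, "shard0001")
sh.addTagRange(ns, { S: 29686784 }, { S: 44530175 }, "shard0002")
sh.addTagRange(ns, { S: 44530176 }, { S: 59373567 }, "shard0003")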


But sh.status() shows:

epidb_2.regions.hg19.e99.chrY
      shard key: { "S": 1 }
      chunks:
        shard0000: 3
        { "S": { "$minKey" : 1 } } -> { "S": 0 } on: shard0000
        { "S": 0 } -> { "S": 3452193 } on: shard0000
        { "S": 3452193 } -> { "S": { "$maxKey" : 1 } } on: shard0000
        tag: shard0000  { "S": 0 } -> { "S": 14843391 }
        tag: shard0001  { "S": 14843392 } -> { "S": 29686783 }
        tag: shard0002  { "S": 29686784 } -> { "S": 44530175 }
        tag: shard0003  { "S": 44530176 } -> { "S": 59373567 }

Asya Kamsky

Jun 28, 2014, 2:23:46 AM
to mongodb-user
And what's in the logs?   Can you also please provide the normal sh.status() formatted output - it's easier to read than the raw output.

Is the balancer on? Is it running? Is it stuck? Are these chunks failing to split when attempted (see the logs)? Are they failing to migrate? Are your tag ranges correct? Does the shard key value have enough granularity to allow proper splits?
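
(From a mongos shell, those balancer questions can be checked with the standard helpers:)

sh.getBalancerState()     // is the balancer enabled?
sh.isBalancerRunning()    // is a balancing round in progress right now?
sh.status()               // formatted overview of shards, tag ranges and chunk distribution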

What exactly are you trying to achieve by using tagging here - it's not at all clear what problem you are trying to solve.

Asya




Felipe Albrecht

Jul 1, 2014, 4:33:49 AM
to mongod...@googlegroups.com
Hello,

before going further: my main objective is to avoid inserting into the primary shard and instead insert directly into the shard where the data belongs.
Is that possible, or will MongoDB always insert into the primary shard and migrate the data afterwards?

Thank you,
Felipe Albrecht


Asya Kamsky

Jul 1, 2014, 10:53:07 AM
to mongodb-user

No, absolutely not.  Insertion goes directly into the shard that owns that chunk.

Felipe Albrecht

Jul 2, 2014, 10:49:19 AM
to mongod...@googlegroups.com
How can I "force" mongodb to do it? Using shard tags and setting the ranges?


Asya Kamsky

Jul 4, 2014, 4:09:05 AM
to mongod...@googlegroups.com
Force it to do what?  It already does what it's supposed to.

Insertion goes into the shard that owns the chunk. Period.  It does not go to the primary shard unless that's the shard that owns the chunk you're inserting into.

Asya

Felipe Albrecht

Jul 4, 2014, 5:41:45 AM
to mongod...@googlegroups.com
Hello,


>> Insertion goes into the shard that owns the chunk. Period.  It does not go to the primary shard unless that's the shard that owns the chunk you're inserting into.

But what if I am inserting documents into an empty collection? How can I make the chunks be created directly on the designated shard, and not on the primary?

Thank you,
Felipe Albrecht 



William Zola

Jul 4, 2014, 12:46:09 PM
to mongod...@googlegroups.com
Hi Felipe!

If you're inserting into an empty collection, you need to pre-split the chunks.  See here for details: http://docs.mongodb.org/manual/tutorial/create-chunks-in-sharded-cluster/
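
A minimal sketch of what that could look like here, using the tag-range boundaries from earlier in the thread (run against a mongos, with the collection already sharded on { S: 1 }):

var ns = "epidb_2.regions.hg19.e99.chrY"

// 1. Create the chunk boundaries while the collection is still empty.
db.adminCommand({ split: ns, middle: { S: 14843392 } })
db.adminCommand({ split: ns, middle: { S: 29686784 } })
db.adminCommand({ split: ns, middle: { S: 44530176 } })

// 2. Move each chunk to the shard that should own it, before loading any data.
db.adminCommand({ moveChunk: ns, find: { S: 0 },        to: "shard0000" })
db.adminCommand({ moveChunk: ns, find: { S: 14843392 }, to: "shard0001" })
db.adminCommand({ moveChunk: ns, find: { S: 29686784 }, to: "shard0002" })
db.adminCommand({ moveChunk: ns, find: { S: 44530176 }, to: "shard0003" })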

 -William Z

Felipe Albrecht

Jul 8, 2014, 2:55:04 AM
to mongod...@googlegroups.com
Hello, 

I am using the "split" command with the "moveChunk" to move the splitted chunk to the shard where it should stay.
Is this the right approach?

Also, there is some overhead for big chuncks? I am creating one big chunk (1Gb) for each shards (total of 4). 
The only issue that I can see is if we had more than 1Gb of data in one shard and this data be stored in the primary.
There is another problems?

Thanks,
Felipe Albrecht

 


William Zola

Jul 8, 2014, 10:29:29 AM
to mongod...@googlegroups.com
Hi Felipe!

My responses are in-line


On Tuesday, July 8, 2014 1:55:04 AM UTC-5, Felipe Albrecht wrote:
Hello, 

I am using the "split" command with the "moveChunk" to move the splitted chunk to the shard where it should stay.
Is this the right approach?

Yes, this is the right approach.
 

Also, is there some overhead for big chunks? I am creating one big chunk (1 GB) for each shard (4 in total).

I'm not understanding this question.  If you are pre-splitting the chunks, you're splitting them and moving them before there is any data loaded.  In this case, the chunks you create and move have zero size.

When you load data into a sharded cluster, the chunks will automatically split when they exceed 'chunksize'.  (Ref: http://docs.mongodb.org/manual/core/sharding-chunk-splitting/)  If you use the default chunksize and have 1 GB of data on each shard, then you'll end up with 16 chunks on each shard. 
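
(For context, a minimal sketch of where that setting lives: the default chunk size is 64 MB, so 1024 MB / 64 MB = 16 chunks per shard, and the value can be inspected or changed in config.settings from a mongos shell; 128 below is only an illustrative value:)

use config
db.settings.find({ _id: "chunksize" })                 // current chunk size in MB (absent means the 64 MB default)
db.settings.save({ _id: "chunksize", value: 128 })     // example: raise it to 128 MB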

Can you please clarify your question?

 -William 

Felipe Albrecht

Jul 8, 2014, 10:35:38 AM
to mongod...@googlegroups.com
Hello, sorry, I forgot to mention that I am also using the value 1024 for the chunksize parameter (in the config.settings collection).


From your answer, I understand that the chunks are created empty, without any pre-allocation. Is that right?

Thanks,
Felipe Albrecht

 


William Zola

Jul 8, 2014, 12:29:15 PM
to mongod...@googlegroups.com
Hi Felipe!

OK, I see what's going on.  Your mental model of what a "chunk" is in MongoDB sharding is not correct.  Since your mental model isn't correct, you have a whole bunch of questions that don't make any sense to folks who have the correct mental model of how sharding works in MongoDB.

It takes more time and space to explain how sharding works than can fit into a Google Groups post.  Please review the following videos:


Once you've done that, please come back with any further questions you have.

 -William 

Felipe Albrecht

Jul 9, 2014, 8:49:57 AM
to mongod...@googlegroups.com
Hello, 
I really had a different "mental model" of chunks: I thought they were reserved space, but they are a logical view.
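
(A chunk is just a metadata document in config.chunks describing a shard-key range and the shard that owns it; no storage is reserved for it. This can be seen from a mongos shell, e.g.:)

use config
db.chunks.find({ ns: "epidb_2.regions.hg19.e99.chrY" }).pretty()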

I watched both videos and they explained several of my questions very well.

I still have one question:
My application has 8 threads inserting data into MongoDB. Each time, a thread splits a new collection and moves the chunks to the shards using the "moveChunk" command. Sometimes it receives "errmsg: "migration already in progress"". Does this happen because chunk migrations cannot run in parallel? What is the best approach: to synchronize the execution of the "moveChunk" commands, or to retry "moveChunk" when the application receives this error message?
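
(A minimal retry sketch, assuming the simplest approach of retrying on that error; only one chunk migration can run at a time in this MongoDB version, so parallel moveChunk calls from several threads will regularly collide. The target shard below is only an illustration:)

function moveChunkWithRetry(ns, findDoc, targetShard, maxAttempts) {
    for (var attempt = 1; attempt <= maxAttempts; attempt++) {
        // moveChunk is run through a mongos against the admin database
        var res = db.adminCommand({ moveChunk: ns, find: findDoc, to: targetShard })
        if (res.ok) {
            return res
        }
        if (res.errmsg && /migration already in progress/.test(res.errmsg)) {
            sleep(1000)   // another migration is running; wait and retry
            continue
        }
        throw new Error("moveChunk failed: " + tojson(res))
    }
    throw new Error("moveChunk gave up after " + maxAttempts + " attempts")
}

// example, with the namespace and split point mentioned later in this thread
moveChunkWithRetry("work_db.regions.hg19.e357.chr12", { S: 40597605 }, "shard0002", 10)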


PS: the moveChunk documentation page (http://docs.mongodb.org/manual/reference/command/moveChunk/) says:

moveChunk returns the following error message if another metadata operation is in progress on the chunks collection:

errmsg: "The collection's metadata lock is already taken."

Are you sure about this errmsg? I searched for it in the MongoDB source base (https://github.com/mongodb/mongo/search?q=%22The+collection%22&ref=cmdform), but I did not find it.


Thank you,
Felipe Albrecht


Felipe Albrecht

Jul 9, 2014, 12:43:33 PM
to mongod...@googlegroups.com
Hello,

I solved my problem by turning off the automatic chunk migration (sh.stopBalancer()). Now I do the migrations in the application.
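
(The relevant shell helpers, for reference:)

sh.stopBalancer()        // disables the balancer and waits for any in-progress round to finish
sh.getBalancerState()    // should now return false
sh.isBalancerRunning()   // false once the current balancing round has ended
// to disable balancing for a single collection only, there is also:
// sh.disableBalancing("work_db.regions.hg19.e99.chrY")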
Thank you for the help,

Felipe Albrecht

Felipe Albrecht

Jul 11, 2014, 4:03:03 PM
to mongod...@googlegroups.com

Hello again,

I am looking at the mongos log and I found a lot of these warnings:

[the "ChunkManager: time to load chunks for ..." message appears for *all* sharded collections, that is, more than 500]

2014-07-11T21:33:23.139+0200 [conn16] ChunkManager: time to load chunks for work_db.regions.hg19.e99.chrX: 0ms sequenceNumber: 40968386 version: 2|3||53bfe2f47288ce5c348751a0 based on: (empty)

2014-07-11T21:33:23.139+0200 [conn16] ChunkManager: time to load chunks for work_db.regions.hg19.e99.chrY: 0ms sequenceNumber: 40968387 version: 2|1||53bfe3017288ce5c348751af based on: (empty)

2014-07-11T21:33:23.140+0200 [conn16] ChunkManager: time to load chunks for work_db.regions.hg19.e357.chr12: 0ms sequenceNumber: 40968388 version: 2|1||53c03be47288ce5c348799a9 based on: 2|1||53c03be47288ce5c348799a9

2014-07-11T21:33:23.140+0200 [conn16] warning: chunk manager reload forced for collection 'work_db.regions.hg19.e357.chr12', config version is 2|1||53c03be47288ce5c348799a9

The problem is that for each sharded collection I get a list of these messages, and I see that the insertions halt for a few seconds.


The command that I am executing is: 

{ split: "work_db.regions.hg19. e357.chr12", middle: { S: 40597605 } }


Does the split command force the chunk manager to reload its data?

How can I improve it?


Thank you,

Felipe Albrecht


