Re: [mongodb-user] Error using tag aware sharding

Scott Hernandez

Oct 1, 2012, 4:20:06 PM
to mongod...@googlegroups.com
Those aren't errors, just informational logging. Everything is fine.

We have an issue to reduce that logging.
https://jira.mongodb.org/browse/SERVER-7204

It is fixed and will be available in the next releases (2.2.1 and 2.3.0).

On Mon, Oct 1, 2012 at 3:48 PM, Santiago Ezcurra <sezc...@gmail.com> wrote:
> Hi: I've configured three replica sets (only for testing, so I have 1 node in
> each), each of them used as a shard:
>
> This is the shard configuration:
>
> mongos> sh.status()
> --- Sharding Status ---
> sharding version: { "_id" : 1, "version" : 3 }
> shards:
> { "_id" : "ssReplset_1", "host" :
> "ssReplset_1/10.230.42.190:27017", "tags" : [ "WE" ] }
> { "_id" : "ssReplset_2", "host" :
> "ssReplset_2/10.230.42.191:27017", "tags" : [ "WE" ] }
> { "_id" : "ssReplset_3", "host" :
> "ssReplset_3/10.230.42.192:27017", "tags" : [ "EA" ] }
> databases:
> { "_id" : "test", "partitioned" : false, "primary" :
> "ssReplset_3" }
> { "_id" : "test2", "partitioned" : true, "primary" :
> "ssReplset_1" }
> test2.users chunks:
> ssReplset_1 1
> ssReplset_3 1
> { "geoTag" : { $minKey : 1 }, "id" : { $minKey : 1 }
> } -->> { "geoTag" : "ea", "id" : 1 } on : ssReplset_1 Timestamp(2000, 1)
> { "geoTag" : "ea", "id" : 1 } -->> { "geoTag" : {
> $maxKey : 1 }, "id" : { $maxKey : 1 } } on : ssReplset_3 Timestamp(2000, 0)
> tag: EA { "geoTag" : "ea", "id" : { $minKey : 1 }
> } -->> { "geoTag" : "eb", "id" : { $maxKey : 1 } }
> tag: WE { "geoTag" : "we", "id" : { $minKey : 1 }
> } -->> { "geoTag" : "wf", "id" : { $maxKey : 1 } }
> { "_id" : "securestorage", "partitioned" : false, "primary" :
> "ssReplset_2" }
> I have two shards that have to receive documents with geoTag field = "we"
> and a third shard for documents that have geoTag field = "ea".
>
> However, when I add documents to the database, two things happen:
> - All documents go to ssReplset_3 (which is supposed to receive only documents
> with geoTag field = "ea" according to the tag ranges)
> - The mongos log keeps repeating these messages:
>
> Mon Oct 1 13:47:32 [Balancer] distributed lock 'balancer/ubu-1204-64b-10:27018:1349108394:1804289383' unlocked.
> Mon Oct 1 13:47:38 [Balancer] distributed lock 'balancer/ubu-1204-64b-10:27018:1349108394:1804289383' acquired, ts : 5069c92a53b15c9d7f181ef7
> Mon Oct 1 13:47:38 [Balancer] ssReplset_1 doesn't have right tag
> Mon Oct 1 13:47:38 [Balancer] ssReplset_2 doesn't have right tag
> Mon Oct 1 13:47:38 [Balancer] distributed lock 'balancer/ubu-1204-64b-10:27018:1349108394:1804289383' unlocked.
> Mon Oct 1 13:47:44 [Balancer] distributed lock 'balancer/ubu-1204-64b-10:27018:1349108394:1804289383' acquired, ts : 5069c93053b15c9d7f181ef8
> Mon Oct 1 13:47:44 [Balancer] ssReplset_1 doesn't have right tag
> Mon Oct 1 13:47:44 [Balancer] ssReplset_2 doesn't have right tag
> Mon Oct 1 13:47:44 [Balancer] distributed lock 'balancer/ubu-1204-64b-10:27018:1349108394:1804289383' unlocked.
> Mon Oct 1 13:47:50 [Balancer] distributed lock 'balancer/ubu-1204-64b-10:27018:1349108394:1804289383' acquired, ts : 5069c93653b15c9d7f181ef9
> Mon Oct 1 13:47:50 [Balancer] ssReplset_1 doesn't have right tag
> Mon Oct 1 13:47:50 [Balancer] ssReplset_2 doesn't have right tag
> Mon Oct 1 13:47:50 [Balancer] distributed lock 'balancer/ubu-1204-64b-10:27018:1349108394:1804289383' unlocked.
>
> What am I doing wrong?

Santiago Ezcurra

Oct 2, 2012, 9:36:47 AM
to mongod...@googlegroups.com
Thanks, Scott, for your prompt answer. I've checked and it's working, though not 100% as expected. Below is the new configuration. Almost all chunks on replica set 3 hold geoTag "ea" (which is correct according to the configuration), but there is still one chunk (the one ending at { "geoTag" : "we", "id" : 7610 }) that contains values with geoTag "we". The result is that I have a couple of thousand users with geoTag "we" on shard #3 (which is intended to receive only "ea").
Also, I had to change the tag range this way (below), because specifying "ea" --> "ea" as the min/max wasn't working. Is that fine?
 
And one last question regarding this setup. Suppose I work out these last details and have a solution that stores my US users in the US shards and my European users in the EU shards, so that the next time a user logs in, the application finds the information in the "local" shards. The truth is that most of the time I won't have the geo information when the user accesses the application, only the ID. So I have two options: 1) I look up by user ID alone (but in that case the query wouldn't be targeted by the shard key and could go outside the local shards as well). 2) I assign a tag depending on which data center the request entered through, and look up by that tag. The problem with this approach is "roaming users": a user who was created on a European shard but is accessing the application from the US data center. Here I would have to program some defensive mechanism to look for that user in the other locations.
 
Can you think of another solution? Does the geoTag always have to be part of the shard key?
 
S.
 
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
        {  "_id" : "ssReplset_1",  "host" : "ssReplset_1/10.230.42.190:27017",  "tags" : [      "WE" ] }
        {  "_id" : "ssReplset_2",  "host" : "ssReplset_2/10.230.42.191:27017",  "tags" : [      "WE" ] }
        {  "_id" : "ssReplset_3",  "host" : "ssReplset_3/10.230.42.192:27017",  "tags" : [      "EA" ] }
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "test",  "partitioned" : false,  "primary" : "ssReplset_3" }
        {  "_id" : "securestorage",  "partitioned" : false,  "primary" : "ssReplset_2" }
        {  "_id" : "test2",  "partitioned" : true,  "primary" : "ssReplset_1" }
                test2.users chunks:
                                ssReplset_1     5
                                ssReplset_3     8
                                ssReplset_2     3
                        { "geoTag" : { $minKey : 1 }, "id" : { $minKey : 1 } } -->> { "geoTag" : "ea", "id" : 1 } on : ssReplset_1 Timestamp(7000, 1)
                        { "geoTag" : "ea", "id" : 1 } -->> { "geoTag" : "ea", "id" : 62881 } on : ssReplset_3 Timestamp(6000, 2)
                        { "geoTag" : "ea", "id" : 62881 } -->> { "geoTag" : "ea", "id" : 77599 } on : ssReplset_3 Timestamp(6000, 4)
                        { "geoTag" : "ea", "id" : 77599 } -->> { "geoTag" : "ea", "id" : 92505 } on : ssReplset_3 Timestamp(6000, 8)
                        { "geoTag" : "ea", "id" : 92505 } -->> { "geoTag" : "ea", "id" : 107370 } on : ssReplset_3 Timestamp(6000, 10)
                        { "geoTag" : "ea", "id" : 107370 } -->> { "geoTag" : "ea", "id" : 122079 } on : ssReplset_3 Timestamp(6000, 14)
                        { "geoTag" : "ea", "id" : 122079 } -->> { "geoTag" : "ea", "id" : 136703 } on : ssReplset_3 Timestamp(6000, 16)
                        { "geoTag" : "ea", "id" : 136703 } -->> { "geoTag" : "ea", "id" : 151521 } on : ssReplset_3 Timestamp(6000, 18)
                        { "geoTag" : "ea", "id" : 151521 } -->> { "geoTag" : "we", "id" : 7610 } on : ssReplset_3 Timestamp(6000, 19)
                        { "geoTag" : "we", "id" : 7610 } -->> { "geoTag" : "we", "id" : 23395 } on : ssReplset_2 Timestamp(6000, 1)
                        { "geoTag" : "we", "id" : 23395 } -->> { "geoTag" : "we", "id" : 39635 } on : ssReplset_2 Timestamp(7000, 0)
                        { "geoTag" : "we", "id" : 39635 } -->> { "geoTag" : "we", "id" : 60693 } on : ssReplset_2 Timestamp(5000, 2)
                        { "geoTag" : "we", "id" : 60693 } -->> { "geoTag" : "we", "id" : 91878 } on : ssReplset_1 Timestamp(6000, 6)
                        { "geoTag" : "we", "id" : 91878 } -->> { "geoTag" : "we", "id" : 122511 } on : ssReplset_1 Timestamp(6000, 12)
                        { "geoTag" : "we", "id" : 122511 } -->> { "geoTag" : "we", "id" : 156916 } on : ssReplset_1 Timestamp(6000, 20)
                        { "geoTag" : "we", "id" : 156916 } -->> { "geoTag" : { $maxKey : 1 }, "id" : { $maxKey : 1 } } on : ssReplset_1 Timestamp(6000, 21)
                         tag: EA  { "geoTag" : "ea" } -->> { "geoTag" : "eb" }
                         tag: WE  { "geoTag" : "we" } -->> { "geoTag" : "wf" }

Scott Hernandez

Oct 2, 2012, 9:45:39 AM
to mongod...@googlegroups.com
On Tue, Oct 2, 2012 at 9:36 AM, Santiago Ezcurra <sezc...@gmail.com> wrote:
> Thanks, Scott, for your prompt answer. I've checked and it's working, though
> not 100% as expected. Below is the new configuration. Almost all chunks on
> replica set 3 hold geoTag "ea" (which is correct according to the
> configuration), but there is still one chunk (the one ending at
> { "geoTag" : "we", "id" : 7610 }) that contains values with geoTag "we". The
> result is that I have a couple of thousand users with geoTag "we" on shard #3
> (which is intended to receive only "ea").
> Also, I had to change the tag range this way (below), because specifying
> "ea" --> "ea" as the min/max wasn't working. Is that fine?
Yes, you need to split that chunk. Otherwise it is unclear where it belongs.
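
A minimal sketch of why that chunk is ambiguous, written as plain JavaScript that runs under Node.js rather than against a live cluster. The helper names (compareValues, compareKeys, straddledBoundaries) are hypothetical, the MIN/MAX symbols are simplified stand-ins for MinKey/MaxKey, and a tag boundary such as { "geoTag" : "we" } is assumed to sort like { "geoTag" : "we", "id" : MinKey }. The chunk and tag ranges are copied from the sh.status() output earlier in the thread. The sketch lists the tag-range boundaries that fall strictly inside the chunk, which are the points where it would have to be split.

```javascript
// Simplified stand-ins for BSON MinKey / MaxKey (assumption: real key values
// in this example are only strings for geoTag and numbers for id).
const MIN = Symbol("MinKey");
const MAX = Symbol("MaxKey");

// Compare two single values, with MIN lowest and MAX highest.
function compareValues(a, b) {
  if (a === b) return 0;
  if (a === MIN || b === MAX) return -1;
  if (a === MAX || b === MIN) return 1;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Compare compound shard keys { geoTag, id } field by field.
function compareKeys(a, b) {
  return compareValues(a.geoTag, b.geoTag) || compareValues(a.id, b.id);
}

// Return the tag-range boundaries that lie strictly inside the chunk.
function straddledBoundaries(chunk, ranges) {
  const points = ranges.flatMap((r) => [r.min, r.max]);
  return points.filter(
    (p) => compareKeys(chunk.min, p) < 0 && compareKeys(p, chunk.max) < 0
  );
}

// Tag ranges from sh.status(); the missing id is modelled as MIN (assumption).
const tagRanges = [
  { tag: "EA", min: { geoTag: "ea", id: MIN }, max: { geoTag: "eb", id: MIN } },
  { tag: "WE", min: { geoTag: "we", id: MIN }, max: { geoTag: "wf", id: MIN } },
];

// The chunk on ssReplset_3 that holds both "ea" and "we" documents.
const mixedChunk = {
  min: { geoTag: "ea", id: 151521 },
  max: { geoTag: "we", id: 7610 },
};

// Prints the geoTag of each boundary that cuts through the chunk.
console.log(straddledBoundaries(mixedChunk, tagRanges).map((p) => p.geoTag));
```

Under these assumptions the chunk is cut by the boundaries at "eb" and "we", so no single tag describes it. In the mongo shell, a manual split is issued with the sh.splitAt() helper, for example sh.splitAt("test2.users", { geoTag: "we", id: MinKey }); the exact split point to use here is an assumption based on the tag ranges shown.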

>
> And one last question regarding this setup. Suppose I work out these last
> details and have a solution that stores my US users in the US shards and my
> European users in the EU shards, so that the next time a user logs in, the
> application finds the information in the "local" shards. The truth is that
> most of the time I won't have the geo information when the user accesses the
> application, only the ID. So I have two options: 1) I look up by user ID
> alone (but in that case the query wouldn't be targeted by the shard key and
> could go outside the local shards as well). 2) I assign a tag depending on
> which data center the request entered through, and look up by that tag. The
> problem with this approach is "roaming users": a user who was created on a
> European shard but is accessing the application from the US data center.
> Here I would have to program some defensive mechanism to look for that user
> in the other locations.
>
> Can you think of another solution? Does the geoTag always have to be part
> of the shard key?

Yes, it needs to be part of the shard key; that is how sharding and balancing
work -- based on ranges of the shard key. You would need to remove and
re-add the docs to change the shard key value for those users/docs.

One thing people do is to do reads local to the roaming location and
ensure writes go to their roaming location for consistency of reads
after writes -- or just do important reads from the primary.

If you can describe your application a little more that would help
identify other options.

Santiago Ezcurra

Oct 2, 2012, 10:15:36 AM
to mongod...@googlegroups.com
Regarding the chunk to split: that split was done automatically by MongoDB. Is there any way to set up the collection initially so that later split operations never leave a range that belongs to another shard?
 
Regarding the other topic: let's say I have user profiles that I want to store in MongoDB, and (for speed) I want to store those profiles as close as possible to the user's home location. But when the user accesses the application, I only have the login (it is the only reliable data; the location is a guess at that point).
What happens if I try to find the user only by ID (which would be part of the shard key)? What would mongo do?

Jeremy Mikola

Oct 3, 2012, 10:37:55 AM
to mongod...@googlegroups.com
On Tuesday, October 2, 2012 10:15:36 AM UTC-4, Santiago Ezcurra wrote:

> Regarding the other topic: let's say I have user profiles that I want to store in MongoDB, and (for speed) I want to store those profiles as close as possible to the user's home location. But when the user accesses the application, I only have the login (it is the only reliable data; the location is a guess at that point).
> What happens if I try to find the user only by ID (which would be part of the shard key)? What would mongo do?


Queries not involving the shard key use a scatter/gather method which sends the query to all shards. This is fairly efficient if one has 10 shards, but would be fairly inefficient on 1000 shards (although still ok for infrequent queries).

While find queries can utilize scatter/gather, you definitely need to include the shard key for non-multi updates, upserts and inserts. 
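
A rough sketch of the routing difference described above, as plain JavaScript that runs under Node.js. It is a simplified model and not the mongos implementation: shardsForQuery and the other helpers are hypothetical names, the MIN/MAX symbols are stand-ins for MinKey/MaxKey, and the chunk list merges adjacent chunks on the same shard from the sh.status() output earlier in the thread. A query that supplies geoTag can be narrowed to the chunks whose range may contain it; a query on id alone cannot be narrowed and goes to every shard.

```javascript
// Simplified stand-ins for BSON MinKey / MaxKey (assumption).
const MIN = Symbol("MinKey");
const MAX = Symbol("MaxKey");

// Compare two single values, with MIN lowest and MAX highest.
function compareValues(a, b) {
  if (a === b) return 0;
  if (a === MIN || b === MAX) return -1;
  if (a === MAX || b === MIN) return 1;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Compare compound shard keys { geoTag, id } field by field.
function compareKeys(a, b) {
  return compareValues(a.geoTag, b.geoTag) || compareValues(a.id, b.id);
}

// Abbreviated chunk map for test2.users (adjacent same-shard chunks merged).
const chunks = [
  { min: { geoTag: MIN, id: MIN }, max: { geoTag: "ea", id: 1 }, shard: "ssReplset_1" },
  { min: { geoTag: "ea", id: 1 }, max: { geoTag: "we", id: 7610 }, shard: "ssReplset_3" },
  { min: { geoTag: "we", id: 7610 }, max: { geoTag: "we", id: 60693 }, shard: "ssReplset_2" },
  { min: { geoTag: "we", id: 60693 }, max: { geoTag: MAX, id: MAX }, shard: "ssReplset_1" },
];

// Return the sorted list of shards a query would have to visit in this model.
function shardsForQuery(query, chunkMap) {
  let candidates = chunkMap; // no leading shard key field: scatter/gather
  if (query.geoTag !== undefined && query.id !== undefined) {
    // Full shard key: exactly one chunk satisfies min <= key < max.
    candidates = chunkMap.filter(
      (c) => compareKeys(c.min, query) <= 0 && compareKeys(query, c.max) < 0
    );
  } else if (query.geoTag !== undefined) {
    // Key prefix only: keep every chunk whose range may contain this geoTag.
    candidates = chunkMap.filter(
      (c) =>
        compareValues(c.min.geoTag, query.geoTag) <= 0 &&
        compareValues(query.geoTag, c.max.geoTag) <= 0
    );
  }
  return [...new Set(candidates.map((c) => c.shard))].sort();
}

console.log(shardsForQuery({ id: 42 }, chunks));               // all three shards
console.log(shardsForQuery({ geoTag: "ea" }, chunks));         // ssReplset_1, ssReplset_3
console.log(shardsForQuery({ geoTag: "we", id: 42 }, chunks)); // ssReplset_3 only
```

The last call also shows the earlier symptom in this thread: with the unsplit chunk, a "we" user with a low id is routed to ssReplset_3.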