Confused by Chunk Ranges - Min date > Max date

davekirk

Sep 19, 2011, 9:44:09 PM
to mongodb-user
I'm very puzzled/confused by what I'm seeing when I do a
db.printShardingStatus() command in the shell for my collection. I
have a compound shard key {i:1,t:1} where i is an integer (id) that
ranges from 1 to about 2700 and t is a timestamp field that starts
around 5/1/2011 in my data. All of the data so far is between 5/1/2011
and 5/2/2011. The data is pretty randomly distributed in the i values
at any given timestamp (t) and there are over a million records
inserted.

In several of the chunks, the min t is actually a date that is later
than the max t of the chunk. Am I reading this wrong? How could the
chunk actually hold any data with this definition? I must be missing
something.

I thought the chunk would hold data where the values of i are between
min.i and max.i and values of t are between min.t and max.t. Clearly
this is not true, or I'm misreading the data in the chunk definition.

Here is the output for this collection in the mongo shell:

{ "i" : { $minKey : 1 }, "t" : { $minKey : 1 } } -->> { "i" : 1, "t" : ISODate("2011-04-30T23:46:44.810Z") } on : shard0001 { "t" : 5000, "i" : 0 }
{ "i" : 1, "t" : ISODate("2011-04-30T23:46:44.810Z") } -->> { "i" : 493, "t" : ISODate("2011-05-01T10:46:28.610Z") } on : shard0000 { "t" : 3000, "i" : 1 }
{ "i" : 493, "t" : ISODate("2011-05-01T10:46:28.610Z") } -->> { "i" : 1094, "t" : ISODate("2011-05-01T03:26:00.740Z") } on : shard0002 { "t" : 5000, "i" : 1 }
{ "i" : 1094, "t" : ISODate("2011-05-01T03:26:00.740Z") } -->> { "i" : 2085, "t" : ISODate("2011-05-01T23:15:29.030Z") } on : shard0002 { "t" : 3000, "i" : 4 }
{ "i" : 2085, "t" : ISODate("2011-05-01T23:15:29.030Z") } -->> { "i" : 2554, "t" : ISODate("2011-05-01T21:38:59.870Z") } on : shard0003 { "t" : 4000, "i" : 2 }
{ "i" : 2554, "t" : ISODate("2011-05-01T21:38:59.870Z") } -->> { "i" : { $maxKey : 1 }, "t" : { $maxKey : 1 } } on : shard0003 { "t" : 4000, "i" : 3 }

I also did a stats() on this collection and there are over a million
records in shard0000, shard0002, and shard0003. shard0001 is empty,
but that's expected since its chunk range is outside the bounds of the
data I inserted.

I'm sure there is something I'm missing here, but it's not obvious to
me. It probably has to do with how the min/max values in the chunk
work with a compound shard key.

Can someone please explain what is going on?

Scott Hernandez

Sep 19, 2011, 10:19:37 PM
to mongod...@googlegroups.com
Let's work with an easier compound key, like two ints, each between 1 and 10.

Here are the chunk ranges:

chunk min->max:
[$min, $min] -> [1, 8]
[1,8] -> [3,1]
[3,1] -> [5,2]
[5,2] -> [5,10]
[5,10] -> [7,3]
[7,3] -> [$max,$max]

These are just made-up split points for my imaginary data, chosen so
that the ranges hold similar amounts of data; each chunk is about 64MB.

Notice how the ranges are based on the two values combined (compound),
not each value individually? See how the second value goes up and down
through the chunks but the combination is always increasing from
min->max?
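
As a rough illustration of that ordering (not MongoDB code): Python
tuples happen to compare the same way for plain ints, first element
first, with the second element used only to break ties. The names
below are made up for this sketch.

```python
# Chunk boundaries from the example above, written as (first, second) tuples.
boundaries = [(1, 8), (3, 1), (5, 2), (5, 10), (7, 3)]

def is_strictly_increasing(keys):
    """True if every key sorts before the key that follows it."""
    return all(a < b for a, b in zip(keys, keys[1:]))

# Compared as a pair, the boundaries only ever go up.
combined_ok = is_strictly_increasing(boundaries)

# The second value taken on its own goes up and down: 8, 1, 2, 10, 3.
second_values = [second for _, second in boundaries]
second_ok = is_strictly_increasing(second_values)

print(combined_ok, second_ok)
```

The second value only decides the order when the first values are
equal, as in [5,2] -> [5,10].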

Does that make more sense now?

BTW, you can ignore the numbers after the shard name, as in
'shard0000 { "t" : 3000, "i" : 1 }'. They are a version for that
chunk, used for internal house-keeping.

davekirk

Sep 20, 2011, 12:25:00 AM
to mongodb-user
Ok, it took me a while to see this, but I understand it now. Basically
the first key never decreases across the chunks in your example, and the
2nd key goes up and down as you say. The two numbers together form
a single ordered index running from [$minKey,$minKey] to
[$maxKey,$maxKey]. The min is the index of the first point in the
chunk and the max is the index of the data point that falls just
outside the chunk.
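
To check that reading against the example ranges, here is a small
sketch in Python (an illustration only, not how mongos is implemented;
split_points and chunk_index are names invented for it). It treats
each chunk as including its min and excluding its max, and looks up
the chunk for a key with a binary search over the split points.

```python
from bisect import bisect_right

# Split points from the example; chunk 0 starts at [$min,$min] and the
# last chunk ends at [$max,$max].
split_points = [(1, 8), (3, 1), (5, 2), (5, 10), (7, 3)]

def chunk_index(key):
    """Return the 0-based chunk whose range [min, max) contains key."""
    # bisect_right counts the split points that are <= key, so a key
    # equal to a split point lands in the chunk that starts there.
    return bisect_right(split_points, key)

print(chunk_index((1, 7)))  # before the first split point
print(chunk_index((1, 8)))  # equal to a chunk's min, so inside that chunk
print(chunk_index((4, 9)))  # second value 9 is not between 1 and 2
print(chunk_index((5, 2)))  # equal to a chunk's max, so in the next chunk
```

The key (4, 9) falls in the chunk [3,1] -> [5,2] even though 9 is not
between 1 and 2, because only the combined value is compared.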

Thank you.