Sharding index + array field

sebarriada

Apr 7, 2011, 12:43:53 PM
to mongodb-user
Hi all,

First some points about my config.
-------
Linux debian 5.0
db.version() --> 1.8.0
Shards, config and mongos processes running on the same server
(for testing purposes).
-------

I'm testing sharding with a collection of more than 500k documents.

In that collection I have 2 indexes: one on a string field and the
other on an array of strings.

When I shard on the string field, everything seems to work as
expected. I can add new shards and the chunks move between them. As
soon as the chunks are balanced across the shards, I can connect to
the mongos process, run db.col.count() and get those 500k+ docs (while
the chunks are being balanced, the count moves up and down around 500k
until balancing is complete).

The problem seems to happen when I test this with the other index,
the one on the array field. In this case, once the chunks are balanced
between the shards, I connect to the mongos process and db.col.count()
returns only 120k documents (while the chunks are being balanced, the
count moves up and down until balancing is complete, and then it stays
at 120k).

The first difference between them is the number of chunks. In the
first case I have 12 chunks split between the shards; in the second I
have 214 chunks, and of course balancing them between the shards takes
longer than in the first case (in both cases the chunk size seems to
be the same --> { "_id" : "chunksize", "value" : 64 }).

The second difference, and the bigger one, is the result I'm getting.
In the second case the sharding runs, but the document count drops to
about 20% of the original.

Do you know if this is a known issue? I know that I'm not presenting
any logs, so you can't tell what's going on (I don't want to overload
you with data/logs all at once). But maybe there is a known issue
that I didn't find. Let me know if you need more data/logs.

Thanks in advance,
Sebastian.

Gaetan Voyer-Perrault

Apr 7, 2011, 1:28:02 PM
to mongod...@googlegroups.com
Hello Sebastian;

Can you provide the results of "printShardingStatus()"?

Simply connect to mongos (the router) and run:
> printShardingStatus()

For ease of reading, please post the results on a paste site like gist.github.com, pastie.org, or pastebin.com.

- Gates


sebarriada

Apr 7, 2011, 4:06:01 PM
to mongodb-user
Hello Gates,

Thanks for your quick answer. Here is the sharding status for
both cases:
http://pastie.org/1769212

Cheers,
Sebastian.

Eliot Horowitz

Apr 7, 2011, 5:13:08 PM
to mongod...@googlegroups.com
Can you send a sample document?
If I'm understanding correctly, you shouldn't be able to use an array as the shard key.

sebarriada

Apr 7, 2011, 5:20:52 PM
to mongodb-user
Here is an example; the real data is not exactly this.

{ "_id" : ObjectId("11111111111"), "notes" : [ 90, 95, 92 ],
"description" : "", "certs" : [ "linux", "python", "cisco" ], "name" :
"Sebastian Arriada", "age" : "30" }

We have an index on certs, which allows us to quickly search for
people who have specific certifications.
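
To illustrate why an index on certs makes that lookup fast, here is a minimal sketch in plain JavaScript (no MongoDB needed). It assumes the index holds one entry per array element, which is how indexes on array fields behave; the helper names buildCertIndex and findByCert and the extra sample people are hypothetical.

```javascript
// Hypothetical sample documents shaped like the example above.
const people = [
  { _id: 1, name: "Sebastian Arriada", certs: ["linux", "python", "cisco"] },
  { _id: 2, name: "Ana", certs: ["linux"] },
  { _id: 3, name: "Luis", certs: ["cisco", "java"] },
];

// Build an index with one entry per array element: cert -> list of _id values.
function buildCertIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const cert of doc.certs) {
      if (!index.has(cert)) index.set(cert, []);
      index.get(cert).push(doc._id);
    }
  }
  return index;
}

// Return the _id values of everyone holding a cert, without scanning all docs.
function findByCert(index, cert) {
  return index.get(cert) || [];
}

const certIndex = buildCertIndex(people);
console.log(findByCert(certIndex, "linux")); // [ 1, 2 ]
console.log(findByCert(certIndex, "cisco")); // [ 1, 3 ]
```

Note that the same document shows up under several index keys, one per cert it holds.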

So basically the problem is: if we use name as the shard key, it works
properly, but if we use certs, it doesn't. We do have alternatives,
such as redesigning the schema of this collection.

Thanks a lot for your help!

Eliot Horowitz

Apr 7, 2011, 5:23:49 PM
to mongod...@googlegroups.com
You cannot shard on certs since it's an array.
When you try to insert, it should fail.
So you'll need to shard on name or something else.

sebarriada

Apr 7, 2011, 5:42:55 PM
to mongodb-user
Hmm, it's not failing. I'm adding new docs and it's working OK. In
fact I have 114k docs.

sebarriada

Apr 7, 2011, 6:33:59 PM
to mongodb-user
One option we are analyzing is to modify the schema to something
like:

cert1: user1, user2, userX
cert2: user3, user4, userY

The only problem I see with this is that if you are looking for a
user who has cert1 + cert2, you have to search both certs and verify
(at the app level) that the user is in both, whereas of course I would
prefer to resolve it directly at the db level (for performance).
A second issue could be the number of users per cert: this could
exceed the limit of 4 MB per document. I don't see that happening in
the near future, though, so I will focus on the first issue.
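
The app-level check described above could look roughly like this. It is a minimal sketch in plain JavaScript, assuming the proposed schema has been loaded as a map from cert to user list; certToUsers and usersWithAllCerts are hypothetical names, and the user lists are made up for illustration.

```javascript
// Hypothetical data in the proposed shape: one entry per cert, listing its users.
const certToUsers = {
  cert1: ["user1", "user2", "userX"],
  cert2: ["user2", "user3", "userX"],
};

// Return the users that appear in the list of every requested cert.
function usersWithAllCerts(mapping, certs) {
  if (certs.length === 0) return [];
  // Start from the first cert's users and keep only those present in all others.
  let result = mapping[certs[0]] || [];
  for (const cert of certs.slice(1)) {
    const members = new Set(mapping[cert] || []);
    result = result.filter((user) => members.has(user));
  }
  return result;
}

console.log(usersWithAllCerts(certToUsers, ["cert1", "cert2"])); // [ 'user2', 'userX' ]
```

Each extra cert in the query costs one more fetch plus an in-memory filter, which is the overhead I would like to avoid.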

Thanks again!
Sebastian.

Scott Hernandez

Apr 7, 2011, 10:12:25 PM
to mongod...@googlegroups.com
Can you do a db.printShardingStatus() when you are using an array as
the shard-key?

sebarriada

Apr 8, 2011, 7:08:34 AM
to mongodb-user
I already did, Scott:
http://pastie.org/1769212

Case 1 is with a normal index and case 2 with array.

Regards,
Sebastian.

axlfu

Apr 9, 2011, 6:44:13 AM
to mongodb-user
db.printShardingStatus({verbose:true})

sebarriada

Apr 12, 2011, 6:38:16 AM
to mongodb-user
Hi axlfu,

Unfortunately I'm not allowed to expose all the records, but with
verbose I get the same information that I showed in the link, plus
the chunks.
...
{ "certs" : "Cisco CCDA" } -->> { "certs" : "Cisco CCNP " } on :
shard0000 { "t" : 1000, "i" : 198 }
...
What I see from there is that each chunk boundary is just one of the
values of the array, which could be right, but in that case the same
document could fall into different chunks (because one document could
have an array with tens of certs).

So maybe the issue is around this point.
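
A minimal sketch of that suspicion in plain JavaScript; this is not MongoDB's actual routing code. It assumes chunks are half-open ranges [min, max) over single cert values, as the verbose output suggests, and uses JavaScript string comparison as a stand-in for the server's ordering. Only the middle chunk's boundaries come from the output above; the other chunks, their shard assignments, the sample doc and the helpers chunkFor and chunksFor are hypothetical.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) and lives on one shard.
// null stands in for the open-ended lowest / highest boundaries.
const chunks = [
  { min: null, max: "Cisco CCDA", shard: "shard0001" },
  { min: "Cisco CCDA", max: "Cisco CCNP ", shard: "shard0000" },
  { min: "Cisco CCNP ", max: null, shard: "shard0001" },
];

// Find the single chunk whose range contains one scalar key value.
function chunkFor(value) {
  return chunks.find(
    (c) => (c.min === null || value >= c.min) && (c.max === null || value < c.max)
  );
}

// For an array-valued key, collect the distinct chunks its elements fall into.
function chunksFor(values) {
  return [...new Set(values.map(chunkFor))];
}

// Hypothetical document with three certs.
const doc = { name: "example", certs: ["Apache", "Cisco CCNA", "linux"] };

console.log(chunkFor("Cisco CCNA").shard); // shard0000
console.log(chunksFor(["linux"]).length);  // 1
console.log(chunksFor(doc.certs).length);  // 3
```

A scalar key always lands in exactly one chunk, while here the three array elements point at three different chunks, so there is no single chunk that the whole document clearly belongs to.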

Cheers,
Sebastian.

axlfu

Apr 12, 2011, 11:02:14 AM
to mongodb-user
Well, I've tested it on my computer and it really works.
I mean, yes, we can make an array the shard key; actually MongoDB
does not know it is an array.
We should just remember: one document lives completely in one chunk,
and one chunk lives completely on one shard node. A single document
cannot be split.

Here is an additional test. It is not useful in practice, but it may
confuse newcomers.

When I define a shard key such as {"array.ele": 1}, the following
operation fails:
> db.v.save({array:[{ele:1}]})
tried to insert object without shard key


But if I have a collection called w that has a doc like
{array:[{ele:1}]} and I run ensureIndex({"array.ele":1}), then
sharding collection "w" on key "array.ele" works, and save(doc) works
too.

My env is :
windows xp
mongoDB 1.8.1

Eliot Horowitz

Apr 12, 2011, 11:13:00 AM
to mongod...@googlegroups.com
That's a bug.
If you have
{ a : [ { b : 1 } ] }
you should be able to shard on a or a.b.
Can you open a case at http://jira.mongodb.org/ ?

axlfu

Apr 12, 2011, 8:13:04 PM
to mongodb-user
Eh... Eliot, first I have to make sure what the bug is.
In my opinion, MongoDB should not allow an array as the shard key,
because it is not useful, right?
And once the shard key is set, saving a doc whose shard key is an
array should be invalid.

As you said: "You cannot shard on certs since its an array."

Eliot Horowitz

Apr 12, 2011, 8:14:31 PM
to mongod...@googlegroups.com
correct

sebarriada

Apr 12, 2011, 9:25:35 PM
to mongodb-user
Thanks a lot for your answer axlfu,

I'm not sure I'm getting exactly what you mean. As per your answer,
this should work, but the doc can't be split:
"one document completely in one chunk"

The structure of our document is something like:
{ "_id" : ObjectId("11111111111"), "notes" : [ 90, 95, 92 ],
"description" : "", "certs" : [ "linux", "python", "cisco" ],
"name" : "Sebastian Arriada", "age" : "30" }

As far as I can see, when we shard with certs as the key, MongoDB (as
you mention) does not know about the array and creates chunks from the
individual values of "certs". In that case the document could belong
to one chunk (e.g. linux) but not to the others.

When I sharded 500k records in this situation, I ended up with 130k
afterwards. I thought the previous point could be the reason for this.

When you mention that it works, did you try sharding a collection
that already contained documents? Did you compare the count before
and after sharding?

Cheers,
Sebastian.

sebarriada

Apr 12, 2011, 10:02:37 PM
to mongodb-user
Ahh, I think I understand your point from your last message:
"You cannot shard on certs since its an array."

Cool, we will check for other options. Thanks a lot for your help,
thanks to Eliot too!

We have part of the beta ready and we are using mongo in the
background. Hope we can contribute with a success case once we have
the beta ready!

Cheers,
Sebastian.