Sharding GridFS collection


Yaroslav

Jun 30, 2011, 5:00:19 AM
to mongodb-user
Hello everyone.

I am trying to shard the chunks collection on an (n, _id) key, but I
get an error message when adding a new file through the C# client,
something like 'Currently supporting only sharding on _id key'.

Obviously, sharding on the (_id) key creates a hotspot, because all
insert writes are directed to the same shard.

I am curious: is this a restriction of the MongoDB engine or just of the
C# client? What workarounds exist for sharding GridFS without creating a
hotspot?

Kyle Banker

Jun 30, 2011, 1:17:53 PM
to mongodb-user
Which version of MongoDB are you running?

What happens when you call the shardCollection command?

Can you paste in the result of:

db.printShardingStatus()

Yaroslav

Jul 1, 2011, 6:54:06 AM
to mongodb-user
1) We use version 1.8.1.
2) shardCollection runs without any error.
3) Output of db.printShardingStatus():

--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
{ "_id" : "sh1", "host" : "sh1/andorra.softserveinc.com:27031" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "Documents", "partitioned" : true, "primary" : "sh1" }
Documents.fs.chunks
chunks: sh1 3
{ "n" : { $minKey : 1 }, "files_id" : { $minKey : 1 } } -->> { "n" : 0, "files_id" : ObjectId("4e0da2cd0cacdf741a38ed9f") } on : sh1 { "t" : 1000, "i" : 1 }
{ "n" : 0, "files_id" : ObjectId("4e0da2cd0cacdf741a38ed9f") } -->> { "n" : 84, "files_id" : ObjectId("4e0da2cd0cacdf741a38ed9f") } on : sh1 { "t" : 1000, "i" : 3 }
{ "n" : 84, "files_id" : ObjectId("4e0da2cd0cacdf741a38ed9f") } -->> { "n" : { $maxKey : 1 }, "files_id" : { $maxKey : 1 } } on : sh1 { "t" : 1000, "i" : 4 }

4) mongofiles.exe gives "assertion: 9008 filemd5 failed".
The C# driver gives "GridFS chunks collection can only be sharded on
files_id".

Current issue is reflected in this thread:
http://groups.google.com/group/mongodb-user/browse_thread/thread/61484e9bb75abe57


But the question is: how can we avoid creating an insert hotspot if we
are only allowed to shard GridFS collections on files_id, which is
monotonically increasing?

Kyle Banker

Jul 1, 2011, 10:33:45 AM
to mongodb-user
The problem here is that the filemd5 command, which performs a
server-side md5 against all chunks of a file, currently requires that
all of those chunks live on the same shard. The only way to guarantee
that is to shard the chunks collection on files_id. That's why you're
seeing the error message.

The advantage of keeping all chunks on the same shard is that queries
will be fast. By contrast, if you shard this collection on (n, _id),
then it's quite possible that the chunks for a single file will live
on multiple shards, and this is bad for locality (and bad for reads).
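To make the locality point concrete, here is a rough Python simulation rather than anything run against a real cluster. It reuses the two split points from the printShardingStatus() output earlier in the thread, where the key is (n, files_id), and checks which key ranges the chunks of one file fall into. The second file id, the 100-chunk size, the single split point for the files_id-only case, and the treatment of ObjectIds as plain hex strings are all assumptions made for illustration.

```python
from bisect import bisect_right

# Split points copied from the printShardingStatus() output in this thread.
# ObjectIds are represented as plain hex strings, which is a simplification.
EXISTING_FILE = "4e0da2cd0cacdf741a38ed9f"
SPLITS_ON_N_FILES_ID = [(0, EXISTING_FILE), (84, EXISTING_FILE)]

# Hypothetical later upload: 100 chunks, with an id that sorts after EXISTING_FILE.
NEW_FILE = "4e0da2ce0cacdf741a38eda0"
chunks = [{"files_id": NEW_FILE, "n": n} for n in range(100)]


def ranges_touched(docs, key_fn, splits):
    """Return the set of key-range indexes that the given documents fall into."""
    return {bisect_right(splits, key_fn(doc)) for doc in docs}


# Compound key (n, files_id): tuples compare field by field, like a compound key.
by_n_then_file = ranges_touched(
    chunks, lambda c: (c["n"], c["files_id"]), SPLITS_ON_N_FILES_ID
)

# Assumed single split point for a collection sharded on files_id alone.
by_file_only = ranges_touched(
    chunks, lambda c: (c["files_id"],), [(EXISTING_FILE,)]
)

print("ranges touched with (n, files_id):", sorted(by_n_then_file))
print("ranges touched with files_id:", sorted(by_file_only))
```

With the compound key, the file's chunks straddle the split at n = 84 and land in two ranges, which the balancer is free to place on different shards. With files_id alone, every chunk of the file shares one key value and so stays in one range.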

The disadvantage is that you could see a hot spot while inserting. Are
you in fact seeing this? This system should balance pretty well. If
it's a real problem, then one solution would be to use a custom
files_id. If you use a fresh md5 digest as each files_id, then you
should get a random distribution on inserts while still maintaining the
benefits of keeping all chunks of a file on the same shard.
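As a rough illustration of why an md5-based files_id helps, the Python sketch below compares where new keys land when the key space is cut into fixed ranges. The three split points, the "file-N" seed strings, and the hex rendering of the sequential ids are all hypothetical. Real chunk boundaries move as MongoDB splits and migrates chunks, so this only shows the general tendency.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

# Hypothetical split points cutting the hex key space into four ranges,
# standing in for chunk boundaries on a range-sharded files_id.
SPLIT_POINTS = ["4", "8", "c"]


def range_for(key: str) -> int:
    """Return the index of the key range that a hex string key falls into."""
    return bisect_right(SPLIT_POINTS, key)


def md5_files_id(seed: str) -> str:
    """Derive a files_id from an md5 digest; the choice of seed is an assumption."""
    return hashlib.md5(seed.encode("utf-8")).hexdigest()


# Monotonically increasing ids, rendered as fixed-width hex like a timestamp prefix.
sequential_ids = [format(0x4E0DA2CD + i, "08x") for i in range(1000)]

# md5-derived ids for the same number of hypothetical uploads.
hashed_ids = [md5_files_id(f"file-{i}") for i in range(1000)]

sequential_spread = Counter(range_for(key) for key in sequential_ids)
hashed_spread = Counter(range_for(key) for key in hashed_ids)

print("sequential ids per range:", dict(sequential_spread))
print("md5 ids per range:", dict(hashed_spread))
```

The sequential ids all fall into a single range, which is the insert hot spot, while the md5-derived ids spread over all four ranges. Because every chunk of a file still carries the same files_id, the locality benefit is unchanged.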

Jens Rantil

Jul 3, 2011, 5:25:25 AM
to mongodb-user
Maybe off-topic here, but what you are saying is that GridFS files can
only be as large as one machine can handle? Isn't that a pretty big
limitation?

Kyle Banker

Jul 3, 2011, 12:52:40 PM
to mongodb-user
No. I'm saying that if you're using GridFS with sharding, then you
want all chunks from any given file to live on a single machine.
Sharding on files_id guarantees this.

Yaroslav

Jul 4, 2011, 2:41:14 PM
to mongodb-user
Is it really possible for us to choose the files_id? I thought GridFS
hides the _id of the fs.files collection from us. Can we?


Robert Stam

Jul 4, 2011, 3:08:23 PM
to mongodb-user
You can choose your own files_id if you want.

The easy way to upload a file to GridFS is:

database.GridFS.Upload("filename"); // _id assigned for you

But you can also have more control if you do it this way:

using (Stream stream = File.OpenRead("localfilename")) {
    var createOptions = new MongoGridFSCreateOptions { Id = 1 }; // any _id you want
    database.GridFS.Upload(stream, "remotefilename", createOptions);
}