Sharding and Shard keys "pitfalls"


h0lyRS

Feb 16, 2012, 8:59:05 AM
to mongodb-user
I'm trying to understand some of the pitfalls regarding shard keys,
but I can't seem to find it all documented in one place. I've been
collecting some bits and pieces of advice. My personal list so far is
below.

If you see anything wrong, would you please correct it? Thanks!

* Once you have a sharded setup, you can't go back to a single replica
set setup - you'll need to dump/restore to a fresh replica set

* You can't detach all replica sets (shards) from a shard setup - it
must have at least one shard

* Once you assign a shard key for a collection, you can't change it

* If you want a unique index, then that needs to be the shard key.
You can't have any unique indexes other than the shard key. You don't
have to have a unique shard key ( taken from
http://www.mongodb.org/display/DOCS/Choosing+a+Shard+Key#comment-209962309
)

* You can use the {unique: true} option to ensure that the underlying
index enforces uniqueness so long as the unique index is a prefix of
the shard key [ Note: I guess this extends the notion described in the
previous bullet, above ]

* If the "unique: true" option is not used, the shard key does not
have to be unique.

* The Sharding FAQ states: "If you don't use _id as the shard key then
it is your responsibility to keep the _id unique". It seems to imply
that once you have any custom shard key not based in _id, the
automatic _id assignment somehow loses uniqueness. That doesn't seem
to be the case as Mongo will still assign automatic _ids [note: I'm
not sure about any of this]

* One good strategy when choosing a shard key for a "chronological"
collection is combining a "coarse-grained time value" (example: month
or year) with a "search criteria" value (example: user email) ( taken
from http://www.snailinaturtleneck.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/
)

* Using the automatic _id as shard key may not be a good idea because
its 4 most significant bytes are unix timestamp-based [ note: could
this work for some scenarios? what about reading? ]

* Can't update a value where its key is part of the shard key [ note:
TRUE? FALSE? not sure ]
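
To make the "coarse-grained time value plus search criteria" bullet above concrete, here is a minimal sketch in plain Python with no MongoDB connection. The `shard_key` helper, the field names `created` and `email`, and the sample documents are all hypothetical; the point is only that such a compound key does not grow monotonically with insertion time.

```python
from datetime import datetime

def shard_key(doc):
    """Hypothetical compound key: (year-month, user email)."""
    return (doc["created"].strftime("%Y-%m"), doc["email"])

# Three documents inserted in chronological order within the same month.
docs = [
    {"created": datetime(2012, 2, 1, 9, 0), "email": "zoe@example.com"},
    {"created": datetime(2012, 2, 3, 9, 0), "email": "amy@example.com"},
    {"created": datetime(2012, 2, 5, 9, 0), "email": "max@example.com"},
]

keys_in_insert_order = [shard_key(d) for d in docs]

# Within one month the keys are ordered by email, not by insertion time,
# so consecutive inserts do not all sort after the previous ones.
is_monotonic = keys_in_insert_order == sorted(keys_in_insert_order)
print(keys_in_insert_order, is_monotonic)
```

Because the month is the leading field, documents from the same period still sort near each other, while the email spreads writes across the key range for that period.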

That's all I have for now. I think the most problematic point is the
fact that you can't change a shard key easily. It ends up in a
situation where you can't exactly experiment with different setups
before committing to one; it is also not very agile for projects that
are constantly changing/pivoting.

Thanks!

Adam C

Feb 17, 2012, 10:07:52 AM
to mongodb-user
Hi there,

Quite a post!

Here's some feedback:

> The Sharding FAQ states: "If you don't use _id as the shard key then
> it is your responsibility to keep the _id unique".

This is basically telling you a couple of things:

1. _id can be specified, by you, if you so wish - the default behavior
is to generate this field for you when you choose to omit it.

If you choose to set _id manually, then it is up to you to make sure
that this is then a unique field. Or you can just let Mongo do it for
you and not worry about it

2. Picking a shard key that uses a manually generated _id field (or
that does not have _id as part of the key) essentially represents a
similar risk, just for a shard key now as opposed to an _id field on a
single collection.

> Using the automatic _id as shard key may not be a good idea because
> its 4 most significant bytes are unix timestamp-based [ note: could
> this work for some scenarios? what about reading? ]

You have kind of answered this yourself. The main problem with a
time-based, incrementing shard key is that all the writes go to the
newest chunk, which lives on only one shard, so you have a write
hotspot. There is logic in the shard migrations to move the newest
chunk around, but the writes will still stick to one shard at a time.
Reads are less of a problem as long as your queries and data set are
not focused on the most recent records. In general, it's not a good
idea, and you want to have a coarser field as part of a compound key
instead.
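
The write hotspot can be sketched in plain Python without a MongoDB connection. Everything here is an assumption for illustration: the chunk boundaries, the shard names, and the `route` helper are hypothetical, and real chunk ranges are managed by the cluster rather than hard-coded.

```python
import bisect
from collections import Counter

# Hypothetical chunk layout over an integer shard key.
# Chunks: key < 1000, 1000 <= key < 2000, key >= 2000.
CHUNK_UPPER_BOUNDS = [1000, 2000]
CHUNK_TO_SHARD = ["shardA", "shardB", "shardC"]

def route(key):
    """Return the shard that owns the chunk containing `key`."""
    return CHUNK_TO_SHARD[bisect.bisect_right(CHUNK_UPPER_BOUNDS, key)]

# Monotonically increasing keys (like a timestamp-prefixed _id): every new
# value is larger than all existing ones, so it falls in the last chunk.
increasing_keys = range(2000, 2100)
increasing_writes = Counter(route(k) for k in increasing_keys)

# Keys spread over the whole range land on several shards.
spread_keys = [(k * 37) % 3000 for k in range(100)]
spread_writes = Counter(route(k) for k in spread_keys)

print("increasing:", dict(increasing_writes))
print("spread:    ", dict(spread_writes))
```

In this toy layout all 100 increasing keys are routed to a single shard, while the spread keys are divided among all three.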

> Can't update a value where its key is part of the shard key

This is correct - shard keys are immutable. This makes sense if you
think about how a mongos will decide where a piece of data is - it
does so based on the shard key. If you were able to change the fields
that make up the shard key in a document then that document would have
to be moved to the appropriate chunk on the appropriate shard or it
would be impossible to find after the change (there is no mechanism
for this).
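
A rough sketch of that reasoning in plain Python. The two-shard layout and the helpers `shard_for`, `insert`, and `find_by_key` are hypothetical and only illustrate the routing argument, not how mongos is actually implemented.

```python
# Hypothetical two-shard cluster: keys below 100 live on shard0, the rest on shard1.
def shard_for(key):
    return "shard0" if key < 100 else "shard1"

shards = {"shard0": [], "shard1": []}

def insert(doc):
    # Route the document to the shard that owns its shard key value.
    shards[shard_for(doc["x"])].append(doc)

def find_by_key(key):
    # A targeted query only asks the shard that owns the key's range.
    return [d for d in shards[shard_for(key)] if d["x"] == key]

insert({"x": 5, "name": "a"})
found_before = len(find_by_key(5))

# Simulate an in-place change of the shard key field with no migration.
shards["shard0"][0]["x"] = 500

# The query for 500 is routed to shard1, but the document still sits on shard0.
found_after = len(find_by_key(500))
print(found_before, found_after)
```

The document is found before the change and missed afterwards, which is the situation the immutability rule avoids.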

Adam


Ted Chyn

Aug 17, 2014, 12:50:57 PM
to mongod...@googlegroups.com, jan...@gmail.com
From the MongoDB manual: are the following statements true? Please comment if I am in error.

1. If there is an automatically generated _id unique index, I cannot create a unique shard index(?), but the shard index {x:1} can still be used to enforce that no 2 documents have the same shard key? (True or False)

2. Using the above scenario, the _id index is created as a unique index and the shard index is on another field x. Here are some things I observed:

  a.   If doc {x:1} already exists, the shard index will prevent another doc {x:1} from being inserted.
    However, I can insert doc {x:1, y:2} with no issue.
    Is this normal?
  b.  I did not see a last error message when a duplicate doc {x:1} was inserted, even though the insertion did not occur on the database.
        Is this normal?

thnx ted

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

The unique constraint on indexes ensures that only one document can have a value for a field in a collection. For sharded collections these unique indexes cannot enforce uniqueness because insert and indexing operations are local to each shard.

MongoDB does not support creating new unique indexes in sharded clusters and will not allow you to shard collections with unique indexes on fields other than the _id field.
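
The "local to each shard" point in the manual excerpt above can be illustrated with a toy model in plain Python. The `Shard` class, the even/odd routing rule, and the `email` field are all hypothetical; this is a sketch of the idea, not MongoDB's implementation.

```python
class Shard:
    """Toy shard that enforces a unique index only over its own documents."""

    def __init__(self):
        self.docs = []
        self.unique_values = set()

    def insert(self, doc, unique_field):
        value = doc[unique_field]
        if value in self.unique_values:
            raise ValueError(f"duplicate {unique_field}: {value!r}")
        self.unique_values.add(value)
        self.docs.append(doc)

def shard_for(x, shards):
    # Hypothetical routing on shard key x: even keys to shard 0, odd keys to shard 1.
    return shards[x % 2]

# Case 1: unique index on "email", which is not the shard key.
shards = [Shard(), Shard()]
shard_for(1, shards).insert({"x": 1, "email": "a@example.com"}, "email")
# Same email, different shard: the local index has never seen it, so it is accepted.
shard_for(2, shards).insert({"x": 2, "email": "a@example.com"}, "email")
total_with_same_email = sum(len(s.docs) for s in shards)

# Case 2: unique index on the shard key itself.
shards2 = [Shard(), Shard()]
shard_for(1, shards2).insert({"x": 1}, "x")
try:
    # Equal shard key values always route to the same shard, so the duplicate is caught.
    shard_for(1, shards2).insert({"x": 1}, "x")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(total_with_same_email, duplicate_rejected)
```

In the first case both documents are stored even though they share an email; in the second, the duplicate shard key value is rejected because both inserts reach the same shard.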

Asya Kamsky

Aug 21, 2014, 4:20:45 AM
to mongodb-user

I'm not sure I understand your questions but you seem to be misinterpreting some things you see.

Inline:


On Aug 17, 2014 9:51 AM, "Ted Chyn" <ted....@gmail.com> wrote:
>
> From the MongoDB manual: are the following statements true? Please comment if I am in error.
>
> 1. If there is an automatically generated _id unique index, I cannot create a unique shard index(?), but the shard index {x:1} can still be used to enforce that no 2 documents have the same shard key? (True or False)

I don't understand the question at all.   Every collection must have a unique index on _id, and mongo enforces that.   You can shard on x, and the index x:1 can be unique or non-unique; either is allowed.

> 2. Using the above scenario, the _id index is created as a unique index and the shard index is on another field x. Here are some things I observed:

You don't create the _id index; it's always there and always unique.

You didn't specify if x:1 is a unique index.

>   a.   If doc {x:1} already exists, the shard index will prevent another doc {x:1} from being inserted.

Untrue.
Only true if x:1 is a unique index.

>     However, I can insert doc {x:1, y:2} with no issue.
>     Is this normal?

Since the above statement was incorrect, this isn't "normal" or "not normal" - without a unique index on x you can insert as many documents with the same x value as you want.

>   b.  I did not see a last error message when a duplicate doc {x:1} was inserted, even though the insertion did not occur on the database.
>         Is this normal?

Insertion occurred.   If it hadn't, you would have seen an error message.

It looks like you replied to a rather old message - it's better to start a new thread and include the exact part of the docs that you find unclear.

The following discussion is about unique indexes on sharded collections when these indexes are NOT the shard key.

> The unique constraint on indexes ensures that only one document can have a value for a field in a collection. For sharded collections these unique indexes cannot enforce uniqueness because insert and indexing operations are local to each shard.
