Help. Choosing shard keys

38 views
Skip to first unread message

Raúl González

unread,
Jun 20, 2012, 10:53:56 AM6/20/12
to mongodb-user
Hi to all,

I hope that from the group could help me to choose my shard keys. I
have two models that i want to shard.

The first model is a word list and it only have two fields into the
model: Language as char field and word as chard field also. I think
the best solution is to choose the field word for sharding here.

The second modelo is a book list and this have the following fields:

id, created_at, title, publisher, publishdate, source. Here I have
many doubts with the best choice.

I hope that you could help me because as the documentation said "It is
important to choose the right shard key for a collection. If the
collection is gigantic it is difficult to change the key later"

Thanks in advance and best regards.

Max Schireson

unread,
Jun 20, 2012, 11:03:35 AM6/20/12
to mongod...@googlegroups.com

How will you most frequently query the data?

--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb

Raúl González

unread,
Jun 20, 2012, 11:11:05 AM6/20/12
to mongodb-user
Sorry Max i forgot it and it is important. Sorry.

The querys are simple:

In the Word model there are only queries by word. And in the books
table there are two queries type: by created_at and by id, because in
the middle there are sphinx server.

Thanks in advance Max.
Best Regards.

On 20 jun, 17:03, Max Schireson <mschire...@gmail.com> wrote:
> How will you most frequently query the data?

Max Schireson

unread,
Jun 20, 2012, 11:29:09 AM6/20/12
to mongod...@googlegroups.com

Word defininitely by word.

For books can you share:

1. Frequency of insert/update
2. Relative frequency of queries on created_at and id and query volume overall
3. Data distribution: how many values of created at are there, how many results do you expect queries to return, etc

Thanks,

-- Max

Raúl González

unread,
Jun 20, 2012, 11:48:42 AM6/20/12
to mongod...@googlegroups.com
Thanks so much Max,

Ok in Words we agree.

In Books the frecuency of queries by id will be higher that frecuency of queries by created_at. Queries by created_at are for export to Sphinx server only whereas that by id will be for application directly.

The queries returned twenty results each and created_at will be daily.

What key you think as the best candidate? 


Raúl González Martínez.
Founder & CEO.
Diffindo Software & Solutions
raul.g...@diffindo.es



2012/6/20 Max Schireson <mschi...@gmail.com>

Max Schireson

unread,
Jun 20, 2012, 12:25:53 PM6/20/12
to mongod...@googlegroups.com

Id sounds like the right bet.

How you assign your ids (random/increasing/mixture matters too.

How many are you adding each day/second? How big total is this collection?

-- Max

Raúl González

unread,
Jun 20, 2012, 12:47:59 PM6/20/12
to mongod...@googlegroups.com
I assign id´s natively on mongodb.

Every day many rows will be inserted (every hour could be inserted 50 rows)  and the collection could be sized on the order of GB.

Thanks Max again.

Max Schireson

unread,
Jun 20, 2012, 12:59:35 PM6/20/12
to mongod...@googlegroups.com

The native id assignment should be fine then.

Raúl González

unread,
Jun 20, 2012, 1:42:43 PM6/20/12
to mongod...@googlegroups.com

Thank so much Max I'll pay attention you. I'll use id as shard key in this table.

because to use a compund key is possible? i dont know. And it could be a better solution that id only?

Thanks a lot.

Max Schireson

unread,
Jun 20, 2012, 3:06:09 PM6/20/12
to mongod...@googlegroups.com

If you gave Sphinx the created_at as well as the id you could use a compound key of created_at, id for both queries.

But if feeding sphinx is a batch job that runs daily and has to pull the whole db, it may not help much to try do get each block of that to execute on a single server, and the sphinx indexes would likely get bigger so I am trying to be practical and optimize your system overall.

Can you explain a bit more about how you feed the extract to sphinx?

-- Max

Raúl González

unread,
Jun 21, 2012, 4:24:58 AM6/21/12
to mongod...@googlegroups.com
Hi Max,

No, the created_at has not gave to Sphinx, only id I gave to Sphinx. The created_at is used only to extract the rows from mongodb.

The integration between mongodb and Sphinx is made with main-delta indexes going to Sphinx the id and the title only.

Thanks a lot Max.
Reply all
Reply to author
Forward
0 new messages