Using replication to offload work and then gather it

Nitin Borwankar

Feb 10, 2012, 4:15:56 PM

to mongodb-user

In an effort to parallelize loads and indexing I am considering the
following and would like the community's input on whether it makes
sense.

We are talking about collections of about 200 GB and 300+ million
documents, with 5-6 indexes.
Indexing ties up the db for a day or so per index, so I need to
think of other ways if possible.

Just doing this on a mirrored box and letting it run for days is not an
option, for reasons too complicated to go into here.

Also the data is a huge reference database which once loaded will not
be written to - all access will be read only.

My thoughts:-

* Load up data on separate machines, different collection on each
machine.
* Index them as needed per collection on each box separately, but in
parallel (note one collection per box)
* Then replicate the collections to a single central database, one
collection at a time from each box.
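
As a rough illustration of why the one-collection-per-box plan is attractive, here is a back-of-the-envelope estimate. It is only a sketch: the "one day per index" figure comes from the post above, while the number of collections is hypothetical and the assumption that each collection needs its own 5-6 indexes is mine.

```python
# Rough wall-clock estimate for index builds, in days.
DAYS_PER_INDEX = 1.0  # "a day or so per index", taken from the post


def serial_days(collections: int, indexes_per_collection: int) -> float:
    """All collections indexed one after another on a single box."""
    return collections * indexes_per_collection * DAYS_PER_INDEX


def parallel_days(indexes_per_collection: int) -> float:
    """One collection per box; each box still builds its own indexes in sequence."""
    return indexes_per_collection * DAYS_PER_INDEX


if __name__ == "__main__":
    n_collections = 4  # hypothetical, not stated in the post
    n_indexes = 5      # lower end of "5-6 indexes"
    print("single box:", serial_days(n_collections, n_indexes), "days")
    print("one box per collection:", parallel_days(n_indexes), "days")
```

The saving only materializes if the finished indexes can be moved to the central machine without being rebuilt, which is the question below.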

The question is: will this result in a consistent database, and will the
indexes replicate over, or will I have to re-index?
If the latter, then the effort is pointless.

Thanks for any help.

Eliot Horowitz

Feb 11, 2012, 10:12:10 PM

to mongod...@googlegroups.com

If you replicate, the indexes would have to be built again.

Can this be done semi offline?

If so, you can build each one on a different machine in a different db name.

Then shut all the mongod processes down, copy all the data files to one
machine, and you can access everything.
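
A minimal sketch of the copy step, for illustration only. It assumes every mongod has already been shut down cleanly, that each database's files sit directly in the dbpath and are named after the database (for example `refdb_a.ns`, `refdb_a.0`, `refdb_a.1`), and that no per-database subdirectories are in use. The function name `merge_dbpaths`, the example paths and database names, and the choice to skip `mongod.lock` and the `local` database are all hypothetical, not something stated in this thread.

```python
import shutil
from pathlib import Path


def db_name(data_file: Path) -> str:
    """Assumed layout: '<dbname>.<suffix>', e.g. 'refdb_a.ns' -> 'refdb_a'."""
    return data_file.name.split(".", 1)[0]


def merge_dbpaths(sources, target, skip_dbs=("local",)):
    """Copy data files from several stopped mongod dbpaths into one dbpath.

    Refuses to copy if two sources (or the target) hold the same database
    name, since their files would overwrite each other.
    """
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)

    def data_files(directory: Path):
        # Plain files only: this skips subdirectories such as a journal folder.
        return sorted(
            p for p in directory.iterdir()
            if p.is_file()
            and "." in p.name
            and p.name != "mongod.lock"
            and db_name(p) not in skip_dbs
        )

    seen = {db_name(p) for p in data_files(target)}
    copied = []
    for src in map(Path, sources):
        files = data_files(src)
        names = {db_name(p) for p in files}
        clash = names & seen
        if clash:
            raise ValueError(f"database name(s) already present: {sorted(clash)}")
        for p in files:
            shutil.copy2(p, target / p.name)
            copied.append(p.name)
        seen |= names
    return sorted(copied)


# Example call (hypothetical paths):
# merge_dbpaths(["/mnt/box1/db", "/mnt/box2/db"], "/data/db")
```

The collision check is the reason for the "different db name" advice: if two boxes used the same database name, their files would have the same names and could not coexist in one directory.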
