How to set up a Backup / Reporting Slave for a Sharded Cluster


Darshan Shah

Nov 18, 2015, 10:07:53 AM
to mongodb-user
I have a 10-shard cluster with each shard being a 3-member replica set.
Is it possible to set up a (possibly delayed) slave having all the data from all the shards?
This can be used for running long queries for reporting purposes and possibly may also serve as a backup.

Stephen Steneker

Nov 26, 2015, 11:58:26 PM
to mongodb-user

On Thursday, 19 November 2015 02:07:53 UTC+11, Darshan Shah wrote:

I have a 10-shard cluster with each shard being a 3-member replica set.
Is it possible to set up a (possibly delayed) slave having all the data from all the shards?

Hi Darshan,

What version of MongoDB are you using? Are you currently having any issues running your reporting queries against the existing sharded cluster, or are you planning for future scaling or usage segregation?

A node can only be a member of a single replica set. If you want to sync all the data from a sharded cluster to a separate deployment you will need to look at a sync solution such as mongo-connector.
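As a rough illustration of what an oplog-based sync tool has to do (this is not mongo-connector's actual code), the sketch below replays simplified oplog-style entries from several shards into a single in-memory target. The entry fields `op`, `ns`, `o` and `o2` follow the general layout of oplog documents; the function name `apply_oplog_entry`, the sample data, and the dictionary standing in for the separate deployment are assumptions made for the example.

```python
# Illustrative only: replay simplified oplog-style entries into one target.
# The dict `target` stands in for the separate reporting deployment.

def apply_oplog_entry(target, entry):
    """Apply one insert ('i'), update ('u') or delete ('d') entry."""
    collection = target.setdefault(entry["ns"], {})
    op = entry["op"]
    if op == "i":
        doc = entry["o"]
        collection[doc["_id"]] = dict(doc)
    elif op == "u":
        doc_id = entry["o2"]["_id"]
        change = entry["o"]
        if "$set" in change or "$unset" in change:
            doc = collection.setdefault(doc_id, {"_id": doc_id})
            doc.update(change.get("$set", {}))
            for field in change.get("$unset", {}):
                doc.pop(field, None)
        else:
            # No modifiers: the entry carries the full replacement document.
            collection[doc_id] = dict(change, _id=doc_id)
    elif op == "d":
        collection.pop(entry["o"]["_id"], None)
    # Other entry types (no-ops, commands) are ignored in this sketch.


# One stream of entries per shard (hypothetical data); each must be tailed.
shard_streams = {
    "shard01": [
        {"op": "i", "ns": "app.orders", "o": {"_id": 1, "total": 10}},
        {"op": "u", "ns": "app.orders", "o2": {"_id": 1},
         "o": {"$set": {"total": 12}}},
    ],
    "shard02": [
        {"op": "i", "ns": "app.orders", "o": {"_id": 2, "total": 5}},
        {"op": "d", "ns": "app.orders", "o": {"_id": 2}},
    ],
}

target = {}
for shard_name, entries in shard_streams.items():
    for entry in entries:
        apply_oplog_entry(target, entry)

print(target)
```

With 10 shards there are 10 such streams to follow continuously, which is part of why funnelling everything into a single server can be demanding.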

Given you have 10 shards, that sounds like a potentially significant challenge to replicate to a single server.

This can be used for running long queries for reporting purposes

Depending on your reporting requirements, there may be better ways to approach long queries. For example, common approaches include using pre-aggregated reports and incremental data updates to reduce unnecessary re-aggregation of data.
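A minimal sketch of the pre-aggregation idea, assuming a hypothetical page-view workload: each incoming event adjusts a small per-day summary document via the `$inc` update operator, so reports read the summaries instead of scanning raw data. The field names, the helper names, and the in-memory `apply_upsert` stand-in (used so the sketch runs without a server) are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime


def daily_counter_update(event):
    """Build the filter and update documents for one incoming event.

    The pair is shaped for an upsert using `$inc`, so each event adjusts a
    small summary document instead of being re-aggregated later.
    """
    day = event["ts"].strftime("%Y-%m-%d")
    filter_doc = {"_id": {"page": event["page"], "day": day}}
    update_doc = {"$inc": {"views": 1, "bytes": event["bytes"]}}
    return filter_doc, update_doc


def apply_upsert(summary, filter_doc, update_doc):
    """In-memory stand-in for an upsert, so the sketch runs without a server."""
    key = (filter_doc["_id"]["page"], filter_doc["_id"]["day"])
    for field, amount in update_doc["$inc"].items():
        summary[key][field] += amount


# Hypothetical events, for illustration only.
events = [
    {"page": "/home", "bytes": 512, "ts": datetime(2015, 11, 26, 9, 0)},
    {"page": "/home", "bytes": 256, "ts": datetime(2015, 11, 26, 17, 30)},
    {"page": "/docs", "bytes": 128, "ts": datetime(2015, 11, 27, 8, 15)},
]

summary = defaultdict(lambda: defaultdict(int))
for event in events:
    apply_upsert(summary, *daily_counter_update(event))

# A report now reads one small document per page and day.
print(dict(summary[("/home", "2015-11-26")]))
```

Against a real deployment, the filter and update documents would be passed to an upsert on a summary collection; the summary collection stays small, so a long reporting query becomes a short lookup.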

and possibly may also serve as a backup

Each of your replica sets already provides data redundancy and failover. However, having a full copy of the data does not provide a backup strategy in the event you need to restore data from a previous point in time. I would look into a more complete backup solution like MongoDB Cloud Manager, which can take cluster-wide snapshots based on data retention policies (i.e. how often to capture snapshots and how long to store daily/weekly/monthly snapshots).
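To make the idea of a retention policy concrete, here is a small sketch of a tiered scheme: keep every recent snapshot, then the newest snapshot per day, then the newest per week. This is not how Cloud Manager implements retention; the function name `snapshots_to_keep`, its parameters, and the 6-hourly schedule are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(snapshot_times, now,
                      keep_all_hours=24, daily_days=7, weekly_weeks=4):
    """Return the set of snapshot timestamps a simple tiered policy retains."""
    keep = set()
    newest_per_day = {}
    newest_per_week = {}
    # Ascending order means later (newer) snapshots overwrite older ones
    # within the same day or week bucket.
    for ts in sorted(snapshot_times):
        age = now - ts
        if age <= timedelta(hours=keep_all_hours):
            keep.add(ts)
        if age <= timedelta(days=daily_days):
            newest_per_day[ts.date()] = ts
        if age <= timedelta(weeks=weekly_weeks):
            # (ISO year, ISO week number) identifies the week bucket.
            newest_per_week[tuple(ts.isocalendar())[:2]] = ts
    keep.update(newest_per_day.values())
    keep.update(newest_per_week.values())
    return keep


# Hypothetical schedule: one snapshot every 6 hours for roughly 40 days.
now = datetime(2016, 2, 24, 12, 0)
snapshot_times = [now - timedelta(hours=6 * i) for i in range(160)]

kept = snapshots_to_keep(snapshot_times, now)
print(len(snapshot_times), "snapshots taken,", len(kept), "retained")
```

The point is that a single live copy has none of these generations: once a bad write replicates, there is no earlier state to return to.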

Regards,
Stephen

Darshan Shah

Nov 27, 2015, 1:05:40 PM
to mongodb-user
Hi Stephen,

I am running MongoDB 3.0.6 with MMAPv1 (not WiredTiger), and there are no issues as such.
You are right - this is more from a future planning perspective: usage segregation, as well as a third-level backup which can be used for read-only purposes by the application.

Mongo Connector seems very interesting - thanks for pointing it out.

Thanks!

Darshan Shah

Feb 22, 2016, 12:32:18 PM
to mongodb-user
Hi Stephen,

I finally got around to checking out Mongo Connector - it is very good.
However, it is very heavy in the sense that it takes quite some time to do the initial sync and requires a target MongoDB (possibly replica set) instance.

Is there any other way to get a continuous hot backup from MongoDB, other than the file system backup described in the MongoDB documentation on backing up a sharded cluster with filesystem snapshots?


Thanks



Stephen Steneker

Feb 24, 2016, 11:35:55 AM
to mongodb-user
On Monday, 22 February 2016 09:32:18 UTC-8, Darshan Shah wrote:
I finally got around to checking out Mongo Connector - it is very good.
However, it is very heavy in the sense that it takes quite some time to do the initial sync and requires a target MongoDB (possibly replica set) instance.

Is there any other way to get a continuous hot backup from MongoDB, other than the file system backup described in the MongoDB documentation on backing up a sharded cluster with filesystem snapshots?

Hi Darshan,

Any solution for continuously backing up your sharded cluster will necessarily involve some sort of "initial sync" phase in order to create a starting snapshot of the data. For example, the first filesystem or EBS snapshot will take longer than subsequent snapshots. Supported backup methods are described in the MongoDB manual: Backup and Restore Sharded Clusters.
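A conceptual sketch of why the first snapshot is the expensive one: with block-level incremental snapshots, the first pass has nothing to compare against and copies every block, while later passes copy only the blocks that changed. Real snapshot systems track changed blocks in their own ways; the hashing approach, the tiny `BLOCK_SIZE`, and the function names here are assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def block_hashes(data):
    """Split data into fixed-size blocks and hash each one."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def blocks_to_copy(previous_hashes, data):
    """Return indexes of blocks that differ from the previous snapshot."""
    current = block_hashes(data)
    changed = [
        i for i, digest in enumerate(current)
        if i >= len(previous_hashes) or previous_hashes[i] != digest
    ]
    return changed, current


volume = bytearray(b"AAAABBBBCCCCDDDD")

# First snapshot: nothing to compare against, so every block is copied.
first_copy, hashes = blocks_to_copy([], bytes(volume))

# A small write touches one block; the next snapshot copies only that block.
volume[4:8] = b"bbbb"
second_copy, hashes = blocks_to_copy(hashes, bytes(volume))

print(first_copy, second_copy)
```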

Mongo Connector is intended as a solution to sync with external systems (for example, a search engine or separate MongoDB deployment), but typically isn't used to back up sharded clusters. Normally a backup strategy would include considerations such as multiple generations of backups (hourly/daily/weekly depending on your requirements) and a straightforward way to restore a backup to your production environment. As a result, your storage solution for backups will generally be significantly larger than the original deployment.

The easiest solution for continuous backup of a sharded cluster would be MongoDB Cloud Manager (paid cloud service) or MongoDB Ops Manager (on-premises product that is part of a MongoDB Enterprise Advanced subscription).

Regards,
Stephen