Rsync and MongoDB data files


TJ Roche

Jan 20, 2015, 10:45:43 AM
to mongod...@googlegroups.com
Hi all,

Our environment is in the process of moving from Windows to Linux (Ubuntu 14.04), and I have a couple of questions about our backup and restore process.

Our current automated process on Windows (a rough sketch of the Linux equivalent follows the list):
  1. Take down a member of the replica set
  2. Robocopy the data files to an external NFS share
  3. Bring the member back online
  4. Stop the target environment
  5. Blow away the data directory
  6. Robocopy the backup files to the target machine
  7. Start the mongod process
  8. Initiate the replica set
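
For reference, the Linux version I am attempting is roughly along these lines (the paths and service name below are illustrative, not our exact setup):

# on the replica set member being backed up
sudo service mongod stop
rsync -a /var/lib/mongodb/ /mnt/nfs-backup/mongodb/
sudo service mongod start

# on the target machine being restored
sudo service mongod stop
rm -rf /var/lib/mongodb/*
rsync -a /mnt/nfs-backup/mongodb/ /var/lib/mongodb/
sudo service mongod start
# then connect with the mongo shell and run rs.initiate()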

After attempting to duplicate the process on Linux we have run into some interesting problems. Blowing away the data directory and rsyncing the files onto the target machine is painfully slow, taking around 4 hours longer than the Robocopy operation did.

Previously we had to blow away the data directory because MongoDB doesn't update the timestamps on the data files on insert/update/delete, which caused problems for Robocopy and led to data corruption. My question is: is rsync intelligent enough to do an MD5 or CRC check and copy only the changed data in a file, or will it still go by timestamp? If I could copy only the deltas, I feel the operation would be substantially faster.
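
For context on my own question: by default rsync's "quick check" decides whether to copy a file based on size and modification time, but it can be forced to compare checksums instead, something like:

rsync -a --checksum /mnt/nfs-backup/mongodb/ /var/lib/mongodb/

The paths there are just placeholders. As I understand it, --checksum makes both sides read and hash every file, so it avoids the timestamp problem but isn't necessarily faster; and rsync's delta transfer of only the changed blocks within a file applies when the copy goes over ssh or an rsync daemon, not when both paths are local or NFS mounts, where it defaults to copying whole files.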

Linux is brand new territory for me so any help would be greatly appreciated.

Thanks!

Tim Hawkins

Jan 20, 2015, 11:04:42 AM
to mongod...@googlegroups.com

Try

mkdir empty_dir
rsync -a --delete empty_dir/  yourdirectorytodelete/

I think it works faster than rm in most cases as it's using all CPUs; rm is single-threaded. It's destructively syncing an empty directory over your directory. It may not work faster in specific cases, but it's worth trying.

TJ Roche

Jan 20, 2015, 12:29:05 PM
to mongod...@googlegroups.com
The problem I am having is with the rsync to the target server; the deletion is relatively quick. To put some hard numbers around it, the copy from the NFS share that used to take 2.5 hours is now taking almost 5.

Tim Hawkins

Jan 20, 2015, 1:32:15 PM
to mongod...@googlegroups.com

Check your NFS protocol version; version 4, the latest, is known to have some performance issues. The article below is an interesting read and may be relevant.

http://archive09.linux.com/feature/138453
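
If it helps, on Linux you can usually see which NFS version a mount is actually using with something like:

nfsstat -m
# or
grep nfs /proc/mounts

and look for the vers= option in the output.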

You can also avoid NFS entirely by setting up an rsync daemon on the destination and using rsync's own protocol.

http://flipthatbit.net/2011/06/stupid-rsync-smart-touch/
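
A minimal sketch of that setup, assuming a module name of "mongodata" and the paths shown (all placeholders):

# /etc/rsyncd.conf on the destination
[mongodata]
    path = /var/lib/mongodb
    read only = false

# start the daemon on the destination (listens on TCP 873 by default)
rsync --daemon

# copy from the backup host using rsync's own protocol instead of NFS
rsync -a /mnt/nfs-backup/mongodb/ rsync://destination-host/mongodata/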

It's not cut and dried, but in your circumstances some of the recommendations may be helpful. I would not run rsync over NFS myself; I have always had performance issues with it in the past. I tend to use bzip2 and scp, but there are some alternatives:

http://moo.nac.uci.edu/~hjm/HOWTO_move_data.html
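
The bzip2-and-scp approach I mean is roughly this (hostnames and paths are just examples):

tar -cjf mongodb-backup.tar.bz2 -C /var/lib mongodb
scp mongodb-backup.tar.bz2 user@target-host:/tmp/
ssh user@target-host 'tar -xjf /tmp/mongodb-backup.tar.bz2 -C /var/lib'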

Another alternative is btsync from BitTorrent, especially if there are multiple destinations. Also have a look at "murder" from Twitter, another block-distribution transfer protocol. Murder is a BitTorrent-like protocol with heavy multithreading and multi-source capabilities. (The name comes from the term for a group of crows, a "murder of crows".)

We are getting to the point of needing daily 200 GB transfers to distribute some read-only search indexes we use, so we have been looking at various schemes for achieving that.


