We back up our Mongo databases to S3 every night. The basic flow is (approximately):
mongodump -o tempdir
tar cfpj dump.tar tempdir
s3_multipart_upload.py dump.tar ....
The problem is that this copies the data several times. mongodump writes files to disk, then tar reads those files and writes a tar file back out. Then s3_multipart_upload splits that tar file into smaller chunks on disk, and only then does the data get copied into S3. We're moving roughly 300 GB every night, so this is a drag.
Has anybody rolled a cleaner version of this? Maybe a version of mongodump that knows how to write directly to S3, without needing the intermediate disk files?
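To make the goal concrete, here is a minimal sketch of the streaming shape in Python: read the dump as a byte stream, cut it into parts in memory, and hand each part to an uploader, so nothing lands on disk in between. The names `stream_in_parts`, `read_exactly`, and the `upload_part` callback are hypothetical, made up for illustration; they are not part of mongodump or of any S3 library. The 64 MiB part size is also just an assumption, chosen because S3 multipart uploads allow at most 10,000 parts, so a 300 GB dump needs parts of at least about 30 MB.

```python
import io
from typing import BinaryIO, Callable, List

# Assumed part size. S3 multipart parts must be at least 5 MiB (except the
# last one), and an upload may have at most 10,000 parts.
PART_SIZE = 64 * 1024 * 1024


def read_exactly(stream: BinaryIO, n: int) -> bytes:
    """Read up to n bytes, looping because pipes can return short reads."""
    buf = bytearray()
    while len(buf) < n:
        data = stream.read(n - len(buf))
        if not data:  # EOF
            break
        buf.extend(data)
    return bytes(buf)


def stream_in_parts(
    stream: BinaryIO,
    upload_part: Callable[[int, bytes], str],  # hypothetical uploader callback
    part_size: int = PART_SIZE,
) -> List[str]:
    """Feed the stream to upload_part one in-memory part at a time.

    upload_part receives a 1-based part number and the part's bytes, and
    returns whatever identifier the storage backend gives back for it.
    """
    results = []
    part_number = 1
    while True:
        chunk = read_exactly(stream, part_size)
        if not chunk:
            break
        results.append(upload_part(part_number, chunk))
        part_number += 1
    return results
```

In real use, `stream` would presumably be the stdout pipe of a dump process (newer mongodump versions can write a single archive to stdout with `--archive`), and `upload_part` would wrap the S3 client's multipart upload call; both of those are assumptions about the setup rather than something tested here. Only one part is held in memory at a time, so memory use stays near the part size regardless of how large the dump is.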