Is it possible to split the output of mongodump into more than one file?

richardhenry

Jul 4, 2010, 10:21:38 PM
to mongodb-user
When dumped, our largest collection comes in at around 20GB. Is it
possible to split the dump file into segments of 5GB or less? There is
a limit on the size of files that can be stored on Amazon S3 or
Rackspace Cloud Files, the two solutions we are currently
considering for backup archival.

Richard

Roger Binns

Jul 5, 2010, 1:30:29 AM
to mongod...@googlegroups.com
The dump files are very simple - just the BSON for each collection as a
separate file. You could read in the BSON and split into pieces yourself
before upload, or modify mongodump (it is open source).
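
As a rough illustration of the "read in the BSON and split into pieces
yourself" option, here is a minimal Python sketch. It relies on one fact
about the format: every BSON document begins with its own total length as
a little-endian 32-bit integer, so a dump file can be cut on document
boundaries without decoding it. The function name `split_bson`, the
`max_bytes` parameter and the part naming scheme are made up for this
example; they are not part of mongodump.

```python
import struct


def split_bson(src_path, out_prefix, max_bytes):
    """Split a .bson dump into parts of roughly max_bytes each.

    Cuts only between documents. Returns the list of part paths.
    The part naming scheme is an assumption of this sketch.
    """
    parts = []
    out = None
    written = 0
    try:
        with open(src_path, "rb") as src:
            while True:
                header = src.read(4)
                if not header:
                    break  # clean end of file
                if len(header) < 4:
                    raise ValueError("truncated BSON length prefix")
                # The length prefix counts the whole document,
                # including these four bytes.
                (size,) = struct.unpack("<i", header)
                if size < 5:
                    raise ValueError("invalid BSON document size")
                body = src.read(size - 4)
                if len(body) != size - 4:
                    raise ValueError("truncated BSON document")
                # Start a new part when this document would overflow
                # the current one (a part always holds at least one).
                if out is None or (written > 0 and written + size > max_bytes):
                    if out is not None:
                        out.close()
                    path = "%s.part%04d.bson" % (out_prefix, len(parts))
                    parts.append(path)
                    out = open(path, "wb")
                    written = 0
                out.write(header + body)
                written += size
    finally:
        if out is not None:
            out.close()
    return parts
```

Each part is itself a plain sequence of BSON documents, so it should be
restorable on its own, though that is worth verifying on a small dump
first. A single document larger than `max_bytes` still lands in a part of
its own that exceeds the limit.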

Also is your data compressible at all? For my admin tool I use mongodump to
make the dump and then use 7zip to produce an archive. Behind the scenes
7zip uses the LZMA compression method. 7zip is open source and works on
Windows, Linux and Mac. I use this command line which uses the tweaked
lzma2 algorithm, and all CPU cores:

7z -m0=lzma2 -mf=off -mx=9 a out.7z

It does a fantastic job producing a small archive. Some of my data is
incompressible, such as zip files stored using GridFS.

Roger

Tim Hawkins

Jul 5, 2010, 3:08:19 AM
to mongod...@googlegroups.com
You can create an EBS volume, format it with XFS, and just store them as tgz files. The S3 2G limit is not present on EBS volumes; the limit is whatever your filesystem will support.


Tim Hawkins

Jul 5, 2010, 3:11:51 AM
to mongod...@googlegroups.com
PS:

http://coderstalk.blogspot.com/2008/03/create-and-extract-rar-files-in-linux.html

rar and unrar for Linux can create multi-part archives with a guaranteed maximum size per part.

Buyung Bahari

Jul 5, 2010, 4:34:33 AM
to mongod...@googlegroups.com
Hi all,

I split my data in MongoDB and it became chunks. These chunks are
based on my data range. How can I drop these chunks?


Thanks

Alberto Lerner

Jul 5, 2010, 8:30:11 AM
to mongod...@googlegroups.com
Even if chunked, your data is still part of a collection. So the collection commands apply.
Alberto.


Roger Binns

Jul 5, 2010, 2:17:47 PM
to mongod...@googlegroups.com
On 07/05/2010 12:11 AM, Tim Hawkins wrote:
> http://coderstalk.blogspot.com/2008/03/create-and-extract-rar-files-in-linux.html
>
> rar, unrar for linux, will create multi file archives with a guaranteed maximum length per part.

Note that rar is not open source and does require payment (see even the
screenshots in that posting) or violating the license.

In any event, 7z can also do multi-part archives, as can tar, zip etc. 7z's
compression ratios are way better than rar's, which is hardly surprising as
it uses more CPU and memory to do the compression (just as rar uses more
than zip and gets better compression than zip in turn).

With the large amounts of data being talked about, you do need a compressor
that will use multiple cores. 7z with LZMA does that (2 cores with LZMA, as
many as you have with LZMA2). If you use the pbzip2 program then it will do
multi-core bzip2.

Roger

scott

Jul 5, 2010, 5:19:04 PM
to mongodb-user
If you are on Linux, you can use the "split" command; that is what we
do to upload large files into S3.
We run this command after we have tarred and gzipped the content.
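
For anyone without the "split" command to hand, the same byte-wise cut
can be sketched in Python. This is only an illustration of what a
size-based split does, not the script the poster runs; `split_file`,
`join_files`, the `buf_bytes` parameter and the `.partNNNN` suffix are
hypothetical names chosen for the example.

```python
import shutil


def split_file(src_path, part_bytes, buf_bytes=1024 * 1024):
    """Cut src_path into numbered parts of at most part_bytes each.

    Returns the part paths in order. Reads in small buffers so a
    multi-gigabyte archive is never held in memory.
    """
    if part_bytes <= 0:
        raise ValueError("part_bytes must be positive")
    parts = []
    with open(src_path, "rb") as src:
        while True:
            remaining = part_bytes
            chunk = src.read(min(buf_bytes, remaining))
            if not chunk:
                break  # nothing left, so do not create an empty part
            path = "%s.part%04d" % (src_path, len(parts))
            with open(path, "wb") as out:
                while chunk:
                    out.write(chunk)
                    remaining -= len(chunk)
                    if remaining == 0:
                        break  # this part is full
                    chunk = src.read(min(buf_bytes, remaining))
            parts.append(path)
    return parts


def join_files(part_paths, dest_path):
    """Concatenate the parts, in the given order, back into one file."""
    with open(dest_path, "wb") as dest:
        for path in part_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, dest)
```

The parts are arbitrary byte ranges of the archive, so all of them must
be downloaded and joined back in order before it can be unpacked.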

If you are on Windows, you need to use something like 7z or pkzip to
compress and then split into multiple files.

Scott

Buyung Bahari

Jul 5, 2010, 11:57:36 PM
to mongod...@googlegroups.com
Alberto, do you have an example of this?



Kyle Banker

Jul 6, 2010, 12:04:54 AM
to mongod...@googlegroups.com
If the chunks are in a collection called 'images' then you'd just do from the shell:

db.images.remove({});

If it's more complicated than that, just provide us with some more details and we can help.

Buyung Bahari

Jul 6, 2010, 12:36:28 AM
to mongod...@googlegroups.com
Hi Kyle, thanks for the answer, but how do I remove data based on the
chunk ranges shown by the command db.printShardingStatus()?

I'm using sharding; as the data grows, it creates chunks. I need to
remove some to save space.

Thanks


Alberto Lerner

Jul 6, 2010, 7:35:52 AM
to mongod...@googlegroups.com

> I'm using sharding, when the data grow up they create a chunk. I need to remove to save space.

The chunk per se does not create any overhead. In other words, if "the data grow up" in a regular collection, that would take up space too. And, in both cases, remove() erases data from a collection.

Would you expect otherwise?

Alberto.