
recursively tar the files in a directory


zhengquan

May 7, 2008, 6:40:43 PM
Hello,
I want to recursively tar the files that are bigger than 100M in a
directory and delete the original files.
Can anyone give me a hint on how to do it?

Thanks for your help!

Regards,
Zhengquan

mo

May 7, 2008, 8:47:27 PM
On Wed, 07 May 2008 19:40:43 -0300, zhengquan <zhang.z...@gmail.com>
wrote:

Maximum compression with bzip2:
find /dir -type f -size +100M|tar cjvf /tmp/file.tar.bz2 -T -

To list content:
tar tjvf /tmp/file.tar.bz2

Then, if the listing looks OK, deleting the files is easy.
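
The archive, check, then delete sequence described above could be sketched as follows. This is only an illustration, assuming GNU find, GNU tar and GNU xargs; the function name archive_then_delete and its three arguments are made up here, and the NUL-delimited list (-print0 with --null) is a substitution for the plain newline-delimited list so that unusual file names survive.

```shell
# Hypothetical helper: archive files matching a find -size expression,
# read the archive back, and only then delete the originals.
# Assumes GNU find, GNU tar and GNU xargs (-print0, --null, -T, -r).
archive_then_delete() {
    srcdir=$1      # directory to search
    archive=$2     # archive to create, e.g. /tmp/file.tar.bz2
    size=$3        # find -size expression, e.g. +100M

    list=$(mktemp) || return 1

    # Save a NUL-delimited list so odd file names survive intact.
    find "$srcdir" -type f -size "$size" -print0 > "$list"

    # Create the archive, then read it back in full as a sanity check.
    if tar -cjf "$archive" --null -T "$list" &&
       tar -tjf "$archive" > /dev/null
    then
        # Delete the originals only after both steps succeeded.
        xargs -0 -r rm -f -- < "$list"
        status=0
    else
        echo "archive step failed; nothing was deleted" >&2
        status=1
    fi

    rm -f "$list"
    return "$status"
}

# Example (not run here): archive_then_delete /dir /tmp/file.tar.bz2 +100M
```

Keeping the archive outside the searched directory, as in the /tmp example, avoids find picking up the archive itself.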

Dave B

May 8, 2008, 4:56:37 AM

If you have GNU find, xargs and tar, you can do something like

find /srcdir -type f -size +100M -print0 | xargs -r0 \
tar --remove-files -cSjvf tarfile.tbz2

I suggest you try first without the "--remove-files" option.

--
D.


zhengquan

May 8, 2008, 3:47:00 PM

I was not clear in the original post. Can I have a separate tarball
for each file?

Thanks!
Zhengquan

zhengquan

May 8, 2008, 5:28:27 PM
On May 8, 3:56 am, Dave B <da...@addr.invalid> wrote:

Can I have the individual files tarred to separate tarballs in the
same directory as the original ones?

Thanks!
Zhengquan


mallin.shetland

May 8, 2008, 8:04:19 PM
zhengquan wrote:

> I want to recursively tar the files that are bigger than 100M in a
> directory and delete the original files.

find /srcdir -type f -size +100M -print0 |
tar -cjf archive.tar.bz2 --remove-files --null -T -

It works with GNU tar; don't ask about other implementations.

mallin.shetland

May 8, 2008, 8:08:24 PM
Dave B wrote:

> find /srcdir -type f -size +100M -print0 | xargs -r0 \
> tar --remove-files -cSjvf tarfile.tbz2

If you have more than one or two hundred files, xargs may split
them across several tar runs, and each run overwrites tarfile.tbz2
again and again; very smart :(
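
The overwrite problem can be made visible with a small dry run. This is a sketch only: the function name show_xargs_split is made up, the names a to e stand in for real files, -n 2 forces the kind of split that xargs otherwise makes only when the list is too long for one command line, and echo prints each tar command instead of running it.

```shell
# Hypothetical demo: five NUL-delimited names, at most two per invocation.
# "echo" prints each tar command line instead of executing it.
show_xargs_split() {
    printf '%s\0' a b c d e |
        xargs -0 -n 2 echo tar -cjf tarfile.tbz2
}

show_xargs_split
# tar -cjf tarfile.tbz2 a b
# tar -cjf tarfile.tbz2 c d
# tar -cjf tarfile.tbz2 e
```

Each printed line would create tarfile.tbz2 from scratch, so only the last batch would remain in the archive, and with --remove-files the earlier batches would already be gone from disk.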

mallin.shetland

May 8, 2008, 8:20:17 PM
zhengquan wrote:

> Can I have the individual files tarred to separate tarballs?

Yes


find "$dir" -type f -size +100M -exec tar --remove-files -cjf {}.tar.bz2 {} \;
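
A slightly more defensive variant of the one-tarball-per-file idea might look like the sketch below. It is an illustration, not the poster's command: the function name tar_each_file is made up, a tar that understands -j and -C is assumed (GNU tar does), and rm after a successful tar is swapped in for --remove-files.

```shell
# Hypothetical helper: give every matching file its own .tar.bz2 next to it.
# Assumes a tar that understands -j and -C (GNU tar does).
tar_each_file() {
    dir=$1     # directory to search
    size=$2    # find -size expression, e.g. +100M

    # Skip tarballs created by an earlier or the current run.
    find "$dir" -type f -size "$size" ! -name '*.tar.bz2' -exec sh -c '
        for f do
            d=$(dirname -- "$f")
            b=$(basename -- "$f")
            # -C makes tar store only the base name; the original is
            # removed only if tar reported success.
            tar -C "$d" -cjf "$f.tar.bz2" "./$b" && rm -f -- "$f"
        done
    ' sh {} +
}

# Example (not run here): tar_each_file "$dir" +100M
```

Storing only the base name means each tarball unpacks into whatever directory it is extracted in, rather than recreating the full original path.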


Dave B

May 9, 2008, 3:57:04 AM

Correct. That's why I suggested that the OP try the command without
the --remove-files option first.

--
D.

Teo

May 9, 2008, 4:37:15 AM
Hi

> > > Hello,
> > > I want to recursively tar the files that are bigger than 100M in a
> > > directory and delete the original files.
> > > Can any one give me a hint how to do it?
>

> > If you have GNU find, xargs and tar, you can do something like
>
> > find /srcdir -type f -size +100M -print0 | xargs -r0 \
> > tar --remove-files -cSjvf tarfile.tbz2
>
> > I suggest you try first without the "--remove-files" option.
>

> Can I have the individual files tarred to separate tarballs in the
> same directory as the original ones?

Wait: you want to make tar archives with one file each? Or do you just
want to compress the large files in your directory tree?

Matteo

zhengquan

May 9, 2008, 10:41:52 AM

Thank you, that is exactly what I had hoped for.
Zhengquan

zhengquan

May 9, 2008, 10:43:20 AM

The simulation data is huge, so I just want to compress the files in
their original directory and delete the huge originals.

Zhengquan

Teo

May 9, 2008, 11:22:09 AM

OK, but in this case just compress the file (with gzip) instead of
creating a tar file with one file in it and compressing that.

Matteo


mo

May 9, 2008, 11:39:45 AM
On Fri, 09 May 2008 11:43:20 -0300, zhengquan <zhang.z...@gmail.com>
wrote:

> The simulation data is too huge so I just want to compress them to the
> original directory and delete the original huge files.
>
> Zhengquan

Ah, OK.
For that I think the best solution is something like:
find $dir -type f -size +100M -exec bzip2 {} \;

See:
bzip2 --help

Consider your goal: compression time or file size.

zhengquan

May 9, 2008, 11:43:46 AM

Thanks. Is gzip faster at compressing the files? I know
bzip2 produces smaller files.

Zhengquan
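
One way to answer the speed question for a particular data set is to measure it on a sample file. The sketch below is only an illustration: the function name compare_compressors is made up, timing is in whole seconds from date +%s, and the file is left untouched because -c writes to standard output. gzip is typically faster and bzip2 typically produces smaller output, but the gap depends on the data.

```shell
# Hypothetical helper: compress one file to a pipe with each tool and report
# the compressed size and the elapsed wall-clock time in whole seconds.
# The file itself is left untouched because -c writes to standard output.
compare_compressors() {
    file=$1
    for tool in gzip bzip2; do
        start=$(date +%s)
        bytes=$("$tool" -c < "$file" | wc -c)
        end=$(date +%s)
        printf '%s: %d bytes in about %d s\n' \
            "$tool" "$bytes" "$((end - start))"
    done
}

# Example (not run here): compare_compressors /path/to/one/simulation/file
```

Running it on one representative file before compressing the whole tree gives a rough idea of the trade-off.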

Stephane CHAZELAS

May 9, 2008, 11:48:39 AM
2008-05-09, 12:39(-03), mo:

> On Fri, 09 May 2008 11:43:20 -0300, zhengquan <zhang.z...@gmail.com>
> wrote:
>
>> The simulation data is too huge so I just want to compress them to the
>> original directory and delete the original huge files.
[...]

> for that I think the best solution is something as:
> find $dir -type f -size +100M -exec bzip2 {} \;
[...]

Note that you don't need to execute a bzip2 for each file, as
bzip2 can compress more than one file at once.

You forgot to quote $dir.

Note that the above assumes that $dir doesn't start with "-".

Note that the M in +100M is not standard.

K=1024 M=$((1024*$K)) G=$((1024*$M))
find "$dir" -type f -size +"$((100*$M))c" -exec bzip2 {} +

With zsh:

bzip2 ./**/*(D.LM+100)

--
Stéphane
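
The points above (quoting, a directory name starting with "-", a byte count instead of the non-standard M suffix, and one compressor run for many files) could be combined roughly as follows. This is a sketch under assumptions: the function name compress_larger_than and its arguments are made up, and the optional third argument for choosing the compressor is an addition for illustration. Note that a plain number after -size counts 512-byte blocks, so the "c" suffix is what makes it a byte count.

```shell
# Hypothetical helper: compress every regular file larger than a byte count.
# Uses only standard find features: -size with the "c" (bytes) suffix and
# -exec ... {} + to pass many files to each compressor run.
compress_larger_than() {
    dir=$1               # directory to search
    bytes=$2             # threshold in bytes
    tool=${3:-bzip2}     # compressor command; bzip2 as in the thread

    # Keep find from treating a name such as "-foo" as an option.
    case $dir in
        -*) dir=./$dir ;;
    esac

    find "$dir" -type f -size +"${bytes}c" -exec "$tool" -- {} +
}

M=$((1024 * 1024))
# Example (not run here): compress_larger_than "$dir" "$((100 * M))"
```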

zhengquan

May 9, 2008, 12:00:08 PM
On May 9, 10:39 am, mo <inva...@mail.address> wrote:
> On Fri, 09 May 2008 11:43:20 -0300, zhengquan <zhang.zhengq...@gmail.com>
> wrote:
>
> > The simulation data is too huge so I just want to compress them to the
> > original directory and delete the original huge files.
>
> > Zhengquan
>
> Ah ok,
> for that I think the best solution is something as:
> find $dir -type f -size +100M -exec bzip2 {} \;
>
> See:
> bzip2 --help
>
> Consider your target, compression time or size file.

Well, it seems that there is no --remove-files mechanism in gzip and
bzip2, so I can only use
find $dir -type f -size +100M -exec rm -i {} \;
to delete the original files. I added -i in case it deletes files I
want to keep.

Thanks.
Zhengquan


Dave B

May 9, 2008, 12:08:24 PM
On Friday 9 May 2008 18:00, zhengquan wrote:

>> Ah ok,
>> for that I think the best solution is something as:
>> find $dir -type f -size +100M -exec bzip2 {} \;
>>
>> See:
>> bzip2 --help
>>
>> Consider your target, compression time or size file.
>
> Well, it seems that there is no --remove-files mechanism in gzip and
> bzip2, so I can only use find $dir -type f -size +100M -exec rm -i {}
> \; to delete the original files. I added -i in case it deletes files I
> want to keep.

AFAIK, bzip2 operates directly on the file, replacing it with the
compressed .bz2 version, so there is no need to remove it afterwards
(you would get an error if you tried).

$ ls foo
file1 file2
$ find foo -type f -exec bzip2 {} \;
$ ls foo
file1.bz2 file2.bz2

As Stéphane already mentioned, you can use + instead of \; to terminate the
bzip2 command, to get an increase in efficiency.

--
D.

zhengquan

May 9, 2008, 12:16:09 PM

I was just thinking of how to remove the original file. Now it seems
there is no need for that.
Thanks!
Zhengquan

zhengquan

May 9, 2008, 12:33:02 PM
On May 9, 10:48 am, Stephane CHAZELAS <this.addr...@is.invalid> wrote:
> 2008-05-09, 12:39(-03), mo:
> > On Fri, 09 May 2008 11:43:20 -0300, zhengquan <zhang.zhengq...@gmail.com>
> > wrote:
>
> >> The simulation data is too huge so I just want to compress them to the
> >> original directory and delete the original huge files.
> [...]
> > for that I think the best solution is something as:
> > find $dir -type f -size +100M -exec bzip2 {} \;
>
> [...]
>
> Note that you don't need to execute a bzip2 for each file, as
> bzip2 can compress more than one file at once.
>
> You forgot to quote $dir.
>
> Note that the above assumes that $dir doesn't start with "-".
>
> Note that the M in +100M is not standard.
>
> K=1024 M=$((1024*$K)) G=$((1024*$M))
> find "$dir" -type f -size +"$((100*$M))c" -exec bzip2 {} +
>
> With zsh:
>
> bzip2 ./**/*(D.LM+100)
>
> --
> Stéphane

Thanks, Stéphane. On my CentOS 4 machine there is no M suffix for
-size, so I use k. I was just using 100M as a rough divider between
large and small files. I wonder if there is any efficiency difference
between using 100M and a power of 2.

Zhengquan
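
For the k workaround, the conversion can be sketched as below. The function name mib_to_find_sizes is made up for illustration. With GNU find, -size +100M, -size +102400k and -size +104857600c all select files larger than 104857600 bytes, because the file size is rounded up to whole units before the comparison. find only compares numbers here, so a round power of 2 should not be expected to run any faster than 100M.

```shell
# Hypothetical helper: turn a size in MiB into the equivalent find -size
# arguments in KiB ("k", a GNU extension) and in bytes ("c", standard).
mib_to_find_sizes() {
    mib=$1
    printf '+%dk\n' "$((mib * 1024))"
    printf '+%dc\n' "$((mib * 1024 * 1024))"
}

mib_to_find_sizes 100
# +102400k
# +104857600c
```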

Teo

May 15, 2008, 12:56:21 AM

Hi,

No, bzip2 is fine; I was just wondering about the tar, not the
compression :-)

Matteo
