tar to multiple files

fabio vassalli

Sep 9, 2003, 2:49:43 PM

Hello,

I'd like to use tar to crate an archive. If I created a single
tar-file, it would be about 4 GB. It would be easier to handle if
the archive were distributed over four 1 GB tar files.

It is not a solution to first create a big tar-file and split it
thereafter, because I want to access the content of each file directly.
(In fact, my tar command refuses to unpack (but not to pack) tar-files
larger than 2 GB, even though I'm using reiserfs).

I have tried the tar -L option, but it needs a manual action after each
file, and each output-file gets the same name, and I would like to use
the command in a script (for cron.daily).

There is also a tar -M option, but I don't see what to do with it.

How can I get tar to create several tar archives with a limited file size and
consecutive names on the same disk and without manual interaction?

Of course I have searched the net and read the man pages, but I did not get it.

Thanks for your help

fabio vassalli

Dances With Crows

Sep 9, 2003, 3:04:57 PM

On Tue, 09 Sep 2003 20:49:43 +0200, fabio vassalli staggered into the
Black Sun and said:
> i'd like to use tar to crate an archive.

Sorry, you need /usr/bin/crate to "crate" an archive :-) .

> if I would crate a single tar-file, id would have about 4 GB. it would
> be easyer to handle it when the archive would be distributed to four 1
> GB tar files.

tar cf - /path/to/place/to/be/tarred | split --bytes=1024m - PREFIX

...creates files named PREFIXaa, PREFIXab, ..., PREFIXzz, each 1024M
(1G) or less in size. To extract these,

cat PREFIX* | tar xf -
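Wrapped as two small shell functions, the split-and-rejoin pipeline above might look like this. It is only a sketch: `split_archive` and `join_archive` are made-up names, and it assumes tar's `-C` option and a `split` that accepts `-b` with sizes such as `1024m`.

```shell
#!/bin/sh
# split_archive SRC_DIR PREFIX SIZE
#   Stream a tar archive of SRC_DIR to stdout and cut it into pieces
#   named PREFIXaa, PREFIXab, ... of at most SIZE bytes (e.g. 1024m).
split_archive() {
  tar -cf - -C "$(dirname "$1")" "$(basename "$1")" | split -b "$3" - "$2"
}

# join_archive PREFIX DEST_DIR
#   Concatenate the pieces in name order (the shell glob sorts them)
#   and extract the reassembled stream into DEST_DIR.
join_archive() {
  mkdir -p "$2" &&
  cat "$1"* | tar -xf - -C "$2"
}
```

For the original question, something like `split_archive /path/to/place/to/be/tarred PREFIX 1024m` would give 1 GB pieces. Every piece is still needed to restore anything, which is the objection raised in the replies.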

> (In fact, my tar-command refuses to unpack (but not to pack) tar-files
> larger than 2 GB, even so I'm using reiserfs).

Your /bin/tar was compiled without _FILE_OFFSET_BITS=64 defined, then.
Recompile your tar or get an updated package from your distro's site.

> How to get tar to creat several tar-archives with limited file size
> and consecutive names on the same disk and without manual interaction?
> Of course I have read all the net and the man pages, but I did not get
> it.

tar's man page is useless and its info page is too scattered for
novices. Think like a Unix user (use pipes, redirection, and external
utilities), notice that tar can read from stdin and write to stdout, and
the solution is easy. HTH,

--
Matt G|There is no Darkness in Eternity/But only Light too dim for us to see
Brainbench MVP for Linux Admin /
http://www.brainbench.com / "He is a rhythmic movement of the
-----------------------------/ penguins, is Tux." --MegaHAL

peter pilsl

Sep 9, 2003, 3:47:32 PM

Dances With Crows wrote:

>
> tar cf - /path/to/place/to/be/tarred | split --bytes=1024m - PREFIX
>

I don't think this is what the OP wanted. The OP asked for a solution where he
can access each file directly, which is a very good idea, because otherwise
the loss or damage of one file will make him lose his whole backup.

I would try to use the -L option and write a wrapper around it that renames
the created file and makes tar continue (I've never done it).

best,
peter

--
peter pilsl
pilsl_...@goldfisch.at
http://www.goldfisch.at

Peter T. Breuer

Sep 9, 2003, 3:53:02 PM

fabio vassalli <fabio.v...@bluewin.ch> wrote:
> i'd like to use tar to crate an archive.

Sounds an easy ambition to achieve. Let's read on.

> if I [created] a single
> tar-file, [it] would [be] about 4 GB. it would be [easier] to handle [if]
> the archive [were] distributed to four 1 GB tar files.

> It is not a solution to create first a big tar-file and to split it
> therafter, because I want to access the content of each file directly.

OK. Well, you'll have to solve the bin-packing problem then. Sorry, but
it's equivalent to the travelling salesman problem. And that's
NP-complete.

I'm afraid you are mistaken. You have *direct* access to each file even
when the archive is split into 4 arbitrarily. You do

cat foo.tar.[1-4] | tar xvf - my/file/name
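Peter's command streams every piece through tar. If you also want to know which pieces a particular member sits in (Ed Murphy's point further down), GNU tar's `-R` (`--block-number`) listing gives each member's starting block, and a little arithmetic maps that to a piece. This is a rough sketch: `which_pieces` is a hypothetical helper, and it assumes the pieces came from `split -b CHUNK_BYTES` with the default two-letter suffixes, that each member has a single 512-byte header (true for short names in GNU tar's `gnu` format), and that the `-tvR` line looks like the example in the comment.

```shell
#!/bin/sh
# which_pieces PREFIX CHUNK_BYTES MEMBER
#   Print the split suffixes (aa, ab, ...) holding MEMBER's header and data.
#   Assumes GNU tar -tvR lines such as
#     block 0: -rw-r--r-- user/group 200000 2003-09-09 20:49 dir/file
#   Names containing runs of spaces or newlines are not handled.
which_pieces() {
  cat "$1"* | tar -tvRf - | awk -v c="$2" -v m="$3" '
    function sfx(i,   L) {               # piece index -> split suffix
      L = "abcdefghijklmnopqrstuvwxyz"
      return substr(L, int(i / 26) + 1, 1) substr(L, i % 26 + 1, 1)
    }
    $1 == "block" && NF >= 8 {
      name = $8
      for (k = 9; k <= NF; k++) name = name " " $k
      if (name != m) next
      hdr  = ($2 + 0) * 512              # byte offset of the 512-byte header
      last = hdr + 512 + $5 - 1          # last byte of the member data
      out = ""
      for (i = int(hdr / c); i <= int(last / c); i++)
        out = out (out == "" ? "" : " ") sfx(i)
      print out
    }'
}
```

This only locates the member; extracting it still goes through Peter's `cat ... | tar xvf -` command.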

> (In fact, my tar-command refuses to unpack (but not to pack) tar-files
> larger than 2 GB, even so I'm using reiserfs).

That's because you are doing it wrong. You want

tar xvf - < foo.tar

> I have tried the tar -L option,

I wonder what it does!

-L, --tape-length N
change tapes after writing N*1024 bytes

Bwaahahhahaaaaaaa.


> but it needs a manual action after each
> file, and each output-file gets the same name,

So what? Write a tiny (expect) script.


> and I would like to use
> the command in a script (for cron.daily).

Nothing is holding you back except yourself.

> There is also an tar -M option, but I dont see what to do with.

It's what you want.

-M, --multi-volume
create/list/extract multi-volume archive

I quote.
***************************************************************

Use `--multi-volume' (`-M') on the command line, and then `tar' will,
when it reaches the end of the tape, prompt for another tape, and
continue the archive. Each tape will have an independent archive, and
can be read without needing the other. (As an exception to this, the
file that `tar' was archiving when it ran out of tape will usually be
split between the two archives; in this case you need to extract from
the first archive, using `--multi-volume' (`-M'), and then put in the
second tape when prompted, so `tar' can restore both halves of the
file.)

GNU `tar' multi-volume archives do not use a truly portable format.
You need GNU `tar' at both ends to process them properly.

When prompting for a new tape, `tar' accepts any of the following
responses:

`?'
Request `tar' to explain possible responses

`q'
Request `tar' to exit immediately.

`n FILE NAME'
Request `tar' to write the next volume on the file FILE NAME.

`!'
Request `tar' to run a subshell.

`y'
Request `tar' to begin writing the next volume.

(You should only type `y' after you have changed the tape; otherwise
`tar' will write over the volume it just finished.)

If you want more elaborate behavior than this, give `tar' the
`--info-script=SCRIPT-NAME' (`--new-volume-script=SCRIPT-NAME', `-F
SCRIPT-NAME') option. The file SCRIPT-NAME is expected to be a program
(or shell script) to be run instead of the normal prompting procedure.
When the program finishes, `tar' will immediately begin writing the
next volume. The behavior of the `n' response to the normal
tape-change prompt is not available if you use
`--info-script=SCRIPT-NAME' (`--new-volume-script=SCRIPT-NAME', `-F
SCRIPT-NAME').

The method `tar' uses to detect end of tape is not perfect, and
fails on some operating systems or on some devices. You can use the
`--tape-length=1024-SIZE' (`-L 1024-SIZE') option if `tar' can't detect
the end of the tape itself. This option selects `--multi-volume'
(`-M') automatically. The SIZE argument should then be the usable size
of the tape. But for many devices, and floppy disks in particular,
this option is never required for real, as far as we know.

***************************************************************


so I would assume that -L 1000000 is what you really want (more or
less). And you want -F myscript, where myscript is something that
does


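#!/bin/sh
# With -F, tar does not read "n FILE"/"y" answers from the script, so
# move the full volume aside instead. FILENAME must be exported by the
# caller and name the -f archive.
for i in 1 2 3 4 5; do [ ! -e "$FILENAME.$i" ] && break; done
[ "$i" -gt 4 ] && exit 1        # out of names: fail
mv "$FILENAME" "$FILENAME.$i"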


> How to get tar to creat several tar-archives with limited file size and
> consecutive names on the same disk and without manual interaction?

Have you looked at the manual, as I just have?

> Of course I have read all the net and the man pages, but I did not get it.

Why not? What did you not see that I saw?

Peter

Dances With Crows

Sep 9, 2003, 4:01:06 PM

On Tue, 9 Sep 2003 21:47:32 +0200, peter pilsl staggered into the Black
Sun and said:
> Dances With Crows wrote:
>> tar cf - /path/to/place/to/be/tarred | split --bytes=1024m - PREFIX
>
> dont think this is what the OP wanted. The OP asked for a solution
> where he can access each file directly, which is a very good idea,
> cause otherwise the loss or damage of one file will make him looses
> all his backup.

If the OP wants something like that, he should use something other than
tar (cpio and afio can create multiple-volume sets like what you
describe IIRC), or use multiple tar commands, each operating on a
separate chunk of the directory tree he wants to tar up. Remember that
tar was designed to work on tape drives, which lack convenient ways to
seek; with that in mind, tar's limitations make more sense.

Corruption or loss of one file (or tape) in a backup set has *always*
been a problem. You can make backups of your backups, but that gets
ridiculous.

> I would try to use the -L option and write a wrapper surround that
> renames the created file and makes tar continue. (never did it).

Might work, but it'd be ugly and too complex. Backups should be as
simple as possible.

Alex Yung

Sep 9, 2003, 4:32:12 PM

If the OP still reads this thread, this is how to do it. By using the
-L and -F options, he can accomplish it. I am not about to write a
script for him, but I can get him started.

tar -cv -L 1048576 -F /tmp/renameArchive -f /tmp/test.tar [filelist]

The script "/tmp/renameArchive" contains this:
mv /tmp/test.tar /tmp/`date +%Y%m%d%H%M%S`.tar

He can write his own script to generate sequential numbers. But he
should get the idea of how to finish it.
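One way to finish this idea with sequential numbers rather than timestamps. A sketch, assuming GNU tar (`-M`, `-L` counted in units of 1024 bytes, `-F`) and relying on tar reopening the same archive name after the `-F` script returns, which is what the rename scripts in this thread depend on. `make_volumes` and the `VOL_ARCHIVE` variable are names invented for this example, and leftover `ARCHIVE.NNN` files from an earlier run would throw the numbering off.

```shell
#!/bin/sh
# make_volumes ARCHIVE VOLUME_KB FILE...
#   Write a GNU tar multi-volume archive of FILE... as ARCHIVE.001,
#   ARCHIVE.002, ..., each at most VOLUME_KB * 1024 bytes, with no prompts.
make_volumes() {
  archive=$1 kb=$2
  shift 2
  script=$(mktemp) || return 1
  # tar runs this whenever a volume is full: move the finished file to
  # the next unused number so that tar can start afresh under ARCHIVE.
  cat > "$script" <<'EOF'
#!/bin/sh
n=1
while [ -e "$VOL_ARCHIVE.$(printf %03d "$n")" ]; do n=$((n + 1)); done
mv "$VOL_ARCHIVE" "$VOL_ARCHIVE.$(printf %03d "$n")"
EOF
  chmod +x "$script"
  VOL_ARCHIVE=$archive tar -c -M -L "$kb" -F "$script" -f "$archive" "$@"
  status=$?
  # The last volume is still called ARCHIVE; give it a number as well.
  if [ "$status" -eq 0 ]; then VOL_ARCHIVE=$archive "$script"; fi
  rm -f "$script"
  return "$status"
}
```

For a 1 GB target this would be something like `make_volumes /path/to/backup.tar 1048576 /path/to/data` (placeholder paths). As the manual excerpt quoted above says, a file that straddles two volumes needs `-M` on extraction to come back in one piece.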

peter pilsl (pilsl_...@goldfisch.at) wrote:

Ed Murphy

Sep 10, 2003, 12:48:17 AM

On Tue, 09 Sep 2003 21:53:02 +0200, Peter T. Breuer wrote:

> fabio vassalli <fabio.v...@bluewin.ch> wrote:

>> if I [created] a single
>> tar-file, [it] would [be] about 4 GB. it would be [easier] to handle [if]
>> the archive [were] distributed to four 1 GB tar files.
>
>> It is not a solution to create first a big tar-file and to split it
>> therafter, because I want to access the content of each file directly.
>
> OK. Well, you'll have to solve the bin-packing problem then. Sorry, but
> it's equivalent to the travelling salesman problem. And that's
> NP-complete.

Why's that? From what I gather, he simply wants to create chunks that
are *roughly* 1 GB apiece, without any files being split between chunks.

> I'm afraid you are mistaken. You have *direct* access to each file even
> when the archive is split into 4 arbitrarily. You do
>
> cat foo.tar.[1-4] | tar xvf - my/file/name

Which requires having all four chunks. It sounds like 'tar -M -F
<filename-generating-script>' would produce four standalone chunks,
which is closer to what I think he was asking for. (You'd still need
to know which chunk contained a given file.)

fabio vassalli

Sep 10, 2003, 1:38:48 AM

Alex Yung wrote:
> If the OP still reads this thread,

Of course he reads the thread. Thank you, I think this could be the right
way.

fabio vassalli

fabio vassalli

Sep 10, 2003, 1:54:13 AM

Thank you for this first-class explanation of the -M option. Now
I understand how a script can replace the requested manual input.

Thanks

fabio vassalli

fabio vassalli

Sep 10, 2003, 1:59:19 AM

> That's because you are doing it wrong. You want
>
> tar xvf - < foo.tar
>
Oh, I see... In fact, this way it works and I don't need to create smaller pieces...

Peter T. Breuer

Sep 10, 2003, 2:28:58 AM

Ed Murphy <emur...@socal.rr.com> wrote:
> On Tue, 09 Sep 2003 21:53:02 +0200, Peter T. Breuer wrote:

> > fabio vassalli <fabio.v...@bluewin.ch> wrote:

> >> if I [created] a single
> >> tar-file, [it] would [be] about 4 GB. it would be [easier] to handle [if]
> >> the archive [were] distributed to four 1 GB tar files.
> >
> >> It is not a solution to create first a big tar-file and to split it
> >> therafter, because I want to access the content of each file directly.
> >
> > OK. Well, you'll have to solve the bin-packing problem then. Sorry, but
> > it's equivalent to the travelling salesman problem. And that's
> > NP-complete.

> Why's that?

Because he needs to solve the problem of putting n files which total
4GB into 4 bins, each of which is 1GB in size.

This is the "bin packing", or "knapsack", or "suitcase" problem.

> From what I gather, he simply wants to create chunks that
> are *roughly* 1 GB apiece, without any files being split between chunks.

And what is YOUR definition of "roughly"? Suppose there is one file of
3.9GB in size?


> > I'm afraid you are mistaken. You have *direct* access to each file even
> > when the archive is split into 4 arbitrarily. You do
> >
> > cat foo.tar.[1-4] | tar xvf - my/file/name

> Which requires having all four chunks. It sounds like 'tar -M -F
> <filename-generating-script>' would produce four standalone chunks,
> which is closer to what I think he was asking for. (You'd still need
> to know which chunk contained a given file.)

Which you could not do without having access to all 4 chunks.

I repeat, he is simply asking for a solution to the bin packing problem.
He wants to give a program a list of the n files, with sizes, and get
back 4 lists, each totalling 1GB ("roughly"). Then he makes 4 tars with
each tar archive containing one of the lists of files.

I.e. - "it's nothing to do with tar".
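This two-step recipe (decide the lists first, then run one ordinary tar per list) can be sketched with find, sort and awk. The packing rule here is first-fit decreasing: sort files largest first and put each into the first list that still has room. That is a well-known heuristic, not an optimal solver. `pack_lists` is a hypothetical name; it assumes GNU find's `-printf`, and file names containing newlines are not handled.

```shell
#!/bin/sh
# pack_lists CAP_BYTES LIST_PREFIX DIR
#   Assign every regular file under DIR to LIST_PREFIX.1, LIST_PREFIX.2, ...
#   so that each list's sizes sum to at most CAP_BYTES (a file larger than
#   CAP_BYTES ends up alone in its own list). Prints the number of lists.
pack_lists() {
  find "$3" -type f -printf '%s %p\n' | sort -rn |   # largest files first
  awk -v cap="$1" -v pre="$2" '
    BEGIN { cap += 0 }
    {
      size = $1 + 0
      path = substr($0, length($1) + 2)             # text after the size field
      for (b = 1; b <= n; b++)                      # first list with room
        if (used[b] + size <= cap) break
      if (b > n) n = b                              # none had room: open one
      used[b] += size
      print path > (pre "." b)
    }
    END { print n + 0 }'
}

# Example (placeholder paths): one plain tar per list.
#   n=$(pack_lists 1073741824 /tmp/backup.list /path/to/data)
#   i=1; while [ "$i" -le "$n" ]; do
#     tar -cf "/path/to/backup.$i.tar" -T "/tmp/backup.list.$i"; i=$((i + 1))
#   done
```

Each resulting archive is an ordinary, self-contained tar file, which is what the original poster asked for; the price is that the sizes are only bounded, not equal.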

Peter

Ed Murphy

Sep 10, 2003, 2:57:08 AM

On Wed, 10 Sep 2003 08:28:58 +0200, Peter T. Breuer wrote:

>> > OK. Well, you'll have to solve the bin-packing problem then. Sorry, but
>> > it's equivalent to the travelling salesman problem. And that's
>> > NP-complete.
>
>> Why's that?
>
> Because he needs to solve the problem of putting n files which total
> 4GB into 4 bins each of which are 1GB in size.
>
> This is the "bin packing", or "knapsack", or "suitcase" problem.

Yeah, I know about NP-complete problems.

>> From what I gather, he simply wants to create chunks that
>> are *roughly* 1 GB apiece, without any files being split between chunks.
>
> And what is YOUR definition of "roughly"? Suppose there is one file of
> 3.9GB in size?

Then he should know better than to bother with 1GB knapsacks. :)

Andy Baxter

Sep 10, 2003, 8:29:36 AM

Peter T. Breuer wrote:

> OK. Well, you'll have to solve the bin-packing problem then. Sorry, but
> it's equivalent to the travelling salesman problem. And that's
> NP-complete.

Is NP-completeness where any algorithm for an optimal solution grows very
fast (according to some definition I can't remember) compared to the number
of entities involved?
Given that the OP probably isn't looking for an optimal solution, this might
not be such a problem.

andy.

--

remove ' n - u - l - l ' to email me.
Please don't send me html mail or un-notified attachments. These will be
automatically filed under 'probable spam' unless I'm expecting an email
which hasn't come.
If you do need to send an attachment or html mail, put [attachment] or
[html] in the subject line.
Thanks, andy.

Peter T. Breuer

Sep 10, 2003, 8:40:10 AM

Andy Baxter <ne...@earthsong.null.free-online.co.uk> wrote:
> Peter T. Breuer wrote:

> > OK. Well, you'll have to solve the bin-packing problem then. Sorry, but
> > it's equivalent to the travelling salesman problem. And that's
> > NP-complete.

> Is NP-completeness where any algorithm for an optimal solution grows very
> fast (according to some definition I can't remember) compared to the number
> of entities involved?

That's right (one supposes :-).

> Given that the OP probably isn't looking for an optimal solution, this might
> not be such a problem.

Indeed, there are suboptimal heuristics available. The only problem is
implementing them. The point is that selecting which file goes in
which tar archive is essentially a separate process. On its own,
tar can only stop building an archive when the next file would make it
go beyond some set limit in size. Thus it can not predict how many
archives it will need.

For example:

0.99G
0.011G
0.99G
0.011G
0.99G
0.978G

requires 6 archives <= 1GB in size using the first-come-first-served
heuristic. But it can be packed in 4, and the total size is < 4GB.

archive 1: 0.99G
archive 2: 0.99G
archive 3: 0.99G
archive 4: 0.978G + 0.011G + 0.011G = 1.000G
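These numbers can be checked mechanically. The bash sketch below counts archives for the first-come-first-served rule (usually called next-fit) and for first-fit decreasing, a common heuristic that handles the biggest files first. Sizes are in MB, taking 1G = 1000M so they stay integers; the function names are illustrative.

```shell
#!/bin/bash
# next_fit CAP SIZE...: keep one archive open; start a new one as soon
# as the next file does not fit. Prints the number of archives used.
next_fit() {
  local cap=$1 used=0 count=0 s; shift
  for s in "$@"; do
    if (( count == 0 || used + s > cap )); then
      count=$(( count + 1 )); used=$s
    else
      used=$(( used + s ))
    fi
  done
  echo "$count"
}

# first_fit_decreasing CAP SIZE...: sort sizes largest first and put each
# into the first archive with room, opening a new one only when none has.
first_fit_decreasing() {
  local cap=$1 s b placed; shift
  local -a bins=()
  for s in $(printf '%s\n' "$@" | sort -rn); do
    placed=0
    for (( b = 0; b < ${#bins[@]}; b++ )); do
      if (( bins[b] + s <= cap )); then
        bins[b]=$(( bins[b] + s )); placed=1; break
      fi
    done
    (( placed )) || bins+=("$s")
  done
  echo "${#bins[@]}"
}
```

On this input, `next_fit 1000 990 11 990 11 990 978` prints 6 and `first_fit_decreasing` with the same arguments prints 4, matching the 4-archive packing shown above.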


Peter

anon...@remailer.hastio.org

Sep 10, 2003, 12:18:11 PM

"fv" == fabio vassalli <fabio.v...@bluewin.ch>:
fv> It is not a solution to create first a big tar-file and to split it
fv> therafter, because I want to access the content of each file directly.
fv> (In fact, my tar-command refuses to unpack (but not to pack) tar-files
fv> larger than 2 GB, even so I'm using reiserfs).

Try using 'star' instead of 'tar'.

Ed Murphy

Sep 10, 2003, 12:36:55 PM

On Wed, 10 Sep 2003 13:29:36 +0100, Andy Baxter wrote:

> Peter T. Breuer wrote:

>> OK. Well, you'll have to solve the bin-packing problem then. Sorry, but
>> it's equivalent to the travelling salesman problem. And that's
>> NP-complete.

> Is NP-completeness where any algorithm for an optimal solution grows very
> fast (according to some definition I can't remember) compared to the number
> of entities involved?

Kinda sorta.

NP is the set of problems for which a potential solution can be checked
by a deterministic algorithm that takes polynomial time.

P is the set of problems that can be solved by a deterministic algorithm
that takes polynomial time, relative to the size of the input.

NPC is the set of problems in NP (X) such that any other problem in NP (Y)
can be reduced to X (i.e. an algorithm to solve X leads to an algorithm to
solve Y) in polynomial time.

Any problem in NPC (and thus any problem in NP) can be solved by a
deterministic algorithm that runs in *exponential* time: simply
generate and test every possible solution. However, compared to *any*
polynomial time (no matter how awful), exponential time - given
sufficiently many inputs - is even worse.
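For the bin-packing case discussed earlier in the thread, "generate and test" can be made concrete with a backtracking search that tries every way of assigning files to archives and keeps the smallest count. This is an illustrative bash sketch (names are made up, sizes are integers such as MB); the bound it uses only prunes branches that cannot win, so the worst case is still exponential in the number of files.

```shell
#!/bin/bash
# min_bins CAP SIZE...: smallest number of bins of capacity CAP that hold
# all SIZEs, found by exhaustive backtracking (exponential time).
items=() bins=() best=0 cap=0

place() {                                   # place items[$1..] recursively
  local i=$1 b last
  (( ${#bins[@]} >= best )) && return       # cannot beat the best so far
  if (( i == ${#items[@]} )); then best=${#bins[@]}; return; fi
  for (( b = 0; b < ${#bins[@]}; b++ )); do # try every bin with room...
    if (( bins[b] + items[i] <= cap )); then
      bins[b]=$(( bins[b] + items[i] ))
      place $(( i + 1 ))
      bins[b]=$(( bins[b] - items[i] ))
    fi
  done
  bins+=("${items[i]}")                     # ...and a brand-new bin
  place $(( i + 1 ))
  last=$(( ${#bins[@]} - 1 ))
  unset "bins[$last]"
}

min_bins() {
  cap=$1; shift
  items=("$@") bins=() best=$(( $# + 1 ))
  place 0
  echo "$best"
}
```

On the six sizes from the example above (990 11 990 11 990 978, capacity 1000) it returns 4; the running time grows very quickly as files are added.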

One of the Big Questions in mathematics is: "Does any problem in
NPC (and thus every problem in NP) belong to P?" Most mathematicians
believe the answer is no.

This will give you lots more detail:

http://www.wikipedia.org/wiki/Complexity_classes_P_and_NP

Dave Carrigan

Sep 10, 2003, 12:37:58 PM

fabio vassalli <fabio.v...@bluewin.ch> writes:

> i'd like to use tar to crate an archive. if I would crate a single
> tar-file, id would have about 4 GB. it would be easyer to handle it when
> the archive would be distributed to four 1 GB tar files.
>
> It is not a solution to create first a big tar-file and to split it
> therafter, because I want to access the content of each file directly.
> (In fact, my tar-command refuses to unpack (but not to pack) tar-files
> larger than 2 GB, even so I'm using reiserfs).
>
> I have tried the tar -L option, but it needs a manual action after each
> file, and each output-file gets the same name, and I would like to use
> the command in a script (for cron.daily).
>
> There is also an tar -M option, but I dont see what to do with.

You need to use the -L, -M and -F options.

-L says what the length of the "tape" is, in units of 1024 bytes. For 1GB
files, you would use 1048576.
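Since -L counts in units of 1024 bytes (the man page line quoted earlier in the thread says "change tapes after writing N*1024 bytes"), the value given here and the `-L 1000000` suggested earlier work out to:

```latex
\[
  1048576 \times 1024\ \text{bytes} = 2^{20} \cdot 2^{10}\ \text{bytes} = 2^{30}\ \text{bytes} = 1\ \text{GiB}
\]
\[
  1000000 \times 1024\ \text{bytes} = 1\,024\,000\,000\ \text{bytes} \approx 976.6\ \text{MiB}
\]
```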

-M says that it's a multi-volume archive.

-F is the script to run between each volume. With the -F option, tar
will not prompt you to change tapes; instead it runs a script. You will
have to write the script.

The --volno-file option is also useful.

Typical usage is

#! /bin/sh
rm -f /tmp/volno
# /path/to/data is a placeholder for the files to back up
tar cf /path/to/file.tar.tmp -L 1048576 -M \
    -F /path/to/new-volume-script \
    --volno-file=/tmp/volno \
    /path/to/data

The new-volume-script would look like

------------------------------------------------------------------------
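#! /bin/bash
# /tmp/volno holds the number of the volume tar is about to start,
# so the volume that was just finished is one less than that.
newvol=$(cat /tmp/volno)
oldvol=$(printf "%03d" $(( newvol - 1 )))
mv /path/to/file.tar.tmp /path/to/tar."$oldvol"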

------------------------------------------------------------------------

At the end, you will be left with a bunch of tar.n files and one
file.tar.tmp file, which you would want to rename to tar.n+1. You will
be able to untar each one individually.

If you want to untar them all, you have to use the -M option again,
otherwise files that were split across a volume will not get extracted
properly. Thus, you would do:

ln -s /path/to/tar.001 /tmp/tar.work
tar xvf /tmp/tar.work -M

Each time it prompts you to change "tapes", change the /tmp/tar.work
symlink to point to the next volume. Naturally, you could script that
with the -F option.
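The symlink switching just described can be handed to an -F script so the whole set restores unattended. A sketch, assuming GNU tar, volumes named ARCHIVE.001, ARCHIVE.002, ... as produced by the new-volume script above, and `readlink` (common on Linux and BSD, but not POSIX). `restore_volumes` and `VOL_WORK` are hypothetical names.

```shell
#!/bin/sh
# restore_volumes FIRST_VOLUME DEST_DIR
#   Extract a GNU tar multi-volume set ARCHIVE.001, ARCHIVE.002, ... by
#   pointing a symlink at each volume in turn.
restore_volumes() {
  first=$1 dest=$2
  case $first in /*) ;; *) first=$PWD/$first ;; esac  # link target must be absolute
  work=$(mktemp -u) || return 1                        # path for the "tape" symlink
  script=$(mktemp) || return 1
  # Run by tar at the end of each volume: repoint the symlink from
  # ARCHIVE.NNN to ARCHIVE.NNN+1, failing if that volume is missing.
  cat > "$script" <<'EOF'
#!/bin/sh
cur=$(readlink "$VOL_WORK") || exit 1
next=${cur%.*}.$(printf %03d "$(expr "${cur##*.}" + 1)")
[ -e "$next" ] || exit 1
ln -sf "$next" "$VOL_WORK"
EOF
  chmod +x "$script"
  ln -s "$first" "$work"
  mkdir -p "$dest"
  VOL_WORK=$work tar -x -M -F "$script" -f "$work" -C "$dest"
  status=$?
  rm -f "$work" "$script"
  return "$status"
}
```

With the naming above this would be `restore_volumes /path/to/tar.001 /path/to/restore` (placeholder destination), after renaming the leftover file.tar.tmp to the next number as described earlier.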

--
Dave Carrigan
Seattle, WA, USA
da...@rudedog.org | http://www.rudedog.org/ | ICQ:161669680
UNIX-Apache-Perl-Linux-Firewalls-LDAP-C-C++-DNS-PalmOS-PostgreSQL-MySQL

Ed Murphy

Sep 10, 2003, 6:27:46 PM

> ~~~~~~~~~~~~~~~~~~~~~
> This message was posted via one or more anonymous remailing services.
> The original sender is unknown. Any address shown in the From header
> is unverified.

Now doesn't *that* inspire a lot of confidence?

FYI: Several weeks ago, comp.os.linux.misc / comp.unix.solaris saw a
rather shrill argument between Joerg Schilling (the author of star) and
Peter Breuer, regarding (a) whether star or GNU tar was better and (b)
whether Joerg was morally obligated to identify himself as the author
of star when recommending it. Which he stubbornly refused to do, not
because there was any question (there wasn't) but because he didn't
want to give Peter the satisfaction.

I wonder whether that's Joerg behind the anonymizer. (I am commenting
on (b) alone here; giving an informed opinion on (a) requires deeper
understanding than mine.)

Nomen Nescio

Sep 11, 2003, 6:30:06 AM

"EM" == Ed Murphy <emur...@socal.rr.com>:
EM> Now doesn't *that* inspire a lot of confidence?
Would putting a fake name and an invalid email address in the From:
header inspire more confidence? I doubt it.

EM> I wonder whether that's Joerg behind the anonymizer.
No, it's not. Joerg is probably busy writing _great free_ software,
instead of playing silly little games with other Usenet users and
anonymizers.
