
CPIO VS TAR for Backup?


Tom

Jun 23, 2004, 3:09:37 PM
Hello,

I'd like to schedule a daily backup of my RH 8.0 machine to tape, and I'm
considering either cpio or tar for that. Please let me know the differences
between them, or their pros and cons.

Any help is greatly appreciated,
Tom


Tim

Jun 23, 2004, 4:28:20 PM
"Tom" <t...@pleasenospam.com> wrote in

> I'd like to schedule a daily backup of my RH 8.0 machine to tape, and I'm
> considering either cpio or tar for that. Please let me know the differences
> between them, or their pros and cons.

cpio is old and not used so much any more, I think.
tar is good and very standardized.
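
For the daily backup to tape you asked about, something along these lines
should do it (the tape device name and the directory list are just examples,
adjust them for your setup):

  # full backup of a few directories to the first SCSI tape drive
  $ tar -cvf /dev/st0 /etc /home /var

  # or as a root crontab entry, to run every night at 02:30
  30 2 * * *  tar -cf /dev/st0 /etc /home /var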

I found rdiff-backup today, which seems very good, and that is what I will
use in my new installation (one machine, backing up to a USB drive).
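
For what it's worth, rdiff-backup just takes a source and a destination
directory, roughly like this (paths are only examples):

  # mirror /home to the USB drive, keeping reverse increments of old versions
  $ rdiff-backup /home /mnt/usbdrive/home-backup

  # later, list the increments that are available
  $ rdiff-backup --list-increments /mnt/usbdrive/home-backup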

Tim


John Thompson

Jun 24, 2004, 9:05:00 AM

If you're using ext2 or ext3, dump/restore might be a better choice than
either cpio or tar.
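
For example (device and partition names are only illustrative), a full
level-0 dump of an ext2/ext3 partition and a complete restore look roughly
like this:

  # level-0 (full) dump of the partition holding /home to tape;
  # -u records the dump date in /etc/dumpdates
  $ dump -0u -f /dev/st0 /dev/hda3

  # rebuild the whole dump into the current directory
  # (run inside a freshly made, empty filesystem)
  $ restore -r -f /dev/st0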

--

-John (jo...@os2.dhs.org)

Heiner Steven

Jun 24, 2004, 10:47:54 AM
Tim wrote:

> "Tom" <t...@pleasenospam.com> wrote in
>
>>I'd like to schedule a daily backup of my RH 8.0 machine to tape, and I'm
>>considering either cpio or tar for that. Please let me know the differences
>>between them, or their pros and cons.
>
> cpio is old and not used so much any more, I think.
> tar is good and very standardized.

[...]

"tar" is a "tape archiver" invented to save files on a
tape device. This causes some unwanted behaviour in the
case where the destination is *not* a tape device. Example:

$ > x
$ ls -l x
-rw-r--r-- 1 heiner users 0 2004-06-24 16:42 x

The file has size zero. Now let's see how "tar" and "cpio"
archive it:

$ tar cf x.tar x # create x.tar
$ echo x | cpio -oc > x.cpio # create x.cpio
$ ls -l x*
-rw-r--r-- 1 heiner users 0 2004-06-24 16:44 x
-rw-r--r-- 1 heiner users 512 2004-06-24 16:44 x.cpio
-rw-r--r-- 1 heiner users 10240 2004-06-24 16:44 x.tar

We see that archiving an empty file creates a 10k tar archive,
but only a 0.5k cpio archive.

I think the main reason many people prefer "tar" is that
it automatically recurses into directories, whereas "cpio" requires
the additional knowledge of "find":

$ tar cf home.tar "$HOME"

compared to

$ find "$HOME" -print | cpio -oc > home.cpio

Heiner
--
___ _
/ __| |_ _____ _____ _ _ Heiner STEVEN <heiner...@nexgo.de>
\__ \ _/ -_) V / -_) ' \ Shell Script Programmers: visit
|___/\__\___|\_/\___|_||_| http://www.shelldorado.com/

Christopher Browne

Jun 24, 2004, 10:52:46 AM

.. And if you might conceivably have to recover to some other
filesystem, or might want to pick out just _some_ files, using a
FS-specific tool would be about the worst possible choice.
--
let name="cbbrowne" and tld="acm.org" in name ^ "@" ^ tld;;
http://www.ntlug.org/~cbbrowne/backup.html
Signs of a Klingon Programmer - 15. "Python? That is for children. A
Klingon Warrior uses only machine code, keyed in on the front panel
switches in raw binary."

John-Paul Stewart

Jun 24, 2004, 10:52:05 AM
Tim wrote:
> "Tom" <t...@pleasenospam.com> wrote in
>
>>I'd like to schedule a daily backup of my RH 8.0 machine to tape, and I'm
>>considering either cpio or tar for that. Please let me know the differences
>>between them, or their pros and cons.
>
>
> cpio is old and not used so much any more, I think.
> tar is good and very standardized.

Tar is older than cpio. In fact, cpio was originally created to replace
tar. However, tar does seem to be more widely used.

MLL

Jun 24, 2004, 1:01:59 PM
John Thompson wrote:


> If you're using ext2 or ext3, dump/restore might be a better choice than
> either cpio or tar.

I do not exactly remember why, but I think dump does have some issues and
its use is not recommended anymore.

John Thompson

Jun 24, 2004, 12:59:28 PM
On 2004-06-24, Christopher Browne <cbbr...@acm.org> wrote:

> Martha Stewart called it a Good Thing when John Thompson <jo...@starfleet.os2.dhs.org> wrote:
>>
>> If you're using ext2 or ext3, dump/restore might be a better choice than
>> either cpio or tar.

> .. And if you might conceivably have to recover to some other
> filesystem, or might want to pick out just _some_ files, using a
> FS-specific tool would be about the worst possible choice.

I've found that I can use linux restore to extract files from a dump
created by FreeBSD from a UFS filesystem. If you have xfs filesystems,
"xfsdump/xfsrestore" do the same job.

And restore has an interactive mode that lets you pick and choose what to
restore in a manner analogous to the filesystem hierarchy; e.g. "ls" to
see what files are in the dump, then "add whatever" to add to the list to
be extracted, and "extract" to pull them out of the dump and restore to
the filesystem.
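
A session looks roughly like this (the path added is only an example):

  $ restore -i -f /dev/st0
  restore > ls            # list what is in the dump
  restore > add home/tom  # mark a path for extraction
  restore > extract       # pull the marked files out of the dump
  restore > quit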

And dump supports incremental backups very well.
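
For example, a weekly full dump plus daily incrementals (each level dumps
only what changed since the last dump of a lower level; device and partition
names are just examples):

  $ dump -0u -f /dev/st0 /dev/hda3   # Sunday: level 0, everything
  $ dump -1u -f /dev/st0 /dev/hda3   # weekdays: level 1, changes since level 0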

--

-John (jo...@os2.dhs.org)

Frank Miles

Jun 24, 2004, 2:41:35 PM
In article <3rpebc...@mail.binaryfoundry.ca>,

IIRC one problem with tar is that if you use compression and your media
develop a single-bit glitch, you can lose the entire archive. Afio does
file-by-file compression, so it is a safer method: a single-bit glitch should
only result in the loss of a single file.
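
If I remember the options correctly, a minimal afio run looks like this
(device name is just an example; -Z compresses each file individually as it
goes into the archive):

  $ find /home -print | afio -oZ /dev/st0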

HTH...
-frank
--

Jean-David Beyer

Jun 24, 2004, 3:15:18 PM

Not only is tar older than cpio, it was originally a band-aid imposed on
the ar program to make it more suitable for writing tapes; hence the name,
tape-ar.

Of course, the GNU folks have probably removed the accumulated crud that
infests programs that have been modified as much as the Bell Labs tar
program.

Actually, using find and cpio is in the UNIX tradition of using programs
to do simple things and do them well. So find finds files, and cpio
writes, reads, and copies them. Pipe them together in the UNIX tradition
to do the more complex job of writing backup files. That way, find does
not need to understand tape drives, and cpio does not need the complexity
of finding all manner of files.
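
In that spirit, a minimal find/cpio backup of one filesystem to tape might
look like this (-xdev keeps find from crossing into other filesystems; the
device name is just an example):

  $ find / -xdev -print | cpio -oc > /dev/st0   # write the archive
  $ cpio -icd < /dev/st0                        # read it back, creating dirs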

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 15:10:00 up 2 days, 4:41, 5 users, load average: 2.10, 2.08, 2.08

Jean-David Beyer

Jun 24, 2004, 3:17:28 PM
Frank Miles wrote:

>
> IIRC one problem with tar is that if you use compression and your
> media develop a single-bit glitch, you can lose the entire archive.
> Afio does file-by-file compression, so it is a safer method: a
> single-bit glitch should only result in the loss of a single file.
>

Sure. But don't most modern tape drives use hardware compression? All the
ones I have used, even the dumb QIC tape drives I used in 1996, have that
feature. And that is surely done on a file-at-a-time or even a
block-at-a-time basis. So who cares what software compression methods are
used, unless you must read an old archive tape?

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org

^^-^^ 15:15:00 up 2 days, 4:46, 5 users, load average: 2.18, 2.13, 2.09

Juhan Leemet

Jun 24, 2004, 6:32:09 PM

I believe the issues with dump (on Linux, and ufsdump on Solaris, and
others) are related to backing up mounted file systems. The dump programs go
"underneath" the file system and access the raw device. For the dump to
work best/correctly that raw device must be unmounted. OK, so how do you
dump your root /? Purists might boot from net/CD/etc. A lot of people just
"take their chances" and try/hope it is quiescent. The problem is that if
changes occur, they can affect the directory structure, and not only
invalidate single file(s) but large parts of the dumped file system! I
didn't know that before, either. Solaris's solution is to provide additional
tools like fssnap or lockfs, which create (temporary) static copies of the
disk structure so you can back up. I don't know if Linux has anything like
that. I suspect not, since Linus does not like dump, and has said so.
Because this is a kernel-related facility (?), without Linus it's unlikely.

So, unfortunately, even though dump is fastest, it is tricky to use.

Furthermore, I found (to my chagrin) that dump won't do reiserfs.

I now do tar. I think cpio has its own problems, and some advise against it.

Dunno what the story is with pax? Good idea that was never accepted?

--
Juhan Leemet
Logicognosis, Inc.

Robert Nichols

Jun 24, 2004, 9:59:00 PM
In article <40dae98e$0$26358$9b4e...@newsread4.arcor-online.net>,
Heiner Steven <heiner...@nexgo.de> wrote:
:
:"tar" is a "tape archiver" invented to save files on a

:tape device. This causes some unwanted behaviour in the
:case where the destination is *not* a tape device. Example:
:
: $ > x
: $ ls -l x
: -rw-r--r-- 1 heiner users 0 2004-06-24 16:42 x
:
:The file has size zero. Now let's see how "tar" and "cpio"
:archive it:
:
: $ tar cf x.tar x # create x.tar
: $ echo x | cpio -oc > x.cpio # create x.cpio
: $ ls -l x*
: -rw-r--r-- 1 heiner users 0 2004-06-24 16:44 x
: -rw-r--r-- 1 heiner users 512 2004-06-24 16:44 x.cpio
: -rw-r--r-- 1 heiner users 10240 2004-06-24 16:44 x.tar
:
:We see that archiving an empty file creates a 10k tar archive,
:but only a 0.5k cpio archive.

That is significant only if your empty file is the only thing in the tar
archive. The entry for file "x" actually uses only 512 bytes of the
archive. You could fit 18 empty files into that same 10KB tar archive.
And, that 10KB blocksize is merely tar's default. The actual blocksize
can be any multiple of 512 bytes, limited only by available memory and,
if you are actually writing to a physical device like a tape drive, the
kernel's buffer size.
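
For instance, with GNU tar the -b option sets the blocking factor in units of
512 bytes, so something like this should shrink the archive for the same
empty file to just a few 512-byte blocks instead of 10240 bytes:

  $ tar -c -b 1 -f x.tar x
  $ ls -l x.tar   # one header block plus the end-of-archive marker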

--
Bob Nichols AT comcast.net I am "rnichols42"

Skylar Thompson

Jun 28, 2004, 1:59:15 AM
On Thu, 24 Jun 2004 20:32:09 -0200, Juhan Leemet <ju...@logicognosis.com> wrote:
>
> I believe the issues with dump (on Linux, and ufsdump on Solaris, and
> others) are related to backing up mounted file systems. The dump programs go
> "underneath" the file system and access the raw device. For the dump to
> work best/correctly that raw device must be unmounted. OK, so how do you
> dump your root /? Purists might boot from net/CD/etc. A lot of people just
> "take their chances" and try/hope it is quiescent. The problem is that if
> changes occur, they can affect the directory structure, and not only
> invalidate single file(s) but large parts of the dumped file system! I
> didn't know that before, either. Solaris's solution is to provide additional
> tools like fssnap or lockfs, which create (temporary) static copies of the
> disk structure so you can back up. I don't know if Linux has anything like
> that. I suspect not, since Linus does not like dump, and has said so.
> Because this is a kernel-related facility (?), without Linus it's unlikely.

FreeBSD 5-RELEASE has an "L" option to its dump that works on UFS2
filesystems in a similar way that Solaris dump works on Sun UFS
filesystems.
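
From memory, that looks something like this (-L snapshots the live filesystem
first, -a drops the tape-length assumptions, -u updates /etc/dumpdates):

  # FreeBSD: level-0 dump of the mounted root filesystem via a snapshot
  $ dump -0Lauf /backup/root.dump /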

--
-- Skylar Thompson (sky...@cs.earlham.edu)
-- http://www.cs.earlham.edu/~skylar/

Juhan Leemet

Jun 28, 2004, 5:38:32 PM
On Mon, 28 Jun 2004 05:59:15 +0000, Skylar Thompson wrote:
> On Thu, 24 Jun 2004 20:32:09 -0200, Juhan Leemet <ju...@logicognosis.com> wrote:
>> I believe the issues with dump (on Linux, and ufsdump on Solaris, and
>> others) are related to backing up mounted file systems. The dump programs go
>> "underneath" the file system and access the raw device...
>> ...additional tools like fssnap or lockfs, which create (temporary)
>> static copies of the disk structure so you can back up. I don't know
>> if Linux has anything like that...

>
> FreeBSD 5-RELEASE has an "L" option to its dump that works on UFS2
> filesystems in a similar way that Solaris dump works on Sun UFS
> filesystems.

Sounds interesting. It must need kernel support for this, right? How did they
get that past Linus (teasing here, I don't know the man)? Hmm, ufs2, huh. I
don't suppose it would be feasible to make that work with ext2/ext3/reiserfs?
I'll have to do some reading on that, just for interest.

I'm used to using *dump, and like the idea of doing a quick volume backup.
I guess with all other file systems except for / (root) you could (in
principle) unmount them during the backup. Might be some carnage on the
side, when programs (and NFS servers? Samba servers?) try to access stuff.
So, you would have to shut them down, and remember to restart them. Yuck!

chris-...@roaima.co.uk

Jun 30, 2004, 4:56:19 AM
Juhan Leemet <ju...@logicognosis.com> wrote:
>> FreeBSD 5-RELEASE has an "L" option to its dump that works on UFS2
>> filesystems in a similar way that Solaris dump works on Sun UFS
>> filesystems.

> Sounds interesting. It must have kernel support for this? How did they get
> that past Linus (teasing here, I don't know the man)? Hmm, ufs2, huh. I
> don't suppose it would be feasible to make that work with ext2,3,reiserfs?
> I'll have to do some reading on that, just for interest.

Conceptually it doesn't seem /that/ hard to me... you could provide a
process based write lock on the filesystem that blocks the page flush
thread. All writes would then be cached until the lock was released. At
that point the page flush thread could write all the pending pages
to disk.

Chris (not a kernel hacker)

Juhan Leemet

Jul 1, 2004, 12:12:46 AM

ISTR reading that the problem was actually the inverse of that: blocks
that had been created in the disk block buffers were slowly but
"erratically" (i.e. not in any particular sequence) getting written to
disk. Since *dump does not go through the file system, it does not "see"
the blocks in the disk block buffer, and would not find them on the disk
when it reads the "raw disk". So you might have file descriptors
(directories) already on the disk, but their content inodes have not been
flushed out yet, or worse: subdirectories on disk with parent directories
not yet flushed out, i.e. "disconnected" file systems. I think the latter
is the most serious (disastrous?) case. Any descriptions I read said that
not only could you lose parts of the occasional file, but (large?) parts
of your file system. In that case, what good is the backup?

Conceptually, it would seem that you'd have to do some kind of dance,
like: freeze disk I/O, flush all of the disk block buffers, lock the cache
(like you suggest) and prevent any flushing until backup is complete. Then
what do you do if/when the cache gets full? This is a nasty problem.

BTW, I believe some of the solutions do something like: checkpoint all
inodes into a special (temporary) file? Then somehow prevent deletions? I
don't know the details, even after reading some man pages on commands.

(not a kernel hacker/expert either)

Interesting (and vitally important?) topic, though.

chris-...@roaima.co.uk

Jul 1, 2004, 6:38:41 AM
Juhan Leemet <ju...@logicognosis.com> wrote:
> Any descriptions I read said that
> not only could you lose parts of the occasional file, but (large?) parts
> of your file system.

Yes. Good points :-)

> Conceptually, it would seem that you'd have to do some kind of dance,
> like: freeze disk I/O, flush all of the disk block buffers, lock the cache
> (like you suggest) and prevent any flushing until backup is complete.

Even that's tricky because you'd (maybe) struggle to "know" which blocks
were part of an outstanding operation and which were the initial sections
of a new one.

> Then
> what do you do if/when the cache gets full? This is a nasty problem.

I suggest that you would lock out the process(es) demanding more pages
until the dump lock was released. The tradeoff here is that as the time
taken for the dump increases, you need more memory to hold the
ever-growing outstanding cache. IMO if you really needed to dump a live
filesystem then this would be an acceptable caveat.

OTOH if you really need to dump a live filesystem, you're probably better
off using something like LVM to take a logical snapshot and back that up.
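
Something along these lines, assuming /home sits on an LVM logical volume
(volume group, sizes and mount points are just examples):

  # create a snapshot of the logical volume holding /home
  $ lvcreate --size 1G --snapshot --name homesnap /dev/vg0/home

  # mount the snapshot read-only and back it up with tar (or dump, cpio, ...)
  $ mount -o ro /dev/vg0/homesnap /mnt/snap
  $ tar -cf /dev/st0 -C /mnt/snap .

  # clean up when the backup is done
  $ umount /mnt/snap
  $ lvremove -f /dev/vg0/homesnap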

Chris
