So... working on multivolume archives.

Zaphod Beeblebrox

Oct 13, 2021, 12:49:48 PM
to libarchive-discuss
I haven't dived in yet, but if I do, my intention is to work on multivolume archives.  I was a big fan of tape back in the DAT1 to DAT4 days.  Recently, the cost of a reasonable size (for me) LTO drive came within reach and the tapes are reasonable, too.  I started trying to poke at multivolume again.

So... gnutar (my old friend) has some support.  I run everything on FreeBSD, so its immediate flaw, I noticed, is that it wasn't properly handling all of my filenames.  Sigh.

mbuffer wouldn't compile, so I played with another old friend: team.  Its flaw is that it doesn't close the file handle, so I can't change the tape.  Team is too clever by half.  After cleaning out the bat-crap crazy coding style that was in place to bridge K&R compatibility with C89, I discovered that, in a balls-in-play fashion, each of the forked processes (team forks n processes, each with its own little buffer) needed to read from its copy of the forked filehandle --- I _could_ fix that, but I don't want to.

So... I've admired libarchive for a while.  I jumped into my local copy of FreeBSD's source code, found that it lived in "contrib" ... and here-I-am.

Who-I-am?  Well... I run an ISP based solely on open source.  I contribute to FreeBSD and now, hopefully, libarchive.

... so much for introductions?

Anyways... it strikes me that one thing you don't list in multivolume support is a modified tar format.  Let's call it m-tar for the moment.  The one item I'd like to add to the tar format is a block that can be anywhere, but most specifically is written at the beginning of every volume (including the first), and that contains the following text:

M-TAR <n> of <m> for <label>
record-size: <r>
record-number: <s>
next-file: <t>

Each line ends with <CR>.  Any padding needed to fill out the block consists of <CR> characters.  All numbers are written in ASCII.  <n> is the number of the current volume.  <m> is the total number of volumes (if known) or "UNK" if not.  <label> is user-defined text.  <r> is the record size in bytes, and <s> is the record number of this record.  <t> is the record number of the next file header.
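
To make the layout concrete, here is a rough Python sketch of building one such header. It is only an illustration, not libarchive code: the function name `build_mtar_header` is made up, and it assumes the header occupies exactly one record of <r> bytes, which the description above implies but does not state outright.

```python
def build_mtar_header(volume, total, label, record_size, record_number, next_file):
    """Return one record-sized m-tar header block as bytes (illustrative only)."""
    # "UNK" stands in for the volume count when it is not known.
    total_text = "UNK" if total is None else str(total)
    lines = [
        f"M-TAR {volume} of {total_text} for {label}",
        f"record-size: {record_size}",
        f"record-number: {record_number}",
        f"next-file: {next_file}",
    ]
    # Every line ends with a CR, and all numbers are plain ASCII.
    body = b"".join(line.encode("ascii") + b"\r" for line in lines)
    if len(body) > record_size:
        raise ValueError("header text does not fit in one record")
    # Assumption: the rest of the record is filled with CR padding.
    return body + b"\r" * (record_size - len(body))


header = build_mtar_header(2, None, "backup", 512, 100, 103)
```

The result is always exactly `record_size` bytes long, so the header can be written like any other record.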

With this simple header, lots of tools, even dd, can produce a stream that a "regular" tar can read if the user can read the beginning of the file with less.  Also, with <t> the archive program can find the next file record (if, say, this volume is being read without other volumes having been read).  Again, also, using just this information and dd, someone could filter the stream such that a "regular" tar could read it.

(BTW... I chose the padding to be just CRs because less will read the file better and "head -10" will also easily read the file.)
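
As a sketch of the "header plus dd" idea, the Python below parses a header block and works out how many records to skip to land on the next file header. The helper names are hypothetical, and it assumes that record numbers count every record in the stream (headers included), so that the distance from this header to the next file header is simply <t> minus <s>. The only dd operands used are the standard `if=`, `bs=` and `skip=`.

```python
def parse_mtar_header(block):
    """Parse an m-tar header block (bytes) into a dict (illustrative only)."""
    lines = block.decode("ascii").split("\r")
    # First line: M-TAR <n> of <m> for <label>; the label may contain spaces.
    words = lines[0].split(" ", 5)
    if (len(words) != 6 or words[0] != "M-TAR"
            or words[2] != "of" or words[4] != "for"):
        raise ValueError("not an m-tar header")
    fields = {
        "volume": int(words[1]),
        "total": None if words[3] == "UNK" else int(words[3]),
        "label": words[5],
    }
    # The next three lines are "key: value" pairs with ASCII numbers.
    for line in lines[1:4]:
        key, _, value = line.partition(": ")
        fields[key] = int(value)
    return fields


def dd_command(fields, device):
    """Build a dd command that starts reading at the next file header."""
    # Assumption: the header is the first record of the volume, so the
    # next file header sits (next-file - record-number) records in.
    skip = fields["next-file"] - fields["record-number"]
    return f"dd if={device} bs={fields['record-size']} skip={skip}"
```

For a header with record-size 512, record-number 100 and next-file 103, `dd_command(fields, "/dev/sa0")` gives `dd if=/dev/sa0 bs=512 skip=3` (the device path is just an example); the output of that dd could then be piped into a "regular" tar.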

Zaphod Beeblebrox

Oct 14, 2021, 12:31:29 AM
to libarchive-discuss
Here's me replying to my own message.  I've read a lot of code today.  It seems to me that my proposed multi-volume archive is a hybrid between a filter (because it doesn't deal with archive_entry) and a write format type (because it wants to interact with things like the bytes/blocks to go in the current file and, on read, it would ideally want to know how to get to the first block of the next new file when starting from other than the first volume).

Thinking about the issue, unless someone convinces me otherwise, it seems prudent to start with a filter first and later look at deliberately adding support for the multivolume variant of your favourite format ... seemingly pax-restricted.  Now I need to find out where the "bsdtar: Write error" at end of tape comes from so I can look at how to get that to the correct layer to do all this.
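
To illustrate what a filter-only layer can and cannot know, here is a small Python sketch (not libarchive code; `plan_volumes` is a made-up name). It assumes a fixed number of records per volume as a stand-in for real end-of-tape detection, reserves one record per volume for the header, and counts header records in the record numbering. It also assumes `stream` is a buffered binary file object whose `read(n)` only returns short at end of input.

```python
import io


def plan_volumes(stream, record_size, records_per_volume):
    """Yield (volume, header_record_number, payload) for each volume."""
    if records_per_volume < 2:
        raise ValueError("need room for a header and at least one data record")
    volume = 1
    record_number = 0
    while True:
        # One record in each volume is reserved for the header.
        payload = stream.read(record_size * (records_per_volume - 1))
        if not payload:
            break
        # Pad a short final record with NULs so every record is full size.
        payload += b"\0" * (-len(payload) % record_size)
        yield volume, record_number, payload
        # Advance past the header record plus the data records just emitted.
        record_number += 1 + len(payload) // record_size
        volume += 1


plan = [(v, r, len(p)) for v, r, p in plan_volumes(io.BytesIO(b"x" * 2500), 512, 4)]
```

A layer like this can fill in the volume number and record number on its own, but it has no idea where the next file header falls; that is the part that needs help from the format writer.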

Zaphod Beeblebrox

Oct 21, 2021, 3:01:03 PM
to libarchive-discuss
Following up on my own post again, I have submitted patches to write multivolume tar archives through GitHub.  Anyone interested in testing is invited to have a look.