tar sparse files


David Thomas

Feb 4, 2014, 6:06:10 PM
to golan...@googlegroups.com
I'm thinking about fixing issue 3864 ("archive/tar: cannot read GNU sparse files"), but I wasn't sure how we actually want to handle sparse files. I didn't want to go crazy implementing the wrong solution before bringing it up here. Do we want the reader to read the file in expanded form or in condensed form?

GNU sparse files stored in POSIX (pax) tar format are currently extracted as regular files in condensed form (as discussed in the GNU tar manual: http://www.gnu.org/software/tar/manual/html_section/Portability.html#Sparse-Recovery). Expanding them, which may not be desired because it can use a lot of space, requires post-processing.

There are three different formats that GNU tar can use to store a sparse file in a POSIX-format tar file (called sparse formats 0.0, 0.1, and 1.0). These formats are designed so that an implementation of tar that isn't aware of them extracts them in condensed form as regular files. The post-processing steps needed to expand them depend on which of the three sparse formats was used.
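
As a rough illustration of one of these variants: assuming the layout described in the GNU tar manual, sparse format 0.1 records the map as a single comma-separated list of alternating offsets and lengths in a `GNU.sparse.map` extended-header record. The sketch below parses such a string. The names `sparseEntry` and `parseSparseMap` are hypothetical and not part of archive/tar.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// sparseEntry describes one run of real data in the expanded file.
// Hypothetical type for this sketch, not an archive/tar API.
type sparseEntry struct {
	offset   int64 // position of the run in the expanded file
	numBytes int64 // length of the run
}

// parseSparseMap parses a map of the assumed form "offset,numbytes,offset,numbytes,...".
func parseSparseMap(s string) ([]sparseEntry, error) {
	fields := strings.Split(s, ",")
	if len(fields)%2 != 0 {
		return nil, errors.New("sparse map has an odd number of fields")
	}
	sp := make([]sparseEntry, 0, len(fields)/2)
	for i := 0; i < len(fields); i += 2 {
		off, err := strconv.ParseInt(fields[i], 10, 64)
		if err != nil {
			return nil, err
		}
		n, err := strconv.ParseInt(fields[i+1], 10, 64)
		if err != nil {
			return nil, err
		}
		if off < 0 || n < 0 {
			return nil, errors.New("negative value in sparse map")
		}
		sp = append(sp, sparseEntry{offset: off, numBytes: n})
	}
	return sp, nil
}

func main() {
	// Two data runs: 512 bytes at offset 0 and 1024 bytes at offset 4096.
	sp, err := parseSparseMap("0,512,4096,1024")
	fmt.Println(sp, err)
}
```

Formats 0.0 and 1.0 carry the same offset/length pairs but store them differently, so a real reader would need one small parser per format feeding a common representation like this one.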

For a GNU-format tar file (not a POSIX/pax one), a completely different format is used: the sparse map is stored in the main header and in so-called "extension headers" that follow it. This case causes the most serious problem, because a tar reader that is not aware of the format will likely fail with an error, unlike with the POSIX sparse formats. This is what caused the specific error mentioned in issue 3864.
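
A minimal sketch of reading the sparse map from an old GNU main header follows. It assumes the layout in GNU tar's `tar.h`: type flag `'S'` at byte 156, four 24-byte entries (12-byte octal offset plus 12-byte octal length) starting at byte 386, an "is extended" flag at byte 482, and the real size at byte 483. The function names are hypothetical, only octal fields are handled, and extension headers are not followed.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// sparseEntry is a hypothetical type for this sketch.
type sparseEntry struct {
	offset   int64
	numBytes int64
}

// parseOctal decodes a NUL- or space-padded octal field.
func parseOctal(b []byte) (int64, error) {
	s := strings.Trim(string(b), " \x00")
	if s == "" {
		return 0, nil
	}
	return strconv.ParseInt(s, 8, 64)
}

// oldGNUSparse extracts the sparse entries, the "more headers follow" flag,
// and the expanded size from a 512-byte old GNU header block.
func oldGNUSparse(hdr []byte) ([]sparseEntry, bool, int64, error) {
	if len(hdr) != 512 || hdr[156] != 'S' {
		return nil, false, 0, errors.New("not an old GNU sparse header")
	}
	var sp []sparseEntry
	for i := 0; i < 4; i++ {
		field := hdr[386+24*i : 386+24*(i+1)]
		if field[0] == 0 {
			break // unused slot: no more entries in this header
		}
		off, err := parseOctal(field[:12])
		if err != nil {
			return nil, false, 0, err
		}
		n, err := parseOctal(field[12:])
		if err != nil {
			return nil, false, 0, err
		}
		sp = append(sp, sparseEntry{offset: off, numBytes: n})
	}
	realSize, err := parseOctal(hdr[483:495])
	if err != nil {
		return nil, false, 0, err
	}
	return sp, hdr[482] != 0, realSize, nil
}

func main() {
	// Build a synthetic header with one 512-byte run at offset 0
	// in a file whose expanded size is 4096 bytes.
	hdr := make([]byte, 512)
	hdr[156] = 'S'
	copy(hdr[386:], "00000000000\x00")
	copy(hdr[398:], "00000001000\x00")
	copy(hdr[483:], "00000010000\x00")
	fmt.Println(oldGNUSparse(hdr))
}
```

When the extended flag is set, a reader would have to consume further 512-byte blocks of entries before the file data starts, which is where a reader unaware of the format loses its place in the archive.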

Do we want to handle all these formats by expanding the "holes" into zero bytes? Or do we simply want to make sure it doesn't crash when it sees the old GNU format?


Dave Cheney

Feb 4, 2014, 6:21:49 PM
to David Thomas, golang-nuts
On Wed, Feb 5, 2014 at 10:06 AM, David Thomas <davidth...@gmail.com> wrote:
> I'm thinking about fixing issue 3864 ("archive/tar: cannot read GNU sparse files"), but I wasn't sure how we actually want to handle sparse files. I didn't want to go crazy implementing the wrong solution before bringing it up here. Do we want the reader to read the file in expanded form or in condensed form?

Thanks for taking a crack at this issue.

> GNU sparse files stored in POSIX (pax) tar format are currently extracted as regular files in condensed form (as discussed in the GNU tar manual: http://www.gnu.org/software/tar/manual/html_section/Portability.html#Sparse-Recovery). Expanding them, which may not be desired because it can use a lot of space, requires post-processing.

> There are three different formats that GNU tar can use to store a sparse file in a POSIX-format tar file (called sparse formats 0.0, 0.1, and 1.0). These formats are designed so that an implementation of tar that isn't aware of them extracts them in condensed form as regular files. The post-processing steps needed to expand them depend on which of the three sparse formats was used.

> For a GNU-format tar file (not a POSIX/pax one), a completely different format is used: the sparse map is stored in the main header and in so-called "extension headers" that follow it. This case causes the most serious problem, because a tar reader that is not aware of the format will likely fail with an error, unlike with the POSIX sparse formats. This is what caused the specific error mentioned in issue 3864.

> Do we want to handle all these formats by expanding the "holes" into zero bytes? Or do we simply want to make sure it doesn't crash when it sees the old GNU format?

I think the primary goal is to fix the crash.

archive/tar provides an io.Reader for the enclosed file; whether that file is stored inside the tar in condensed form or not is not important to the consumer of the Reader.

 



Matt Harden

Feb 9, 2014, 5:00:07 PM
to Dave Cheney, David Thomas, golang-nuts
IMO Read() should expand the holes. That's what happens in POSIX when reading a sparse file, and it follows the Principle of Least Surprise for people who may not know what a sparse file is (and for people who do).
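
A minimal sketch of what hole-expanding Read() behavior could look like, assuming the sparse map has already been parsed into sorted, non-overlapping runs. The `sparseEntry` and `sparseReader` types and the `expand` helper are hypothetical names for illustration, not archive/tar's actual implementation.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// sparseEntry describes one run of real data in the expanded file.
type sparseEntry struct {
	offset   int64
	numBytes int64
}

// sparseReader turns a condensed stream (data runs stored back to back)
// into the expanded file, producing zero bytes for every hole.
type sparseReader struct {
	r     io.Reader     // condensed data
	sp    []sparseEntry // remaining runs, sorted and non-overlapping
	pos   int64         // current position in the expanded file
	total int64         // expanded size of the file
}

func (sr *sparseReader) Read(b []byte) (int, error) {
	if sr.pos >= sr.total {
		return 0, io.EOF
	}
	if len(b) == 0 {
		return 0, nil
	}
	// Drop runs that lie entirely behind the current position.
	for len(sr.sp) > 0 && sr.pos >= sr.sp[0].offset+sr.sp[0].numBytes {
		sr.sp = sr.sp[1:]
	}
	// A hole extends to the next run, or to the end of the file.
	holeEnd := sr.total
	if len(sr.sp) > 0 {
		holeEnd = sr.sp[0].offset
	}
	if sr.pos < holeEnd {
		if n := holeEnd - sr.pos; int64(len(b)) > n {
			b = b[:n]
		}
		for i := range b {
			b[i] = 0
		}
		sr.pos += int64(len(b))
		return len(b), nil
	}
	// Otherwise we are inside a data run: read from the condensed stream.
	dataEnd := sr.sp[0].offset + sr.sp[0].numBytes
	if n := dataEnd - sr.pos; int64(len(b)) > n {
		b = b[:n]
	}
	n, err := sr.r.Read(b)
	sr.pos += int64(n)
	if err == io.EOF {
		if sr.pos < dataEnd {
			err = io.ErrUnexpectedEOF // condensed stream ended mid-run
		} else {
			err = nil // a trailing hole may still follow
		}
	}
	return n, err
}

// expand is a small helper that reads the whole expanded file into a string.
func expand(data string, sp []sparseEntry, total int64) (string, error) {
	sr := &sparseReader{r: strings.NewReader(data), sp: sp, total: total}
	out, err := io.ReadAll(sr)
	return string(out), err
}

func main() {
	// "abc" lives at offset 2 and "de" at offset 8 of a 12-byte file.
	out, err := expand("abcde", []sparseEntry{{2, 3}, {8, 2}}, 12)
	fmt.Printf("%q %v\n", out, err)
}
```

With this shape the consumer only ever sees an ordinary io.Reader of the full-size file, so code that has never heard of sparse files keeps working unchanged.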

Dave Cheney

Feb 9, 2014, 5:32:29 PM
to Matt Harden, David Thomas, golang-nuts
I agree. 

David Thomas

Feb 9, 2014, 8:51:34 PM
to golan...@googlegroups.com
Okay, that makes sense. I'll work on fixing this then.