I have dug a bit and found that the new implementation (though I do not
know which one) uses an "extended format" for both the /mtime/ and /size/
fields in the header. In all the standards applicable to tar, the
numbers are stored as octal ASCII; since those fields are 11+1 bytes
long (11 octal digits plus a terminator), file sizes are limited to less
than 8 GB, and timestamps to dates before 2242 (!) Some people
apparently worry about that, so an extension was designed which allows
those quantities to be stored in binary, hence allowing sizes up to
8 EB; note that this loses properties such as headers being ASCII-only,
and it directly contradicts the POSIX standard, which purposely designed
a different mechanism for that effect (preserving the ASCII format,
being backward-compatible, much more extensible, etc.)
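To make the arithmetic concrete, here is a small Python sketch (my own illustration, not code from any tar implementation) of how a 12-byte numeric field is filled with 11 octal ASCII digits plus a NUL, and where the 8 GB and 2242 limits come from. The helper name encode_octal is hypothetical.

```python
from datetime import datetime, timedelta, timezone

FIELD_WIDTH = 12            # width of the size and mtime fields in a ustar header
DIGITS = FIELD_WIDTH - 1    # 11 octal digits, then a terminating NUL

def encode_octal(value: int, width: int = FIELD_WIDTH) -> bytes:
    """Store value as zero-padded octal ASCII followed by a NUL byte."""
    digits = width - 1
    if not 0 <= value < 8 ** digits:
        raise OverflowError(f"{value} does not fit in {digits} octal digits")
    return f"{value:0{digits}o}".encode("ascii") + b"\0"

# Largest value 11 octal digits can hold: 8**11 - 1 == 2**33 - 1.
MAX_VALUE = 8 ** DIGITS - 1

# Read as a file size: just under 8 GiB.
MAX_SIZE_GIB = (MAX_VALUE + 1) / 2**30

# Read as a timestamp (seconds since the Unix epoch): some time in 2242.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
LAST_DATE = EPOCH + timedelta(seconds=MAX_VALUE)

print(encode_octal(1234))   # b'00000002322\x00'
print(MAX_SIZE_GIB)         # 8.0
print(LAST_DATE.year)       # 2242
```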
However, the real point is that all this extension stuff is completely
unnecessary for our purposes, and should probably be deactivated by
default and only enabled when really necessary: the tar format is used
for interchange purposes, and using ad-hoc extensions will always reduce
portability; it is of course worse when they are not even necessary.
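For what it is worth, some libraries already make the conservative choice explicit. As a sketch (using Python's standard tarfile module, not the libarchive code MINIX actually uses), writing with tarfile.USTAR_FORMAT produces plain ustar headers and raises an error, instead of silently switching to an extension, when a value does not fit. The helper name make_ustar_archive is hypothetical.

```python
import io
import tarfile

def make_ustar_archive(files: dict[str, bytes], mtime: int = 0) -> bytes:
    """Build an in-memory archive restricted to the plain POSIX ustar format."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

archive = make_ustar_archive({"hello.txt": b"hello\n"})
# Bytes 257..264 of the first header: magic "ustar\0" followed by version "00".
print(archive[257:265])   # b'ustar\x0000'

# A timestamp beyond 2242 does not fit in ustar, and tarfile refuses it.
try:
    make_ustar_archive({"late.txt": b""}, mtime=8 ** 11)
except ValueError as err:
    print("refused:", err)
```

The design choice is simply "fail loudly": if a package ever really needed a bigger field, the packager would find out, rather than every archive quietly depending on an extension.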
I know people will tell me that packages are supposed to be installed
only with tools like pkg_add, which are supposed to be up-to-date
enough to deal with such issues, but this completely misses the point
of using standard tools in the first place.
Antoine
I know anything way younger than me can be labelled 'recent' ;-), yet
pax formally appeared in 1987-88, with the first published POSIX
standard.
(pax is a compromise, typical of standards committees, to avoid naming
'tar' _and_ to allow using the competing cpio format. Look the word up
in any Latin dictionary for the insider joke. The underlying pax format
is nothing but slight extensions to V7 tar, which dates back to 1978.
Still 'recent' according to the above, but less so. ;-) )
Enough kidding, though. In 2001 (the X/Open 6 "merge" with POSIX), pax
was extended, and the 'g'/'x' entries were standardized; they create the
framework for many future extensions. As a result, new developments
ought not to deviate from this scheme when introducing new
functionality; OTOH the committee is not God either.
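For readers who have not looked at it: the body of an 'x' (per-file) or 'g' (global) entry is a sequence of plain-text records of the form "length keyword=value" plus a newline, where the decimal length counts the whole record, including its own digits. Below is a minimal Python sketch of building one record; pax_record is a hypothetical helper, and the fixed-point loop is just one way to handle the self-referential length.

```python
def pax_record(keyword: str, value: str) -> bytes:
    """Build one pax extended-header record: '<len> <keyword>=<value>' + newline."""
    body = f" {keyword}={value}\n".encode("utf-8")
    # The length prefix includes its own digits, so adding a digit to the
    # prefix can change the total; iterate until the value is stable.
    length = len(body)
    while True:
        total = len(str(length)) + len(body)
        if total == length:
            return str(length).encode("ascii") + body
        length = total

print(pax_record("path", "foo"))            # b'12 path=foo\n'
print(pax_record("mtime", "8589934592"))    # b'20 mtime=8589934592\n'
```

Note that a timestamp after 2242 is just more ASCII digits here; nothing binary leaks into the header.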
> Here's what I found from some research:
>
> "In 1997, a method for adding unlimited extensions to the tar format
> has been proposed by Sun and later accepted for the POSIX.1-2001
> standard. This format is known as extended tar-format or pax-format."
> - http://en.wikipedia.org/wiki/Tar_%28file_format%29#Problems_and_limitations
The problem, as I view it, is that the present .tgz files used for Minix
do NOT use those extensions, but rather ones from another set (base-256
coding, a.k.a. the numeric extension) designed by Joerg Schilling around
1985 (?) and later introduced into GNU tar in 1999 (1.13.12 according to
the change logs), among many other non-portable extensions; GNU tar has
since become the reference for Linux and others, but not for all. And
since the number of extensions added to GNU tar is large, not all of
them are replicated in every tar implementation (particularly when they
do not make much sense, such as dates after 2242 A.D., IMHO of course!)
Furthermore, while GNU tar says it uses those extensions only when
necessary (i.e., for files of 8 GB or more), the current implementation
for Minix packages uses them for every file, resulting in .tar files
which can be read by GNU tar, Schilling's star, libarchive-based bsdtar
and some others, but NOT by barebones standards-compliant
implementations.
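To illustrate why the barebones readers choke, here is a hedged Python sketch: a strict reader that only accepts octal ASCII (as the older standards describe) next to a lenient one that also understands the positive base-256 form (a leading 0x80 byte, then the number in big-endian binary). Both function names are made up for the example, and real implementations differ in details such as negative values or empty fields.

```python
def decode_octal_strict(field: bytes) -> int:
    """Parse a numeric header field as octal ASCII, optionally space-padded
    and terminated by NUL; anything else is rejected."""
    digits = field.split(b"\0", 1)[0].strip(b" ")
    if not digits or any(ch not in b"01234567" for ch in digits):
        raise ValueError(f"not an octal numeric field: {field!r}")
    return int(digits, 8)

def decode_numeric_lenient(field: bytes) -> int:
    """Also accept the positive base-256 form: 0x80, then big-endian binary."""
    if field[0] == 0x80:
        return int.from_bytes(field[1:], "big")
    return decode_octal_strict(field)

# The same small size (1234 bytes) stored both ways in a 12-byte field.
octal_field = b"00000002322\0"
base256_field = bytes([0x80]) + (1234).to_bytes(11, "big")

print(decode_numeric_lenient(octal_field))    # 1234
print(decode_numeric_lenient(base256_field))  # 1234
try:
    decode_octal_strict(base256_field)
except ValueError as err:
    print("strict reader fails:", err)
```

The second field carries a perfectly ordinary value, yet a reader that follows only the octal rule cannot make sense of it.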
Antoine
It creates a framework for adding extensions in ways which should remain
compatible with previous releases, i.e. with those that know nothing
about the extensions: the so-called backward compatibility.
XPG6 also defines some extensions which use the new framework, but the
real gain is that one can add an ad-hoc extension, say, to record a size
to reserve, stored as metadata in the file system, while keeping the
classical format and remaining interoperable with other systems which
know nothing about that metadata.
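A quick way to see this framework in action is Python's tarfile module, which can write arbitrary keywords into a per-file 'x' header through TarInfo.pax_headers. The keyword MINIX.reserve below is purely hypothetical (vendor-prefixed names are, as far as I know, the convention for private keywords); the point is only that the file itself still follows in an ordinary ustar header, so a reader unaware of the keyword still finds it.

```python
import io
import tarfile

def archive_with_keyword(name: str, data: bytes, keyword: str, value: str) -> bytes:
    """Write one file preceded by a pax 'x' header carrying an extra keyword."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    info.pax_headers = {keyword: value}   # meaningless to most readers, by design
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

raw = archive_with_keyword("disk.img", b"data", "MINIX.reserve", "1048576")

# Block 0 is the extended header (typeflag 'x'), block 1 holds its records,
# and block 2 is the ordinary ustar header for the file (typeflag '0').
print(raw[156:157], raw[1024 + 156:1024 + 157])   # b'x' b'0'

with tarfile.open(fileobj=io.BytesIO(raw)) as tar:
    member = tar.getmember("disk.img")
    print(member.pax_headers["MINIX.reserve"])     # 1048576
    print(tar.extractfile(member).read())          # b'data'
```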
> 1. Continue with GNU tar
>
> This appears to be problematic for legacy compatibility, but does
> solve the problem of the size limit.
My point was that there is NO size-limit problem.
Furthermore, as MINIX pkgsrc works now, it does not use GNU tar (but
rather the tar embedded in libarchive), and it does not behave the way
GNU tar does, i.e. it forces the non-compatible format even when it is
not necessary; OTOH GNU tar only switches to the new format when it
cannot do otherwise (for present MINIX, that would mean never, since
files larger than 4 GB cannot be reported to application programs).
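The difference between the two policies fits in a few lines. The Python sketch below (the function name and the allow_base256 switch are hypothetical, not options of GNU tar or libarchive) uses plain octal whenever the value fits and falls back to base-256 only when it does not; with sizes capped at 4 GB, the fallback is never reached.

```python
def encode_numeric(value: int, width: int = 12, allow_base256: bool = True) -> bytes:
    """Encode a header number, using base-256 only when octal cannot hold it."""
    digits = width - 1
    if 0 <= value < 8 ** digits:
        # Portable form: octal ASCII digits plus a NUL terminator.
        return f"{value:0{digits}o}".encode("ascii") + b"\0"
    if allow_base256 and 0 <= value < 256 ** digits:
        # Non-standard form: 0x80 marker byte, then big-endian binary.
        return bytes([0x80]) + value.to_bytes(digits, "big")
    raise OverflowError(f"{value} does not fit in a {width}-byte field")

FOUR_GB = 4 * 2**30
EIGHT_GB = 8 * 2**30

print(encode_numeric(FOUR_GB))       # b'40000000000\x00' (plain octal)
print(encode_numeric(EIGHT_GB)[0])   # 128, i.e. the 0x80 base-256 marker
```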
> 4. Track standards and use pax
>
> If it is true that the later standards prefer pax, it seems to solve
> the problem of size limitations, but I'm not sure of legacy
> compatibility. Additionally, it may introduce some problems for
> interoperability.
The pax format is not really different from tar, or more exactly from
the 'ustar' variant of the tar file format. The real difference is that
you ought to use the pax command instead of the tar command, and the two
differ in syntax (pax is closer to cpio in spirit): such a change is not
really significant, since very few people inspect the commands issued
while making or installing packages ;-) Also, do not forget that the pax
binary, when invoked as tar, understands the traditional options like
"tvf". This is what MINIX actually had for some time!
MINIX switched to bsdtar in July, probably out of mere convenience,
since it has the gzip/bzip2/etc. compression modules built into the
command, so you can now use commands like
tar cjf save.tar.bz2
tar xf package.tar.gz
without using pipes; this clearly does not conform to the Standards, but
it is so cute ;-)
Antoine