.tar format used for packages

Antoine LECA

Mar 4, 2011, 11:33:24 AM
to min...@googlegroups.com
I have had trouble with the tar format now used for packages since about
January 26th (when libarchive-2.8.4 was put online?).
Two independent tools are unable to read those files: they see every
file inside the tar as having size 0, which makes the archive unreadable.

I have dug a bit and found that the new implementation (but I do not
know which one) uses an "extended format" for both the /mtime/ and /size/
fields in the header. In all the standards applicable to tar, the
numbers are stored as octal ASCII; since those fields are 11+1 bytes in
length, this constrains file sizes to be less than 8 GB, and timestamps
to be before 2242 (!). Some people apparently worry about that, so an
extension was designed which allows those quantities to be stored as
binary and hence allows sizes up to 8 EB. Note that this loses properties
such as headers being ASCII only, and is in direct contradiction with the
POSIX standard, which purposely designed a different mechanism to that
effect (preserving the ASCII format, being backward compatible, much
more extensible, etc.).
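
As a rough illustration of the two encodings described above, here is a
minimal Python sketch. The helper name decode_tar_number is made up for
this example, and it assumes the binary form is marked by a leading 0x80
byte (negative values are not handled). It also works out where the
8 GB and 2242 limits of the octal form come from.

```python
from datetime import datetime, timedelta, timezone


def decode_tar_number(field: bytes) -> int:
    """Decode a 12-byte numeric tar header field (hypothetical helper)."""
    if field[0] == 0x80:
        # Base-256 extension: marker byte 0x80, then a big-endian integer.
        return int.from_bytes(field[1:], "big")
    # Standard encoding: octal ASCII digits, padded with NUL or space.
    digits = field.strip(b"\0 ")
    return int(digits, 8) if digits else 0


# Largest value that fits in 11 octal digits (just under 8 GiB).
OCTAL_MAX = 8**11 - 1

# The same limit read as seconds since 1970-01-01 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
LAST_MTIME = EPOCH + timedelta(seconds=OCTAL_MAX)

if __name__ == "__main__":
    # The same size (1000 bytes) in both encodings.
    print(decode_tar_number(b"00000001750\0"))
    print(decode_tar_number(b"\x80" + (1000).to_bytes(11, "big")))
    print(OCTAL_MAX, LAST_MTIME.year)
```

A reader that only knows the octal form cannot parse the second field as
digits, which would fit the "0 sized" symptom reported above.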

However, the real point is that all this extension stuff is completely
unnecessary for our purposes, and should probably be deactivated by
default and only enabled when really necessary: the tar format is used
for interchange purposes, and using ad-hoc extensions will always reduce
portability; it is of course worse when it is not even necessary.


I know people will tell me that packages are supposed to be installed
only with tools like pkg_add, which are supposed to be up-to-date
enough to deal with such issues, but this completely misses the
point of using standard tools in the first place.


Antoine

pikpik

Mar 12, 2011, 4:58:07 PM
to minix3
Hi,
It seems that "pax" is a recent POSIX format used by IEEE Std 1003.1,
2004 Edition. Perhaps a decision should be made to track current
standards or stay with a historic format?


Here's what I found from some research:

"In 1997, a method for adding unlimited extensions to the tar format
has been proposed by Sun and later accepted for the POSIX.1-2001
standard. This format is known as extended tar-format or pax-format."
- http://en.wikipedia.org/wiki/Tar_%28file_format%29#Problems_and_limitations


List of Tar formats supported by libarchive - http://linux.die.net/man/5/libarchive-formats


According to the Single UNIX ® Specification, Version 2 from 1997:

"Applications should migrate to the pax utility." -
http://www.opengroup.org/onlinepubs/7990989799/xcu/tar.html

Pax - http://www.opengroup.org/onlinepubs/7990989799/xcu/pax.html


According to IEEE Std 1003.1, 2004 Edition:

"The pax utility was new for the ISO POSIX-2:1993 standard. It
represents a peaceful compromise between advocates of the historical
tar and cpio utilities."

...and...

"The cpio and ustar formats can only support files up to 8589934592
bytes (8 * 2^30) in size." - http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html

I'm probably missing a lot about this topic.


I hope this helps,
pikpik

Antoine LECA

Mar 14, 2011, 7:37:21 AM
to min...@googlegroups.com
pikpik wrote:
> It seems that "pax" is a recent POSIX format

I know everything which is way younger than me can be labelled as
'recent' ;-), yet pax appeared formally in 1987-88, with the first
published POSIX standard.

(pax is a compromise, typical of standards committees, to avoid naming
'tar' _and_ to allow using the competing cpio format. Consult any
Latin dictionary for the inside joke. The underlying pax format is
nothing but slight extensions to V7 tar, which dates back to 1978. Still
'recent' according to the above, but less so. ;-) )

Enough kidding though. In 2001 (X/Open 6 "merge" with POSIX) pax was
extended, and the 'g'/'x' entries were standardized, which creates the
framework for many future extensions. As a result, new developments
ought not to deviate from this scheme to introduce new functionality;
OTOH the committee is not God either.


> Here's what I found from some research:
>
> "In 1997, a method for adding unlimited extensions to the tar format
> has been proposed by Sun and later accepted for the POSIX.1-2001
> standard. This format is known as extended tar-format or pax-format."
> - http://en.wikipedia.org/wiki/Tar_%28file_format%29#Problems_and_limitations

The problem, as I view it, is that the present .tgz files used for Minix
do NOT use those extensions, but rather ones from another set (base 256
coding, a.k.a. numeric extension) designed by Joerg Schilling around
1985 (?) and later introduced into GNU tar in 1999 (1.13.12 according to
the change logs), among many other non-portable extensions. GNU tar has
since become the reference for Linux and others, but not for all. And
since the number of extensions added to GNU tar is large, not all of
them are replicated in every tar implementation (particularly when it
does not make much sense, such as dates after 2242 AD. IMHO of course!)

Furthermore, while GNU tar says it uses those extensions only when
necessary (i.e., for files larger than 8 GB), the current implementation
for Minix packages uses them for every file, resulting in .tar files
which can be read with GNU tar, Schilling's star, libarchive-based bsdtar
and some others, but NOT with bare-bones standard-compliant
implementations.
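
To check whether a given archive is affected, one could walk its headers
and flag any /size/ or /mtime/ field whose first byte has the high bit
set. The following is only a sketch: find_binary_headers and _number are
names invented here, the field offsets (124 and 136) are those of the
ustar header layout, and checksums are not verified. It expects an
uncompressed, seekable stream, so a .tgz would have to be opened through
gzip.open first.

```python
import io
import tarfile

BLOCK = 512  # tar archives are a sequence of 512-byte blocks


def _number(field: bytes) -> int:
    """Decode an octal or base-256 numeric field (hypothetical helper)."""
    if field[0] == 0x80:
        return int.from_bytes(field[1:], "big")
    digits = field.strip(b"\0 ")
    return int(digits, 8) if digits else 0


def find_binary_headers(stream):
    """Yield (member name, field name) for every base-256 header field."""
    while True:
        header = stream.read(BLOCK)
        if len(header) < BLOCK or header == bytes(BLOCK):
            break  # short read or all-zero block: end of archive
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8", "replace")
        size_field = header[124:136]
        mtime_field = header[136:148]
        if size_field[0] & 0x80:
            yield name, "size"
        if mtime_field[0] & 0x80:
            yield name, "mtime"
        # Skip the member data, which is padded to a whole block.
        data_blocks = (_number(size_field) + BLOCK - 1) // BLOCK
        stream.seek(data_blocks * BLOCK, io.SEEK_CUR)


if __name__ == "__main__":
    # Build a small plain-ustar archive in memory and scan it.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo("hello.txt")
        info.size = 2
        tar.addfile(info, io.BytesIO(b"hi"))
    buf.seek(0)
    print(list(find_binary_headers(buf)))
```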


Antoine

pikpik

Mar 18, 2011, 8:26:07 PM
to minix3
Hi,

On Mar 14, 7:37 am, Antoine LECA wrote:
> pikpik wrote:
> > It seems that "pax" is a recent POSIX format
>
> I know everything which is way younger than me can be labelled as
> 'recent' ;-), yet pax appeared formally in 1987-88, with the first
> published POSIX standard.

Yes, sorry. :) I just meant to say that it seems to have been "pushed"
more recently than tar by the Open Group, or the committee members at
the time.

> (pax is a compromise, typical of standard committees, to avoid naming
> 'tar' _and_ to allow using the concurrent cpio format. Read about any
> Latin dictionary for the insider joke.

As in, peace to end the "war" over standards?

> The underlying pax format is nothing but slight extensions to V7 tar,
> which roots back to 1978. Still 'recent' according to the above,
> but less so. ;-) )

Ah, ok. I suppose the fact that pax comes from a version of tar makes
some people see it as not much more than tar?

> Enough kidding though. In 2001 (X/Open 6 "merge" with Posix) pax was
> extended, and the 'g'/'x' entries were standardized, which create the
> framework for many future extensions. As a result, new developments
> ought not to derogate to this scheme to introduce new functionalities;
> OTOH the committee is not God either.

Ok. I think this makes sense, however I don't fully understand it. Is
it that this notable extension to pax allows future extensions of a
specific kind, or that it encouraged many changes which should not be
made?

> > Here's what I found from some research:
>
> > "In 1997, a method for adding unlimited extensions to the tar format
> > has been proposed by Sun and later accepted for the POSIX.1-2001
> > standard. This format is known as extended tar-format or pax-format."
> > -http://en.wikipedia.org/wiki/Tar_%28file_format%29#Problems_and_limit...
>
> The problem as I view it, is that the present .tgz used for Minix do NOT
> use those extensions, but rather from another set (base 256 coding,
> a.k.a numeric extension) designed by Joerg Schilling around 1985 (?) and
> later introduced into GNU tar in 1999 (1.13.12 according to change
> logs), among many other non-portable extensions; since now GNU tar had
> became the reference for Linux and others, but not all. And since the
> number of those extensions added to GNU tar is important, not all of
> them are replicated in every tar implementation (particularly when it
> does not make much sense, such as dates after 2242 a.C. IMHO of course!)

Perhaps MINIX could use a different kind of tar that is more aligned
with POSIX? I still see that the above excerpt from Wikipedia mentions
the extended tar format as being synonymous with the pax format. Could
pax be used by default instead of tar, or would that be nonsensical in
some way?

> Furthermore, while GNU tar says it is using those extensions only when
> necessary (ie, more than 8GB files), the current implementation for
> Minix packages use them for every file, resulting in .tar files which
> can be read with GNU tar, Schilling's star, libarchive-based bsdtar and
> some others, but NOT with barebones standard-complying implementations.
>
> Antoine

I agree about that being a problem. If one is to "track" the latest
standards, then it would seem that tar is deprecated (although used
far more widely than pax today, as far as I can see) and pax should be
used. It seems this is not the current practice however, and tar is
still the more familiar format. So, I'm really not sure what makes the
most sense.

The options I see are:

1. Continue with GNU tar

This appears to be problematic for legacy compatibility, but does
solve the problem of the size limit.

2. Use a legacy tar format

This allows legacy compatibility, but is not a solution to the size
limit.

3. Use NetBSD's userland tar, when available

I'm not certain this is very different from the problems of using GNU
tar.

4. Track standards and use pax

If it is true that the later standards prefer pax, it seems to solve
the problem of size limitations, but I'm not sure of legacy
compatibility. Additionally, it may introduce some problems for
interoperability.

Thank you,
pikpik

Antoine LECA

Mar 21, 2011, 5:05:42 AM
to min...@googlegroups.com
pikpik wrote:
>> Enough kidding though. In 2001 (X/Open 6 "merge" with Posix) pax was
>> extended, and the 'g'/'x' entries were standardized, which create the
>> framework for many future extensions. As a result, new developments
>> ought not to derogate to this scheme to introduce new functionalities;
>> OTOH the committee is not God either.
>
> Ok. I think this makes sense, however I don't fully understand it. Is
> it that this notable extension to pax allows future extensions of a
> specific kind, or that it encouraged many changes which should not be
> made?

It creates a framework for adding extensions in ways which should be
compatible with previous releases, those that do not know anything
about the extensions: so-called backward compatibility.
XPG6 also defines some extensions which use the new framework, but the
real gain is that one can add an ad-hoc extension, say, to indicate a
size to reserve, stored as metadata in the file system, using the
classical format and still being interoperable with other systems which
know nothing about that metadata.
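
For a concrete picture of that framework: an 'x' entry carries records of
the form "<length> <keyword>=<value>\n", where the length counts the
whole record including its own digits. Below is a small Python sketch.
The keyword MINIX.reserve and its value are purely hypothetical (no such
keyword is defined anywhere), pax_record is a name made up here, and the
round trip relies on the pax support in Python's tarfile module.

```python
import io
import tarfile


def pax_record(key: str, value: str) -> bytes:
    """Build one pax extended-header record (hypothetical helper)."""
    body = f" {key}={value}\n".encode("utf-8")
    length = len(body)
    # The length prefix counts its own digits, so iterate until stable.
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


def roundtrip(headers: dict) -> dict:
    """Write one member with extra pax headers, then read them back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.PAX_FORMAT) as tar:
        info = tarfile.TarInfo("data.bin")
        info.size = 4
        info.pax_headers = dict(headers)
        tar.addfile(info, io.BytesIO(b"abcd"))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return dict(tar.getmembers()[0].pax_headers)


if __name__ == "__main__":
    print(pax_record("MINIX.reserve", "4096"))
    print(roundtrip({"MINIX.reserve": "4096"}))
```

The main header of data.bin stays an ordinary octal-ASCII ustar header,
so a reader that ignores the 'x' entry should still extract the file
with the right size.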


> 1. Continue with GNU tar
>
> This appears to be problematic for legacy compatibility, but does
> solve the problem of the size limit.

My point was that there is NO size-limit problem.

Furthermore, the way MINIX pkgsrc is working now, it is not using GNU
tar (rather the embedded tar from libarchive), and does not behave the
way GNU tar behaves, i.e. it uses the non-compatible format even when
not necessary; OTOH GNU tar only switches to the new format when it
cannot do otherwise (for present MINIX, that would mean never, since
files larger than 4 GB cannot be reported to application programs).
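
The "only when it cannot do otherwise" behaviour can be observed with
Python's tarfile module, used here merely as a stand-in (this says
nothing about what GNU tar or libarchive do internally). The helper
size_field is invented for this sketch; it returns the 12 raw bytes of
the /size/ field that tarfile would write for a given format.

```python
import tarfile


def size_field(size: int, fmt: int) -> bytes:
    """Return the raw 12-byte size field of a header (hypothetical helper)."""
    info = tarfile.TarInfo("example.bin")
    info.size = size
    # tobuf() builds the 512-byte header block; the size field sits at
    # offset 124 in the ustar layout.
    return info.tobuf(format=fmt)[124:136]


if __name__ == "__main__":
    # A small file stays in octal ASCII even in GNU format.
    print(size_field(1000, tarfile.GNU_FORMAT))
    # A size of 8**11 no longer fits in 11 octal digits, so GNU format
    # falls back to the binary encoding (first byte 0x80).
    print(size_field(8**11, tarfile.GNU_FORMAT)[:1])
    # Plain ustar has no fallback and refuses the value instead.
    try:
        size_field(8**11, tarfile.USTAR_FORMAT)
    except ValueError as exc:
        print("ustar:", exc)
```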


> 4. Track standards and use pax
>
> If it is true that the later standards prefer pax, it seems to solve
> the problem of size limitations, but I'm not sure of legacy
> compatibility. Additionally, it may introduce some problems for
> interoperability.

The pax format is not really different from tar, or more exactly from
the 'ustar' variant of the tar file format. The real difference is that
you ought to use the pax command instead of the tar command, and the two
differ in syntax (pax is more like cpio in spirit). Such a change is
not really significant, since very few people inspect the commands
passed while making packages or installing them ;-) Also do not forget
that the pax binary, when invoked as tar, understands the traditional
options like "tvf". This is what MINIX actually had for some time!

MINIX changed in July to use bsdtar, probably for the mere convenience
of having the zip/bzip2/etc. compression modules embedded into the
command, so you can now use commands like
    tar cjf save.tar.bz2
    tar xf package.tar.gz
without using pipes; this clearly does not conform to the Standards, but
it is so cute ;-)


Antoine
