metamath-program.zip archive getting ninja edited

134 views
Skip to first unread message

heiph...@wilsonb.com

unread,
Jan 31, 2020, 10:58:51 PM1/31/20
to meta...@googlegroups.com
This is part of a larger discussion happening at [1].

It appears that the metamath-program.zip URL[0] points to a non-static archive.
Sadly, this is preventing me from packaging metamath for my Linux distribution.

Just for the sake of clarity, below are the SHA-256 sums of the archive
retrieved on the corresponding dates:

- 2020 Jan 28 75fe8e83c4fde2104e24e38da8480b512df3dff40d46e5d40e038bd7a73557de
- 2020 Feb 01 084bb35600823c8c6db4375203d239c5d05df4488ce035fd05da8918dea70261

Unfortunately, the newer archive overwrote the old on my machine, so I cannot
compare contents. From a cursory inspection, however, it looks how I remember.
My suspicion is that, perhaps as part of a larger automated process, the zip
metadata is getting refreshed.

Is there a way we could offer static, versioned, archives of the standalone
executable source? One common option for this would be to let GitHub host them
as "releases" for the metamath/metamath-exe repository. Though I have not tried
myself, the creation process looks quite easy:

https://help.github.com/en/github/administering-a-repository/creating-releases

Does it seem reasonable to make this happen, some way or another?


[0]:http://us2.metamath.org/downloads/metamath-program.zip
[1]:https://groups.google.com/forum/#!msg/metamath/z2kKJYgnz-g/Z4mVlHVkAwAJ

David A. Wheeler

unread,
Feb 1, 2020, 8:06:01 AM2/1/20
to heiphohmia via Metamath
Just to clarify, you are not looking for a statically linked file, you are looking for a versioned file that once released never changes. Correct?

So you would like to see a filename like metamath-program-1.2.3.zip that once released never changes. Is there a particular file naming format you would prefer? It is up to Norm if he wants to support that, though I think that is a good idea.



--- David A.Wheeler

Norman Megill

unread,
Feb 1, 2020, 8:54:25 AM2/1/20
to Metamath
Nothing was changed inside the zip file between the dates you mention.

The zip file is recreated from source about once a day from the command:
zip -9 downloads/metamath-program.zip metamath/*.c metamath/*.h \
    metamath/metamath.exe  metamath/configure.ac metamath/Makefile.am \
    metamath/metamath.1

So the zip command apparently does not produce an identical compressed file even if the contents are the same.

Could .gz or .bz2 be used instead?  I think those are stable.

Norm


The problem may be that the zip program

vvs

unread,
Feb 1, 2020, 11:39:04 AM2/1/20
to Metamath
So the zip command apparently does not produce an identical compressed file even if the contents are the same.

They never meant to be reproducible. You should use something like torrentzip for such purpose.

Could .gz or .bz2 be used instead?  I think those are stable.

Again they never meant to be used like that, so it depends on how carefully you avoid any volatility. Any discrepancies in time stamps, names, paths or compression parameters could lead to ambiguous output.

Norman Megill

unread,
Feb 1, 2020, 12:36:19 PM2/1/20
to Metamath
On Saturday, February 1, 2020 at 11:39:04 AM UTC-5, vvs wrote:
So the zip command apparently does not produce an identical compressed file even if the contents are the same.

They never meant to be reproducible. You should use something like torrentzip for such purpose.


Do you mean ziptorrent?  apt-get doesn't find torrentzip.
But when I did

$ sudo apt-get install ziptorrent
...
$ man ziptorrent
NAME
       ziptorrent - torrentzip zip archives
SYNOPSIS
       ziptorrent [-hnVv] archive [...]
...
$ cp -p metamath-program.zip a.zip
$ ziptorrent a.zip
Segmentation fault
(Doesn't give me a whole lot of confidence...)

Any suggestions?
Norm

vvs

unread,
Feb 1, 2020, 1:04:34 PM2/1/20
to Metamath
Do you mean ziptorrent?  apt-get doesn't find torrentzip.

I meant https://sf.net/p/trrntzip, though this thing has many forks around Internet, including GitHub. And it's an implementation of zip archiver which produced reproducible files. Also, there is https://sf.net/p/t7z which does similar thing but for 7z archives.

But when I did

$ sudo apt-get install ziptorrent
...
$ man ziptorrent
NAME
       ziptorrent - torrentzip zip archives
SYNOPSIS
       ziptorrent [-hnVv] archive [...]
...
$ cp -p metamath-program.zip a.zip
$ ziptorrent a.zip
Segmentation fault
(Doesn't give me a whole lot of confidence...)

Sure. Most Linux distributions claiming that they have thousands of packages are in fact ridden with thousands of bugs.
 
Any suggestions?

Several choices: try to find an alternative and compile it yourself, fix that buggy package or switch to another distribution. From my experience there are little chances that you can get them to fix it for you. I recall that sourceforge projects above worked for me at the time, but YMMV.

heiph...@wilsonb.com

unread,
Feb 1, 2020, 11:49:53 PM2/1/20
to meta...@googlegroups.com
> Could .gz or .bz2 be used instead? I think those are stable.

Either of those would be great. Typically, bzip2 wins on compression and gzip
on speed. In this case, bzip2 is probably a good choice.

In the off chance it's helpful, here are the standard command line invocations
on unix for creating bzip2 and gzip archives of some directory:

$ tar -cjf new-bzip2-archive.tar.bz2 path/to/contents
$ tar -czf new-gzip-archive.tar.gz path/to/contents

The -c flag "creates" an archive, the -j and -z flags compress with bzip2
and gzip respectively. The -f flag specifies the archive path.

And just in case the use of tar seems mysterious, the reason we need it here is
because bzip2 and gzip are simply compression formats, meaning they only work
on single files. So we use tar to first "archive" a collection of paths into a
single file and compress the result. This is a common enough operation that tar
simply provides convenience flags that do the wrapping for us.

Thank you for caring about us package maintainers!

vvs

unread,
Feb 2, 2020, 9:52:30 AM2/2/20
to Metamath
I've got the latest svn r9 from by running

$ cd trrntzip
$ sh autogen.sh
$ ./configure
$ make CFLAGS="-Wno-error=format-security"
$ cd src

After that I've downloaded the latest metamath.zip and ran

$ ./trrntzip metamath.zip

Everything worked fine. Note that it normalized everything inside the resulting zip archive, i.e. you'd get all time stamps changed to a canonical value.

Norman Megill

unread,
Feb 2, 2020, 11:11:26 AM2/2/20
to Metamath
On Saturday, February 1, 2020 at 11:49:53 PM UTC-5, heiphohmia wrote:
> Could .gz or .bz2 be used instead?  I think those are stable.

I tested it, and tar.bz2 seems stable but tar.gz changes its md5sum each time I run tar -czf.  That is very surprising because I'm almost positive that tar.gz was (empirically) stable a few years ago.  I wonder whether we can depend on tar.bz2 being stable in the long run.

Norm
 

Norman Megill

unread,
Feb 2, 2020, 11:14:26 AM2/2/20
to Metamath
Thanks.  I did get trrntzip to work yesterday from the source files, although there was a lot of trial and error that I'm not sure I can reproduce. :)  (SourceForge doesn't like wget, so I ended up downloading through a browser in Windows.)  So I'll keep your instructions for the next time I have to rebuild the server.

While I understand the purpose of normalizing the dates, the site build script makes sure that the dates don't change unless there was an actual update to the metamath program.  It's just that zip doesn't produce a stable .zip file even if the contents are exactly the same with exactly the same date/time stamps.

The question is, are dates important?  If they are, an alternate method is for me to modify the script to test for any file changes, and if none don't regenerate the .zip file but reuse the old one.

Norm

vvs

unread,
Feb 2, 2020, 11:31:50 AM2/2/20
to Metamath
The question is, are dates important?  If they are, an alternate method is for me to modify the script to test for any file changes, and if none don't regenerate the .zip file but reuse the old one.

Keep in mind that you've already lost all dates in a git repository - they are fresh new with each checkout.

André L F S Bacci

unread,
Feb 2, 2020, 11:51:33 AM2/2/20
to meta...@googlegroups.com
On Sun, Feb 2, 2020 at 4:14 PM Norman Megill <n...@alum.mit.edu> wrote:
The question is, are dates important?  If they are, an alternate method is for me to modify the script to test for any file changes, and if none don't regenerate the .zip file but reuse the old one.

Probably yes. As this thread started because a filename was changed because of what was thought as some internal change.

For to detect changes before regenerating, I suggest:

find . -type f -print0 | xargs -0 sha1sum | sort | sha1sum

André

heiph...@wilsonb.com

unread,
Feb 2, 2020, 8:50:35 PM2/2/20
to meta...@googlegroups.com
> I tested it, and tar.bz2 seems stable but tar.gz changes its md5sum each
> time I run tar -czf.

Hrm. Both Nix and Guix rely on the fact that tar.bz2, tar.gz, and tar.xz can be
made deterministic. However, this is starting to do down the reproducible
builds rabbit hole, which is deep.

Here is a good quick read on creating stable, deterministic archives:
https://reproducible-builds.org/docs/archives/

In particular, it suggests the following tar command might work for our
purposes:

$ tar --sort=name \
--mtime="1970-01-01 00:00Z" \
--owner=0 --group=0 --numeric-owner \
--pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
-cjf product.tar.bz2 path/to/source

where the date given to --mtime can be anything that seems appropriate. Another
option suggested is to post-process the non-determinism out of an archive.
Apparently, this can be done with zips as well:

https://packages.debian.org/sid/strip-nondeterminism

Without knowing more about your particular setup, I am afraid that's about all
I can suggest.

Norman Megill

unread,
Feb 3, 2020, 3:33:54 PM2/3/20
to Metamath
On Sunday, February 2, 2020 at 8:50:35 PM UTC-5, heiphohmia wrote:
> I tested it, and tar.bz2 seems stable but tar.gz changes its md5sum each
> time I run tar -czf.

Hrm. Both Nix and Guix rely on the fact that tar.bz2, tar.gz, and tar.xz can be
made deterministic. However, this is starting to do down the reproducible
builds rabbit hole, which is deep.

For the record, here is what I observed:

$ ls -ld metamath          [ <- the standard metamath subdirectory]
drwxr-xr-x 2 norm adm 4096 Nov  1 23:30 metamath
$ tar -czf a.tgz metamath
$ mv a.tgz b.tgz
$ tar -czf a.tgz metamath
$ md5sum a.tgz b.tgz
48ab49f2e3309508809662d2e28327b8  a.tgz
f14761124cfdf55426b8d7bbcf8cff40  b.tgz
$ tar -cjf a.tbz2 metamath
$ mv a.tbz2 b.tbz2
$ tar -cjf a.tbz2 metamath
$ md5sum a.tbz2 b.tbz2
1c6419ac596b2342635469bfcd6fcbd6  a.tbz2
1c6419ac596b2342635469bfcd6fcbd6  b.tbz2

$ tar --version
tar (GNU tar) 1.27.1  ....
$ gzip --version
gzip 1.6    ....
$ bzip2 --version
bzip2, a block-sorting file compressor.  Version 1.0.6, 6-Sept-2010.  ....



Here is a good quick read on creating stable, deterministic archives:
https://reproducible-builds.org/docs/archives/

In particular, it suggests the following tar command might work for our
purposes:

    $ tar --sort=name \
          --mtime="1970-01-01 00:00Z" \
          --owner=0 --group=0 --numeric-owner \
          --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
          -cjf product.tar.bz2 path/to/source

where the date given to --mtime can be anything that seems appropriate. Another
option suggested is to post-process the non-determinism out of an archive.
Apparently, this can be done with zips as well:

https://packages.debian.org/sid/strip-nondeterminism

Without knowing more about your particular setup, I am afraid that's about all
I can suggest.

Thanks for your suggestions.  I will consider them when coming up with a permanent solution.

In the meantime, I have stopped regenerating http://us.metamath.org/downloads/metamath-program.zip each site build.  Until I implement an automated solution, I will create it by hand whenever I update the program.  So you should be able to depend on it being stable from now on.

BTW I took out the Windows executable metamath.exe from metamath-program.zip since it's not relevant to Linux.

Norm

heiph...@wilsonb.com

unread,
Feb 4, 2020, 6:40:46 AM2/4/20
to meta...@googlegroups.com
> In the meantime, I have stopped regenerating
> http://us.metamath.org/downloads/metamath-program.zip each site build.

Thank you. This is very much appreciated.

However, I feel bad that this ended up simply creating mork work for you. If
there is any way I can help automate things, please let me know. In a previous
life, I was a sysops engineer.

> For the record, here is what I observed: ...

This is, indeed, surprising. On my machine, these same steps produce
identically-hashing contents. The discrepency bothered me enough that I spent
some time sleuthing the root cause.

The quick 'n dirty is that gzip 1.6 produces non-deterministic builds by
default. I was able to reproduce the behaviour you demonstrate by using gzip
1.6. The non-deterministicity is fixed in gzip version 1.10 (dated from
2018-12-29).

The medium-length explanation is that gzip includes a timestamp in its header
by default. Up until version 1.9, when the input file is stdin this timestamp
becomes the current system time. Version 1.10 chooses to simply elide the
timestamp altogether in this case. However, when input is a normal file, all
versions, including 1.6, behave similarly by setting the timestamp to the
original file's modification time (mtime).

It turns out that you can manually tell gzip to elide the timestamp in the
latter case by providing the non-obviously named '-n' (--no-name) option. This
means, that with version 1.6, you should be able to get reproducible tar.gz
archives by manually chaining tar and gzip together:

$ tar -cf - path/to/metamath | gzip -n >metamath.tar.gz

or telling tar to chain for us with the -I switch:

$ tar -I 'gzip -n' -cf metamath.tar.gz path/to/metamath

As it turns out, either of those invocations produce archives that hash
identically between versions 1.6 and 1.10.


And just for over-the-top kicks, if anyone is interested in checking out the
commit that introduced the above change to gzip, here it is:

url: git://git.savannah.gnu.org/gzip.git
commit: bce795d0a38ae10f13b3297f1253acdeb4defc21

Cheers.
Reply all
Reply to author
Forward
0 new messages