metamath-test added to metamath-cmds docker image

Antony Bartlett

Mar 16, 2024, 5:59:15 PM
to meta...@googlegroups.com
metamath-cmds is a docker image where I've collected a number of metamath command line tools together for convenience (metamath.exe, metamath-knife, mmj2, checkmm, checkmm-ts, mmverify.py).

Today I have updated this image, so if you run

docker run -it akb74/metamath-cmds

you will get a command line with the latest versions of all these tools and the latest set.mm.  However, the last time I did this was two years ago, so most of the time you'd be better off following the instructions in the github repository (https://github.com/Antony74/metamath-docker).
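
To run the tools against your own database rather than the bundled set.mm, the usual docker bind-mount should work (a sketch only; I'm assuming the image accepts a command after its name, and my-database.mm is a placeholder for your own file):

# sketch: mount the current directory into the container and verify a local database
docker run -it -v "$PWD":/work -w /work akb74/metamath-cmds metamath-knife --verify my-database.mm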
Today I have also added metamath-test (https://github.com/david-a-wheeler/metamath-test) to the commands available.  This is a useful collection of good .mm files which a verifier is expected to accept, and bad .mm files which a verifier is expected to reject.  I had to patch my copy of the test harness considerably to get it to run in the metamath-cmds container; some of that is down to the environment, but I also suspect the suite itself may be a little out of date.
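
The idea of the suite, roughly, is this (a hypothetical sketch of the pass/fail convention only, not the actual harness, which drives each verifier through a wrapper script; I'm assuming the bad databases are identifiable by name, here via a *bad* glob):

# hypothetical sketch only - the real harness uses per-verifier driver scripts
for f in *.mm; do
  if metamath-knife --verify "$f" >/dev/null 2>&1; then got=pass; else got=fail; fi
  case "$f" in *bad*) want=fail ;; *) want=pass ;; esac
  [ "$got" = "$want" ] && echo "OK $f" || echo "MISMATCH $f (expected $want)"
done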

Finally, I have a question about metamath-knife.  In order for the test suite to pass, you have to set the --verify parameter.  However, if you do that, metamath-knife seems to lose much of its fabled speed.  So I'm wondering in what sense metamath-knife can be considered a sub-one-second verifier if it doesn't pass the test suite without --verify?  Sorry if that sounds impertinent; I'd just like to understand the distinction.  I'm sure it's doing the most important checks really fast.

Thanks,

    Best regards,

        Antony

Mario Carneiro

Mar 17, 2024, 12:00:48 AM
to meta...@googlegroups.com
As far as I know, and last I checked, metamath-knife can check set.mm in under a second with verification enabled, and in about half a second with verification disabled. It's possible that things have changed due to the growth of set.mm?

Just checked again:

$ cargo build --release
$ hyperfine -w 2 -- "target/release/metamath-knife -v set.mm"
Benchmark 1: target/release/metamath-knife -v set.mm
  Time (mean ± σ):      1.367 s ±  0.052 s    [User: 1.280 s, System: 0.085 s]
  Range (min … max):    1.307 s …  1.464 s    10 runs
$ hyperfine -w 2 -- "target/release/metamath-knife set.mm"
Benchmark 1: target/release/metamath-knife set.mm
  Time (mean ± σ):     199.7 ms ±  13.7 ms    [User: 151.8 ms, System: 47.4 ms]
  Range (min … max):   185.6 ms … 236.6 ms    15 runs

so a bit more than a second now, and without verification it's even less than I remember, probably because the default profile has more things disabled than there used to be.

Antony Bartlett

Mar 17, 2024, 9:44:19 AM
to meta...@googlegroups.com
Thanks for looking at that, Mario, that's really helpful.  I have added hyperfine to the metamath-cmds container (it's in github; I'll push to dockerhub later).  The warm-up functionality is especially useful, as --verify seems particularly susceptible to cold starts.  I've taken some benchmarks (below) and metamath-knife's reputation for speed looks completely safe, as expected.  Containers of course cannot be relied on for absolute benchmarks - my Windows host system seems to take very roughly twice as long as running natively.  My concern when I compared your results with mine was that maybe containers can't be relied on for relative benchmarks either, but these look entirely in line with expectation.  I'm still not clear on the difference between running metamath-knife with --verify and without, but I am very satisfied with this outcome, and grateful for your assistance.
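
Incidentally, hyperfine can also compare the two modes in a single invocation and print a relative summary, e.g.:

# benchmark both modes side by side; hyperfine reports which one ran faster
hyperfine -w 2 'metamath-knife set.mm' 'metamath-knife --verify set.mm'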

I'm going to try to tidy up the changes to my copy of metamath-test, and maybe get a PR or two into the metamath-test repo too.  There's a regexp that didn't run for me, and which I fear I've overly weakened; if I need some help with that, I'll worry about it later.  This has all been prompted by my telling someone on Reddit about how my TypeScript port of checkmm runs faster than the original C++.  In the best possible way, they were sufficiently affronted by the idea of TypeScript running faster than C++ that they examined checkmm.cpp and have apparently found a serious performance flaw in it.  This work on the metamath-cmds container means that I'm now all ready to analyse a PR should I get one (otherwise I'll write it myself).


=========================================================================
Metamath command line tools (https://github.com/Antony74/metamath-docker)
=========================================================================

Please find below an example command for each of the tools provided here:

metamath "READ set.mm" "VERIFY PROOF *" "exit"
checkmmc set.mm
metamath-knife --verify set.mm
echo -e "LoadFile,set.mm\nVerifyProof,*" > params.txt && mmj2 -f params.txt
python3 mmverify.py set.mm
checkmm set.mm
prettier --write demo0.mm
alt-mm set.mm
cd /metamath-test && ./run-testsuite-all-drivers

Prefix a command with time to measure how long it takes.
Put the command in quotes and prefix with hyperfine -w 2 -- to benchmark it.

/set.mm # hyperfine -w 2 -- 'metamath-knife set.mm'
Benchmark 1: metamath-knife set.mm
  Time (mean ± σ):     354.7 ms ±  97.8 ms    [User: 247.8 ms, System: 101.2 ms]
  Range (min … max):   261.5 ms … 514.6 ms    10 runs

/set.mm # hyperfine -w 2 -- 'metamath-knife --verify set.mm'
Benchmark 1: metamath-knife --verify set.mm
  Time (mean ± σ):      2.351 s ±  0.214 s    [User: 2.151 s, System: 0.200 s]
  Range (min … max):    2.099 s …  2.688 s    10 runs

/set.mm # hyperfine -w 2 -- 'metamath "READ set.mm" "VERIFY PROOF *" "exit" > output.log'
Benchmark 1: metamath "READ set.mm" "VERIFY PROOF *" "exit" > output.log
  Time (mean ± σ):     10.899 s ±  0.163 s    [User: 10.341 s, System: 0.555 s]
  Range (min … max):   10.739 s … 11.244 s    10 runs

/set.mm # hyperfine -w 2 -- 'echo -e "LoadFile,set.mm\nVerifyProof,*" > params.txt && mmj2 -f params.txt'
Benchmark 1: echo -e "LoadFile,set.mm\nVerifyProof,*" > params.txt && mmj2 -f params.txt
  Time (mean ± σ):     10.039 s ±  0.241 s    [User: 22.526 s, System: 1.711 s]
  Range (min … max):    9.479 s … 10.340 s    10 runs

/set.mm # hyperfine -w 2 -- 'checkmm set.mm'
Benchmark 1: checkmm set.mm
  Time (mean ± σ):     15.926 s ±  0.268 s    [User: 19.688 s, System: 1.408 s]
  Range (min … max):   15.589 s … 16.410 s    10 runs

/set.mm # hyperfine -w 2 -- 'checkmmc set.mm'
Benchmark 1: checkmmc set.mm
  Time (mean ± σ):     34.860 s ±  0.949 s    [User: 30.446 s, System: 4.412 s]
  Range (min … max):   34.097 s … 37.491 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

/set.mm # hyperfine -w 2 -- 'python3 mmverify.py set.mm'
Benchmark 1: python3 mmverify.py set.mm
  Time (mean ± σ):     52.228 s ±  0.240 s    [User: 51.274 s, System: 0.953 s]
  Range (min … max):   51.939 s … 52.631 s    10 runs

Mario Carneiro

Mar 17, 2024, 1:59:05 PM
to meta...@googlegroups.com
> I'm still not clear on the difference between running metamath-knife with --verify and without, but I am very satisfied with this outcome, and grateful for your assistance.

Running with --verify makes metamath-knife act like a traditional verifier: it checks the correctness of the metamath theorems and no more. The default mode of metamath-knife does required processing only, which basically means parsing the file. That is not particularly useful on its own, but because there are lots of other flags with additive behavior, it avoids needing to do verification work when only other things are requested (e.g. axiom-use / discouraged generation, markup validation, etc.).
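
Concretely, the two modes from the benchmarks above:

# default mode: parse / required processing only - proofs are not checked
metamath-knife set.mm

# with --verify: additionally checks every proof, like a traditional verifier
metamath-knife --verify set.mm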
