metamath-docker and profiling checkmm.cpp


Antony Bartlett

Oct 13, 2025, 4:24:00 PM
to meta...@googlegroups.com
I have succeeded in my goal of making checkmm.cpp faster than checkmm.ts.  Here are my latest benchmarks:

Benchmark 1: metamath-knife --verify set.mm

  Time (mean ± σ):     574.7 ms ±   5.3 ms    [User: 556.7 ms, System: 18.0 ms]

  Range (min … max):   569.7 ms … 583.6 ms    10 runs


Benchmark 1: echo "LoadFile,set.mm" > params.txt && echo "VerifyProof,*" >> params.txt && mmj2 -f params.txt

  Time (mean ± σ):      3.145 s ±  0.042 s    [User: 6.991 s, System: 0.085 s]

  Range (min … max):    3.093 s …  3.231 s    10 runs


Benchmark 1: metamath "READ set.mm" "VERIFY PROOF *" "exit"

  Time (mean ± σ):      3.310 s ±  0.051 s    [User: 3.281 s, System: 0.028 s]

  Range (min … max):    3.259 s …  3.413 s    10 runs


Benchmark 1: checkmmc set.mm

  Time (mean ± σ):      5.161 s ±  0.098 s    [User: 5.134 s, System: 0.026 s]

  Range (min … max):    5.067 s …  5.308 s    10 runs


Benchmark 1: checkmm set.mm

  Time (mean ± σ):      5.980 s ±  0.049 s    [User: 7.133 s, System: 0.163 s]

  Range (min … max):    5.896 s …  6.063 s    10 runs


Benchmark 1: python3 mmverify.py set.mm

  Time (mean ± σ):     14.857 s ±  0.153 s    [User: 14.818 s, System: 0.038 s]

  Range (min … max):   14.759 s … 15.281 s    10 runs


Produced with the collection of Metamath command-line tools in Docker that I maintain.


I thought I'd made a mistake when I said metamath.exe was the fastest single-threaded verifier the other day, because when I double-checked, mmj2 has always been slightly faster.  However, judging by how it fits nearly seven seconds of user run-time into a total run time of 3.1 seconds (roughly 2.2 cores' worth of CPU), I'd suggest it must be multi-threaded.  So, by a complete fluke, I may have been correct about metamath.exe!


There's a performance penalty in using Docker, but I've always felt that was OK so long as it penalises all verifiers equally.  However, a closer look at my previous benchmarks shows this is not the case: unlike the other verifiers, checkmm.cpp was taking more than a whole second of system time, which is where all the virtualisation that slows Docker down is likely to show up.  Unbeknown to me when I started the Dockerization project, this is a quirk of the Alpine Linux base container I was using.  Alpine is the workhorse of the cloud owing to how slim it is, but it ships without glibc support (it uses musl instead, which is known to perform worse for some workloads).


So most of what I've done since my first round of optimisation is to move metamath-docker from Alpine to Ubuntu, which is why some of the other benchmark numbers look a bit different too.  I think it's a solid choice of distro and better reflects the sort of environment where people are likely to be running verifiers.


I have also managed to make a few small performance improvements to the C++ code, which can be found here.


So in the second part of this email I wanted to talk about profiling, which was discussed here briefly last week.  I'm a beginner at this myself: in spite of having been a software developer for long enough that I should probably start downplaying some of my experience on my resume, the occasions where we've said "yeah, this is great, but we wish it was faster" have been pretty scarce.  I said profiling was important when optimising, because otherwise how do you know you're optimising in the right place?  And indeed I did initially embark on this without profiling and optimised in the wrong place; I should have known better.


Unfortunately, profiling tools seem to be one of the few things left that are operating-system specific.  In the past I've used whatever comes with Microsoft Visual Studio on Windows; this time I'm using 'sample' on macOS.  In hindsight I should have used 'perf' or similar inside my Docker container, so others could follow along and join in more easily if they wished.  The commands I used for building and profiling can be found here.


Fortunately the output of profiling tools is pretty similar in terms of how you skim it and figure out what's slow.


This profile for the original checkmm.cpp source code goes straight down into std::map::find and from there to string comparison.  As I said in my original email about optimising checkmm.cpp, std::map is ordered and its O(log n) lookups were what was taking the time, but here's the evidence of that.
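For anyone who hasn't seen the effect first-hand, here's a rough, self-contained sketch of the difference.  It has nothing to do with the actual checkmm.cpp data structures (the key names and sizes are made up for illustration); it just times repeated lookups of string keys in an ordered std::map, where each find walks a tree doing string comparisons, against an std::unordered_map, where a find is one hash plus (typically) one comparison:

#include <chrono>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Time repeated find() calls on either kind of map with the same set of keys.
template <typename MapType>
long long timelookups(const MapType & m, const std::vector<std::string> & keys)
{
    const auto start = std::chrono::steady_clock::now();
    std::size_t found = 0;
    for (int pass = 0; pass < 100; ++pass)
        for (const std::string & k : keys)
            found += m.find(k) != m.end();
    const auto stop = std::chrono::steady_clock::now();
    std::cout << found << " lookups succeeded\n"; // keep the loop from being optimised away
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

int main()
{
    std::map<std::string, int> ordered;
    std::unordered_map<std::string, int> hashed;
    std::vector<std::string> keys;
    for (int i = 0; i < 100000; ++i)
    {
        keys.push_back("label" + std::to_string(i)); // synthetic keys, not real set.mm labels
        ordered[keys.back()] = i;
        hashed[keys.back()] = i;
    }
    std::cout << "std::map:           " << timelookups(ordered, keys) << " ms\n";
    std::cout << "std::unordered_map: " << timelookups(hashed, keys) << " ms\n";
}

The exact ratio will depend on the compiler, the standard library and the key distribution, but the shape of the result is the same thing the profile is showing: the ordered map spends its time in string comparison.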


The latest profile jumps straight into std::copy, and I do think there's a slightly more efficient way of doing that.  After that it's doing stuff with unordered_map, which I'm happy about.  Then pushing strings into vectors, which in the context of makesubstitution sounds reasonable.  Then more std::copy.
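To make that concrete, here's a minimal sketch of what a makesubstitution-style routine might look like with a ranged insert instead of std::copy into a back_inserter.  The type names and the function shape are my assumptions for illustration, not the actual checkmm.cpp code:

#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: an Expression is a sequence of symbols, and a
// Substitution maps variable names to the expressions that replace them.
typedef std::vector<std::string> Expression;
typedef std::unordered_map<std::string, Expression> Substitution;

Expression substitute(const Expression & expr, const Substitution & subst)
{
    Expression result;
    result.reserve(expr.size()); // starting guess; substitutions may expand the length

    for (const std::string & symbol : expr)
    {
        const Substitution::const_iterator found = subst.find(symbol);
        if (found == subst.end())
        {
            result.push_back(symbol); // constant symbol: copied through unchanged
        }
        else
        {
            // Variable: splice in its substituted expression in one go.
            result.insert(result.end(), found->second.begin(), found->second.end());
        }
    }
    return result;
}

The point of the ranged insert is that the iterators are random access, so the vector knows how many elements are coming and can grow once per substitution, whereas std::copy through a back_inserter has to grow element by element.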


So I think there's still a little bit of scope left for improving the performance of checkmm.cpp without a substantial redesign, but I'm very happy to have hit my goal of making it faster than checkmm.ts.  I hope this has been of interest,


    Best regards,


        Antony

