Switching the set.mm repo from Travis CI to GitHub Actions (or something else)

97 views
Skip to first unread message

David A. Wheeler

unread,
Nov 18, 2020, 9:33:13 PM11/18/20
to Metamath Mailing List
All:

I propose that we move from using Travis CI to something else, probably GitHub Actions, for the set.mm repo. This is proposed here:
https://github.com/metamath/set.mm/issues/1867

I intend to work to implement this, though help is always welcome. Details below.

--- David A. Wheeler

=== DETAILS ===

One great thing about the set.mm repository is that every proposed change is examined by multiple checking tools (including a number of verifiers). This helps provide extraordinarily high confidence that the proofs follow from the given axioms & definitions. While it focuses on the set.mm database, there are also checks on set.mm (based on intuitionistic logic). This checking of each potential change is a normal process in software development, and is often called a continuous integration (CI) pipeline. The CI pipeline for the set.mm repository is implemented using Travis.

Unfortunately Travis has become increasingly unreliable, showing as increasing delays or occasional unexpected failures. There are also reasons to think we may not be able to use it in the future. Travis has been having a number of problems as a company; those who are curious about more can look at these:
https://www.jeffgeerling.com/blog/2020/travis-cis-new-pricing-plan-threw-wrench-my-open-source-works
https://blog.travis-ci.com/2020-11-02-travis-ci-new-billing
https://news.ycombinator.com/item?id=25003387

I propose switching to something else. The “obvious” answer is GitHub actions. It’s free for OSS (and we are), and we’re already on GitHub. This basically requires creating a new .yml file to implement it. GitHub actions trivially supports parallel jobs.

I intend to start implementing this. Someone else had done a first step at that, so I’ll certainly look at that.

Mario would like to see fewer verifiers run on *each* commit, and instead run the “rest” of the verifiers separately (e.g., nightly). I’m skeptical that will save anything we care about, and doing that will complicate things. More importantly, trying to make that change *and* simultaneously moving our jobs from Travis to GitHub Actions is more complex.

So I intend to move the Travis jobs to GitHub Actions *first*, while doing my best to isolate the jobs. We can then *separately* consider if we should split jobs into “nightly” and “per-commit” jobs, and how to split them, once we have confidence that the jobs have been correctly moved. It’ll be much easier to discuss trade-offs & implementation difficulty once we have our CI pipeline moved to a more reliable service.

Mario Carneiro

unread,
Nov 18, 2020, 9:50:02 PM11/18/20
to metamath
> So I intend to move the Travis jobs to GitHub Actions *first*, while doing my best to isolate the jobs. We can then *separately* consider if we should split jobs into “nightly” and “per-commit” jobs, and how to split them, once we have confidence that the jobs have been correctly moved. It’ll be much easier to discuss trade-offs & implementation difficulty once we have our CI pipeline moved to a more reliable service.

As I said in the issue, I think this is the right approach. Besides the boilerplate associated with setting up CI, I don't think there is any complexity to the current travis script - it's just a list of commands that could just as well be in a bash script and called from anywhere. The details of triggers on GA are a bit different but it's quite flexible and I don't think supporting a separate nightly build is a huge extension so much as a rearrangement of lines.

That said, I'm personally pretty procrastinatory about this stuff - I've been putting off setting up MM0 CI via github actions for months, which I'm sure would be valuable experience here - so it mostly just comes down to looking up the API and making the switch.

Mario

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/metamath/93F3FE08-A42B-48F3-BD9A-1FD489FE3C87%40dwheeler.com.

Jim Kingdon

unread,
Nov 19, 2020, 1:47:23 AM11/19/20
to meta...@googlegroups.com
All sounds good, thanks for looking into this.

I'd say that running everything on every pull request would be my
preference, and I don't think it takes that long to run (at least, it
hasn't on travis). Nightly is an option if we have to, but if the
nightly run fails there is a debugging step of figuring out which of the
pull requests merged that day caused the problem.

Mario Carneiro

unread,
Nov 19, 2020, 3:26:29 PM11/19/20
to metamath
On Thu, Nov 19, 2020 at 1:47 AM Jim Kingdon <kin...@panix.com> wrote:
All sounds good, thanks for looking into this.

I'd say that running everything on every pull request would be my
preference, and I don't think it takes that long to run (at least, it
hasn't on travis). Nightly is an option if we have to, but if the
nightly run fails there is a debugging step of figuring out which of the
pull requests merged that day caused the problem.

If a nightly run fails, then that means there is a buggy verifier, and today's test sussed it out. It doesn't really matter what happened that day in set.mm, because the CI verifier should catch any errors in the relevant commit.

There are two possibilities:
* One of the large list of nightly verifiers is buggy and is spuriously failing on the correct set.mm of that day. In that case we should contact the author, try to get it fixed, and probably remove it from the list of nightly verifiers until it is fixed.
* The CI verifier is buggy and failed to catch a problem that was introduced some time that day in set.mm, and the nightly verifiers (probably almost all of them) are flagging it. This is also not too hard to localize, because at least some of the verifiers have good error reporting and will point at the particular proof that is going wrong. In any case, this isn't something where we need to chase the set.mm commit, because it doesn't matter - we've just generated a new test case for the verifier, so we should fix the CI verifier, get it passing or failing properly on the new info, and then fix set.mm if it turns out to be broken.

This is why I say that the nightlies are really a test case for the verifiers - under no situation is the blame laid on the commit that pushed a bad proof, because disagreement between verifiers is always a verifier bug. All the per-commit blame for bad proofs is delivered by the CI verifier that runs on every commit.

Mario

David A. Wheeler

unread,
Nov 19, 2020, 3:41:18 PM11/19/20
to Metamath Mailing List

> On Nov 19, 2020, at 3:26 PM, Mario Carneiro <di....@gmail.com> wrote:
> This is why I say that the nightlies are really a test case for the verifiers - under no situation is the blame laid on the commit that pushed a bad proof, because disagreement between verifiers is always a verifier bug. All the per-commit blame for bad proofs is delivered by the CI verifier that runs on every commit.

Some verifiers check not only for the proof, but for certain conventions that we use.
If there *is* a disagreement, it’s useful to know as soon as possible.

In any case, let’s get the jobs migrated first. We can then discuss if it’s worth trying to have a separate “nightly”.

--- David A. Wheeler

Mario Carneiro

unread,
Nov 19, 2020, 3:46:54 PM11/19/20
to metamath
On Thu, Nov 19, 2020 at 3:41 PM David A. Wheeler <dwhe...@dwheeler.com> wrote:

> On Nov 19, 2020, at 3:26 PM, Mario Carneiro <di....@gmail.com> wrote:
> This is why I say that the nightlies are really a test case for the verifiers - under no situation is the blame laid on the commit that pushed a bad proof, because disagreement between verifiers is always a verifier bug. All the per-commit blame for bad proofs is delivered by the CI verifier that runs on every commit.

Some verifiers check not only for the proof, but for certain conventions that we use.
If there *is* a disagreement, it’s useful to know as soon as possible.

In my vision for the CI verifier, it checks everything that CI currently checks: line lengths, style, htmldef entities, $j commands, comment parsing, definition checking, etc in addition to being a plain old verifier. So if *any* nightly verifier reports an error that CI missed then there is a bug in the verifier suite somewhere, not in the input files.

Mario

Mario Carneiro

unread,
Nov 19, 2020, 3:58:15 PM11/19/20
to metamath
Without writing a new verifier, the cheap way to obtain this is to run the minimal subset of existing verifiers sufficient to get all these auxiliary checks: I believe metamath.exe and mmj2 should be sufficient, although mm-scala does some $j checking beyond mmj2 so it might also have to be included. But ultimately I would like to have all of this in one verifier because it's faster and it would also make a good foundation for the next generation proof assistant if it is able to have a solid semantic understanding of metamath files. (Imagine if renaming could be done correctly without relying on grep!) I've been doing a significant amount of work on mm0-rs to turn it into just such a program for MM1, and it is nice to be able to have all the niceties of a normal programming language IDE like go to definition, find references and rename in addition to proof assistant like features. I'm not sure I want to go back to mmj2 at this rate.

Benoit

unread,
Nov 20, 2020, 9:37:09 AM11/20/20
to Metamath
Mario, do you have some more detail about this project to write (I'm quoting you from github) "a verifier that would satisfy all the constraints of the CI system in terms of capabilities: checking all the typesetting stuff, definition checking, and full verification as efficiently as possible"?
Thanks in advance.
Benoît

Mario Carneiro

unread,
Nov 20, 2020, 3:00:22 PM11/20/20
to metamath
The plan is to start a new "community verifier" and build it with those tasks in mind. I think there has been too much fragmentation in verifier land; this makes it very hard for newcomers, especially those that are not especially computer literate such as traditional mathematicians. mmj2 is pretty good, but it's also quite old and the architecture isn't something I really want to continue to build on. MM1 builds a lot of lean-esque features into a metamath-like foundation, and I would say as a proof assistant it's a smashing success, in usability if perhaps not in users. I've been spoiled by lean to an extent and don't really see current metamath tools as up to par by comparison, and building mm0-rs has been enough to convince me that it is feasible to do the same thing for metamath.

I value metamath's many verifiers; it's a great thing that a new verifier can be done as an afternoon project. But a proper proof assistant needs a lot more infrastructure to provide a high quality user experience, and the community is small enough as it is without additional fragmentation. We've managed to successfully collaborate on the set.mm github repo, but to date no verifier has received significant attention from >2 contributors, except mmj2 which has a couple drive-by contributions beyond the work by myself and Mel O'Cat (and I'm not actively developing the project anymore). There is no metamath verifier I would say is in a particularly healthy state of development.

As a starting point, I would like to replicate the main functions of metamath.exe, at least for one-shot tasks. That means reading and verifying a database, but also parsing it in such a way that edits can be applied, comments re-wrapped, and semantic information like document structure is available. Generating the web pages would also be a target. (This is quite difficult, as metamath.exe is a ball of hacks right now and I would like to start from a cleaner baseline.)

For the more advanced tasks, one design question that comes up early is whether to support "ungrammatical" databases. On the one hand, the metamath spec considers such databases valid, so they can't be rejected (unless the database contains the $j command asserting that it is grammatical). On the other hand, grammaticality enables an asymptotically more efficient structural storage (trees instead of strings), and also makes it possible to perform HTML construction in a saner way, where expressions are properly grouped rather than just appending tokens and hoping it looks well formed at the end. (This is where a lot of the hacks in metamath.exe come from.) Note that set.mm and iset.mm are grammatical, as well as most but not all of the smaller example databases. (I think miu.mm and lof.mm are the main counterexamples.) Perhaps we can degrade to a simpler verifier-only version in case the DB is not grammatical.

As far as language choice goes, I would recommend Rust. Beyond being well suited for the speed and architectural requirements of a verifier / proof assistant, both myself and Raph Levien (if I can interest him in this project) are involved in the rust ecosystem; maybe we can use the Druid UI framework for the front end. (I want to put in a word of caution though: one thing I learned from mmj2 is that it's a bad idea to write a text editor unless that's the main goal. This is not a trivial task and will suck up all your research time. Outsourcing the editor work for MM1 to VSCode made so many things easier.)

But really, the choice is not only up to me, as community buy-in for this project would be important. The ultimate goal is to become the de-facto standard proof assistant for metamath, put all our best ideas in it, and finally get over the popular conception of metamath as not practical for real mathematicians unless you have a high tolerance for manual labor.

Mario

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.

Mario Carneiro

unread,
Nov 20, 2020, 3:04:02 PM11/20/20
to metamath
Actually I've reposted this at https://groups.google.com/g/metamath/c/6fQ_GxnPAWI/m/wJ8wzDp3AQAJ; please direct future responses to that thread so we can keep this one to Travis -> GH Actions.

David A. Wheeler

unread,
Nov 20, 2020, 5:51:33 PM11/20/20
to Metamath Mailing List
On 11/18/20 6:33 PM, David A. Wheeler wrote:
> Unfortunately Travis has become increasingly unreliable... I propose switching to something else. The “obvious” answer is GitHub actions.

On Nov 19, 2020, at 1:47 AM, Jim Kingdon <kin...@panix.com> wrote:
>> All sounds good, thanks for looking into this.
>> I'd say that running everything on every pull request would be my preference, and I don't think it takes that long to run (at least, it hasn't on travis). Nightly is an option if we have to, but if the nightly run fails there is a debugging step of figuring out which of the pull requests merged that day caused the problem.

Good news, the switch from Travis CI to GitHub Actions is implemented in this pull request:
https://github.com/metamath/set.mm/pull/1879
I’m waiting for Norm to approve it & merge it; I don’t know any reason he’d object.

This change implements all the same checks as the Travis CI pipeline (it’s basically the same code).
I did the initial conversion; Mario then sped up the smm3 (Rust) process by adding a cache.

Just like the Travis version, all verifiers run in parallel. A sample run’s timing was:
* 47 seconds (checkmm, C++)
* 69 seconds (metamath.exe, C, does extra checks)
* 14 seconds (smm3, Rust *when cached*). Note: it's 111 seconds without a cache
* 102 seconds (mmj2, Java, does extra checks)
* 89 seconds (mmverify.py, Python, set.mm body)
* 73 seconds (mmverify.py, Python, set.mm mathboxes)
* 16 seconds (mmverify.py, Python, iset.mm)

Since they’re parallel, verification by *all* them is the same as the slowest one, plus sometimes a few seconds to wait for spare CPUs. So it takes less than 2 minutes to complete.

A few comments about these numbers are probably in order. The Rust one is cached in these numbers (14 seconds). Without caching Rust is the *slowest* because of compilation time; one sample was 111 seconds per <https://github.com/metamath/set.mm/runs/1428848217> We can shave off 30 seconds in metamath.exe by caching its compiled result, and 7 seconds by checkmm by caching its compiled result. We’ll probably do that at some point, though there’s no rush.

I agree with Jim Kingdom & think there's no need to move anything to a "nightly" run:
* These all run in parallel. The longest one (mmj2) has a sample time of 1 minute 42 seconds. I think that's fine to keep running all in parallel, we aren't in *that* much of a rush to merge things. I don't think there's a problem if it takes less than 2 minutes to run all the checks.
* The longest-running one (mmj2) does added checks (for our definition conventions) that most other verifiers don't do. So it's not true that omitting mmj2 will have identical results.
* This doesn't really impact the environment. We're talking seconds, and a lot of this is waiting (e.g., for checkouts & downloads). It'd be impractical to try to measure the "extra" energy used. Adding the two caches would be a nice touch & reduce CPU time some, and that's easy to do, so we should do that eventually. But I don't see a reason to go beyond that.

That said, if we *do* want to move to nightly checks, we’re now better positioned to do it.

--- David A. Wheeler

Mario Carneiro

unread,
Nov 20, 2020, 6:04:30 PM11/20/20
to metamath
Mainly I would like to remove mmverify.py and mmj2 from the pipeline, which are far more expensive than the others for the same job. Currently we can't remove mmj2 because it's the only one with a definition checker but that does not cost 102 seconds to do by any stretch. And I notice that mmverify.py is over 2 minutes total, requiring some manual parallelization. We don't really need to be building our dependencies; I initially intended to download a compiled executable from smetamath-rs repo but there is no such artifact on the repo and caching does just as well. The python version of course can't be sped up much by compilation, and we're already downloading the compiled mmj2.jar, but metamath.exe and checkmm can be precompiled as well.

In any case, I'm fine with merging this as-is and circling back to other possibilities later.

Mario

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.

David A. Wheeler

unread,
Nov 22, 2020, 5:34:57 PM11/22/20
to Metamath Mailing List
The Metamath set.mm repository has now switched from Travis CI to GitHub Actions for the CI pipeline.

Early experience by @avekens is positive; it’s MUCH faster & has easier navigation <https://github.com/metamath/set.mm/pull/1886>.

One quirk: There are many jobs, and normally half of them skipped. We want to ensure that we run a check on every push inside the repo *and* pull request to the repo,. The only way I found to do that was to create jobs for both cases and skip execution when it’s a duplicate. This causes a few seconds’ delay (as it works out what jobs to run) & we see lots of skipped jobs, but that’s a price worth paying... we run the CI pipeline whenever it’s needed *without* doing a lot of duplicate work.

--- David A. Wheeler

Reply all
Reply to author
Forward
0 new messages