> I am currently creating a comparison of large libraries of ITPs. I am only considering libraries with more than 1M (nonempty) LOC, of which I know (in alphabetical order): ACL2, HOL4, Isabelle, Lean, Metamath, Mizar, PVS, Rocq.
> To that end, I want to approximate the number of definitions (incl. axioms; excl. abbreviations, instances, fixed values/constants, examples) and proven proper theorems (excl. automatically generated ones) by simple analysis of the sources (without running the system).
Quick clarification: I didn't actually uncompress these databases
(set.mm etc.) and measure their uncompressed length. That said, just
from eyeballing them I'm pretty confident that uncompressed set.mm and
iset.mm would each be over 1M lines. I'm not as sure about nf.mm and
ql.mm, though it's plausible.
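(For the record, the measurement itself would be mechanical. Something
like the following session with the metamath program should do it; the
commands are from memory, so double-check them against HELP:

    $ metamath
    MM> read set.mm
    MM> save proof * /normal
    MM> write source set-normal.mm
    MM> exit
    $ grep -c . set-normal.mm

The grep at the end counts the nonempty lines of the rewritten file.)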
If it's vital, I could uncompress and measure them. However, there's a
*reason* we normally work with compressed proofs :-). Most other
systems, such as Lean and Isabelle, are happy with statements like
"blast" that say "go rediscover the proof" without actually showing the
steps. A fundamental differentiator of Metamath is that it *never*
skips proof steps, so its proofs include steps other systems would skip
over. That doesn't mean we hate other systems, but it's important to
understand that this is a key differentiator: *all* steps of a proof
are always included.
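To make that concrete, here is a tiny made-up database (my own toy
example, not an excerpt of set.mm; the labels just mimic set.mm
conventions), with the proof of a1i written in the fully spelled-out
"normal" format:

    $c wff |- -> ( ) $.
    $v p q $.
    wp $f wff p $.
    wq $f wff q $.
    wi $a wff ( p -> q ) $.
    ax-1 $a |- ( p -> ( q -> p ) ) $.
    ${
      mp.1 $e |- p $.
      mp.2 $e |- ( p -> q ) $.
      ax-mp $a |- q $.
    $}
    ${
      a1i.1 $e |- p $.
      $( Normal-format proof: every step appears, including the
         syntax ("wff-building") steps wp, wq, and wi. $)
      a1i $p |- ( q -> p ) $=
        wp  wq wp wi  a1i.1  wp wq ax-1  ax-mp $.
    $}

For comparison, the compressed form of that same proof would be roughly
"$= ( wi ax-1 ax-mp ) ABADCABEF $." (hand-compressed, so double-check
before quoting me): the letters index the mandatory hypotheses and the
parenthesized labels. That's why raw line counts of a compressed
database understate what's actually in it.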
By "uncompress" I simply mean the sequence of statements used in each proof.
If we truly uncompressed the proofs to show what humans can consider
(that is, showing the resulting statement after each proof step), all 4 of those
databases would *trivially* clear 1M lines, even if we skipped the syntax ("non-essential")
proof steps that are in the database.
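If you want a purely source-based estimate (no tools run against the
database, matching your "without running the system" constraint), here
is a rough Python sketch. It leans on one fact about the
compressed-proof encoding: each step reference is zero or more of the
characters U-Y followed by exactly one of A-T, so counting the A-T
characters counts the step references. A reference back to a Z-tagged
subproof counts once here, so treat the result as a lower bound on a
fully expanded proof:

    import re, sys

    def count_proof_steps(path):
        src = open(path, encoding='utf-8').read()
        # Strip $( ... $) comments so stray "$=" text inside a comment
        # can't be mistaken for a proof.
        src = re.sub(r'\$\(.*?\$\)', ' ', src, flags=re.S)
        steps = 0
        # A compressed proof looks like:
        #   th1 $p <statement> $= ( lbl1 lbl2 ... ) ABCZAB... $.
        for m in re.finditer(r'\$=\s*\(([^)]*)\)([A-Z?\s]*)\$\.', src):
            # Count the A-T characters: one per proof step reference.
            steps += sum('A' <= c <= 'T' for c in m.group(2))
        return steps

    if __name__ == '__main__':
        print(count_proof_steps(sys.argv[1]))

Run it as "python3 countsteps.py set.mm"; each counted step would be
one line (at least) in a truly uncompressed, human-readable rendering.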
--- David A. Wheeler