Thank you very much for the answer.
Let us see if I understand this:
Let us assume I take the 619 different subtheorems used in 2p2e4 (I get this number, 619, by doing SHOW TRACE_BACK 2p2e4 /COUNT_STEPS). These 619 subtheorems can be listed by doing show trace_back 2p2e4. The total number of steps of these 619 subtheorems is 12178 (again, this number is obtained by doing SHOW TRACE_BACK 2p2e4 /COUNT_STEPS).
So, is it correct to say the following:
- I can create a new file, let us call it
set_simplified.mm, which corresponds to taking
set.mm, and only keeping the 619 subtheorems used in the proof of 2p2e4.
- I can run python3 mmverify.py
set_simplified.mm --logfile set.log to check that the proof is correct. mmverify.py has about 690 lines.
So, with 12178 steps, plus 690 lines, I can get a full proof of 2p2e4. All these steps could be written in a pdf file (long, but not unreasonably long).
Is this correct? Or maybe the way I have defined
set_simplified.mm is "too simplified"?
Yes, that is basically correct. Strictly speaking though it is "too simplified", because it doesn't account for statement types other than $p / $a :
* ${ , $} - obviously we need these for grouping a statement with its hypotheses
* $e - (hypotheses) these are "attached" to each theorem and usually directly precede them, although sometimes they can be in the same scope as an earlier theorem and shared across multiple theorems
* $d - (disjoint variable conditions) these are also hypothesis-like in their scoping pattern and semantics
* $v, $f - (variable declarations) for
set.mm these are basically global, but there is still a well defined notion of "use" of a variable: just look for its token in any of the 619 subtheorems. If you see it then it must be declared, otherwise it can be dropped.
* $c - (constant declarations) These are global, and have the same kind of "use" as variables: if you see it in the $p ... $= or $a ... $. text then it must be declared first.
For all these reasons, you probably want to use a "metamath aware" slicer for constructing these *_
simplified.mm databases. But this is all very straightforward and will not add much on top of the 619 theorems you identified. None of these count as "steps" though toward the 12178 step count; these are for proof steps (I don't know whether syntax steps are included or omitted in the count), and the best you can say about its relation to the
simplified.mm file is that every step requires at least one character so
set_simplified.mm should be at least 12178 characters and probably more because of theorem statements, delimiters, formatting, and comments if you are keeping them.
In general, no this process will not produce long proofs. In fact it's about as short as we know how to make them, since this is taking full advantage of metamath's abstraction and reuse mechanisms, and the library itself is optimized for overall proof size. However you can find some inefficiencies in this process because
set.mm is trying to prove more than just 2+2 = 4 so it will prove things in "unnecessary" generality that becomes pure overhead once you cut literally everything else out. So it's likely that after extracting such a database you can run minimizers over the code and get still more savings.