What does Z mean in compressed proofs?

51 views
Skip to first unread message

Philip White

unread,
May 2, 2022, 1:16:12 PMMay 2
to meta...@googlegroups.com
Hi,

I'm implementing a Metamath verifier, and I am at the point of actually
verifying the proofs. I would like to support compressed proofs so that
I can directly verify set.mm instead of first converting to normal
proof format.

I've been reading the section in the book about how the proof
compression format works. I'm confused by the explanation of the Z
character. Here is a quote from the book:

The letter Z indentifies (tags) a proof step that is identical to one
that occurs later on in the proof; it helps shorten the proof by not
requiring identical proof steps to be proved over and over again
(which happens often when building wff's).

And also:

When the compressed proof is converted to a normal proof, the entire
subproof of a step tagged with Z replaces the reference to that step.

I have two questions:

1. What is a subproof? I thought that uncompressed proofs are just
sequences of labels, so there doesn't seem to be enough structure for
steps to refer to a "subproof".

2. Should the compressed proof verifier skip running steps of a saved
subproof the second time it sees it? Given my first question, I don't
really understand what steps a subproof will contain, so I can't even
tel if skipping the duplicate steps would end up being sound. I would
have expected a verifier to operate on a stream of labels, and then the
uncompressed or compressed verifiers would merely be a fold over those
labels. However, if we skip sub-proofs, that means the compressed
verifier is not just a different representation, but also a different
algorithm.

I would greatly any explanation you can give.

Thanks,
Philip

Mario Carneiro

unread,
May 2, 2022, 1:41:44 PMMay 2
to metamath
Hi Phillip,

On Mon, May 2, 2022 at 1:16 PM 'Philip White' via Metamath <meta...@googlegroups.com> wrote:
I have two questions:

1. What is a subproof? I thought that uncompressed proofs are just
sequences of labels, so there doesn't seem to be enough structure for
steps to refer to a "subproof".

A normal mode proof is essentially a tree of theorem applications written in reverse polish notation (RPN), so the "subproof" corresponding to a given step is all the steps that lead into it. For example if the last step is ax-mp which has four arguments (two wffs and two hypotheses) then the subproof ending at that step is the "ax-mp" step plus the four subproofs that came before it (recursively using this definition to define the extent of those four subproofs).

You could use something like that to convert a proof to normal mode if you wanted, but I don't recommend it for verification, since there is a much simpler and asymptotically far less wasteful way to check the proofs (see next point).

2. Should the compressed proof verifier skip running steps of a saved
subproof the second time it sees it? Given my first question, I don't
really understand what steps a subproof will contain, so I can't even
tel if skipping the duplicate steps would end up being sound. I would
have expected a verifier to operate on a stream of labels, and then the
uncompressed or compressed verifiers would merely be a fold over those
labels. However, if we skip sub-proofs, that means the compressed
verifier is not just a different representation, but also a different
algorithm.

The key to understanding this is to view a normal mode proof not as a tree structure like I mentioned above, but instead as operations on a stack machine. The stack starts empty, and each theorem reference pops n statements from the stack (where n is the arity of the theorem) and pushes one more statement. At the end there should be exactly one statement on the stack, which should match the $p statement. This encodes the RPN structure, so it's really the same thing stated in a different way, but this generalizes better to the compressed case.

The stack elements here are "statements", which are roughly speaking the stuff that can go in the body of a $e or $p ... $= , that is, a token string consisting of a constant (the typecode) followed by zero or more constants or variables. The existence of such an element on the stack represents that in the current theorem context, that expression is provable, so if we get the final $p expression on the stack then we have proven the theorem, and each theorem application is asserting that since certain things are provable we can deduce that something else is provable. These stack elements don't have any names, they are just steps in the proof and they get constructed in a tree structure based on the order of theorem applications.

With compressed proofs, we add another data structure, the "heap", which also consists of statements but has random access unlike the stack. The Z command pushes the thing on top of the stack to the heap (labeled with the next available number). When you reference a heap element, that statement is pushed onto the stack. So you can do a big subproof to generate some statement, save it to the heap (which does not pop it from the stack), and then recall it later to use that same statement 2 or more times in the same proof without having to go through all the steps of the proof again.

Mario

Benoit

unread,
May 2, 2022, 6:26:00 PMMay 2
to Metamath
I was writing a Metamath verifier some time ago (in OCaml), and I found looking at mmverify.py in Python by Raph Levien very useful, see https://github.com/david-a-wheeler/mmverify.py/blob/master/mmverify.py (the "proof decompression" was added by Scott Fenton).  Actually, I made a PR that avoids decompression and directly verifies the compressed proofs (making a four-fold speedup), that adds some comments (some especially relevant to your questions) and typing annotations for clarity, and that separates the different steps (see the three functions "treat_step", "treat_normal_proof", and "treat_compressed_proof"), see https://github.com/benjub/mmverify.py/blob/faster/mmverify.py.

Benoît

Philip White

unread,
May 3, 2022, 1:23:19 AMMay 3
to Metamath
On Monday, May 2, 2022 at 6:26:00 PM UTC-4 Benoit wrote:
I was writing a Metamath verifier some time ago (in OCaml), and I found looking at mmverify.py in Python by Raph Levien very useful, see https://github.com/david-a-wheeler/mmverify.py/blob/master/mmverify.py (the "proof decompression" was added by Scott Fenton).  Actually, I made a PR that avoids decompression and directly verifies the compressed proofs (making a four-fold speedup), that adds some comments (some especially relevant to your questions) and typing annotations for clarity, and that separates the different steps (see the three functions "treat_step", "treat_normal_proof", and "treat_compressed_proof"), see https://github.com/benjub/mmverify.py/blob/faster/mmverify.py.

Ah, very nice. thanks for the reference. I was reading through the original mmverify to see how compressed proofs were handled. I think decompressing prior to verification is harder to follow because then you have to figure out which proofs are part of a particular subproof. I guess doing verification at the same time as decompression also has the nice property of being faster.

Unrelated: my verifier is also in OCaml. I don't expect it to set any speed records; my ambition is more in the direction of making a web-based proof visualization tool, and perhaps also a proof assistant UI.
 
With compressed proofs, we add another data structure, the "heap", which also consists of statements but has random access unlike the stack. The Z command pushes the thing on top of the stack to the heap (labeled with the next available number). When you reference a heap element, that statement is pushed onto the stack. So you can do a big subproof to generate some statement, save it to the heap (which does not pop it from the stack), and then recall it later to use that same statement 2 or more times in the same proof without having to go through all the steps of the proof again.

Your whole explanation, and especially the heap analogy, is extremely helpful; thank you so much!

- Philip

Benoit

unread,
May 3, 2022, 8:29:55 AMMay 3
to Metamath
On Tuesday, May 3, 2022 at 7:23:19 AM UTC+2 Philip White wrote:
Unrelated: my verifier is also in OCaml. I don't expect it to set any speed records; my ambition is more in the direction of making a web-based proof visualization tool, and perhaps also a proof assistant UI.

Nice !  I think a web-based program would be a nice addition, and I was thinking of using js_of_ocaml for that.  I suggested OCaml for the community verifier in https://groups.google.com/g/metamath/c/6fQ_GxnPAWI/m/54sFV7AOAAAJ but unfortunately it didn't catch on.  For the moment, I have a "minimum viable product" which builds an AST from a .mm file, displays statements and verifies proofs, with a minimal CLI.  I'd like to round this up, and after that I'd be happy to collaborate if you want (there are many different blocks we can work on).

Benoît

Philip White

unread,
May 3, 2022, 6:48:41 PMMay 3
to Benoit, meta...@googlegroups.com
> Nice ! I think a web-based program would be a nice addition, and I
> was thinking of using js_of_ocaml for that. I suggested OCaml for
> the community verifier in
> https://groups.google.com/g/metamath/c/6fQ_GxnPAWI/m/54sFV7AOAAAJ but
> unfortunately it didn't catch on. For the moment, I have a "minimum
> viable product" which builds an AST from a .mm file, displays
> statements and verifies proofs, with a minimal CLI. I'd like to
> round this up, and after that I'd be happy to collaborate if you want
> (there are many different blocks we can work on).

Yeah, js_of_ocaml is also what I was planning to use. More
specifically, I was going to use https://github.com/janestreet/bonsai
to make the UI bits. It's a strange and rather difficult to comprehend
library [1], but I've found that it makes building composable components
rather easy. I'd be interested in collaborating, and I would be happy
to explain how Bonsai works if we end up using it for a UI.

One aspect of this UI that I would like to make polished is a
visualization/debugger of how the proof progresses. My current vision
is that the user gets to see the formulas on the stack and the formulas
in the "heap" (to borrow Mario's analogy for how Z works). You can step
forward through the proof, which will animate through all the
unification and substitution steps involved in using an assertion.
In my mind, good animations would be a big win over the static html
pages that comprise the current proof explorer. Also, being able to
step forward and backward through proof steps seems to match much
better with how verification actually works.

[1] Bonsai is difficult and strange in sort of the same way that Monads
are difficult and strange. Now that I am familiar and well
practiced in using Bonsai, I find it really easy and natural to do
pretty much anything I want to.

Reply all
Reply to author
Forward
0 new messages