A slightly larger than usual proof

137 views
Skip to first unread message

Jens-Wolfhard Schicke-Uffmann

unread,
Sep 13, 2016, 2:13:01 PM9/13/16
to metamath
Hi,

today I finished proving correctness of the first C function of a larger
project against an axiomatisation of a subset of binary ARM assembly,
run as a bare-metal image on a Rasberry Pi. This project is the reason
why I made igor.

You can find the abridged, relevant files at
http://drahflow.name/2016-09-13-large-proof.tbz2

The proof of lem-function-exchange_mail is potentially
the largest currently existing metamath proof and might be
an interesting test case for verifiers in its own right.

Obviously, if anyone has good ideas how to improve this kind of proof,
please go ahead :) My next few months will be spent pushing (even more)
automation into the proof script until it can be auto-generated directly
from the .c file.


Regards,
Drahflow
signature.asc

fl

unread,
Sep 14, 2016, 6:00:58 AM9/14/16
to Metamath
I'm not sure that I understand what you are doing. I suppose you prove every assert in kernel7 and then connect them
to prove your postconditions. As it is shown by your work, proofs of algorithms are demanding, to put it middly. In that case, I advise
you to send most of your partial proofs to automatic proof solvers (those proofs are simple) and to use metamath only for the
most difficult ones. The results of your partial proofs can be considered as axioms for your larger ones, or equivalently
you introduce them as premisses.

Using set.mm for proving algorithms is not a bad idea. It is based on set theory that is more widely known than
lambda calculus. But set.mm is also poor at deaming with numbers and if you are managing an assembly program, as it is your case,
it may be not convenient.

--
FL

Jens-Wolfhard Schicke-Uffmann

unread,
Sep 14, 2016, 6:41:36 AM9/14/16
to 'fl' via Metamath
On Wed, Sep 14, 2016 at 03:00:57AM -0700, 'fl' via Metamath wrote:
> I'm not sure that I understand what you are doing.
I start from an axiomatisation of ARM CPU behavior (arm.mm),
a binary image as loaded by the Raspberry firmware into RAM
(kernel7.img.mm; assembled from kernel7.asm; compiled from *.c)
and via some axiliary lemmata and definitions (kernel7.invariant.mm +
system-invariant.mm) show that my functions behave as specified
(kernel7.mm; generated from *.igor).

> I suppose you prove every assert in kernel7 and then connect them to
> prove your postconditions.
That is the plan anyway. So far I only did lem-function-exchange_mail.

Ultimately there will be some main function on which a different theorem
must be shown, namely that it loops forever, but reaching the same point
again and again after finite instruction count while keeping certain
invariants.


> As it is shown by your work, proofs of algorithms
> are demanding, to put it middly. In that case, I advise
> you to send most of your partial proofs to automatic proof solvers (those
> proofs are simple) and to use metamath only for the
> most difficult ones.
The question really is how (or whether) to save the auto-generated
proofs. Currently Igor writes them and they transparently become part of
the large proof.

> The results of your partial proofs can be considered as
> axioms for your larger ones, or equivalently
> you introduce them as premisses.
Do I understand you correctly? You mean I prove by some external means
theorems, like |- ( ; 1 6 x. ; 1 6 ) = ; ; 2 5 6 and import them as
axioms? But this would mean trusting other tools beside metamath with
behaving correctly, no?

> Using set.mm for proving algorithms is not a bad idea. It is based on set
> theory that is more widely known than
> lambda calculus. But set.mm is also poor at deaming with numbers and if you are
> managing an assembly program, as it is your case,
> it may be not convenient.
Concrete natural numbers are not a real problem with Igor, they are
dealt with mostly automatically (it still blows up the proof size,
obviously).


Regards,
Drahflow
signature.asc

Norman Megill

unread,
Sep 14, 2016, 7:29:30 AM9/14/16
to meta...@googlegroups.com
This is very interesting, at least as a stress test of Metamath tools.

For others who want to try this, system-invariant.mm seems to be the
master that reads in the various included files including set.mm. I
read it in metamath.exe, which verifies the proof fine (with warnings
for some unproved theorems); the big 2MB proof
lem-function-exchange_mail, located in kernel7.mm, takes 2 to 3 seconds
to verify by itself.

The big proof lem-function-exchange_mail chokes MM-PA, which hasn't
completed after several minutes. It works with an uncompressed proof -
do you know how many steps it would have? Does your Igor tool work
directly with compressed proofs?

'show proof lem-function-exchange' gives '*** FATAL ERROR: Out of
memory.' which I haven't seen since the days of pre-OSX on the Mac in
the 1990s. I haven't pinned down why this is happening yet. I actually
don't know how much memory can be allocated on 64-bits Windows 7 with
8GB memory - I had thought that it would be "unlimited" with virtual
memory, but I guess this proves it wrong.

(Later: it gives this error with the pre-compiled metamath.exe for
Windows that is provided in the release, which is compiled with lcc. A
gcc version works, and 'show proof lem-function-exchange_mail/reverse'
shows there are 509097 compressed steps. 'show proof
lem-function-exchange_mail/reverse/renumber' shows there are 104669
compressed steps without the syntax-building steps.)

I didn't try mmj2.

If this is an encoding of framebuffer.c (2KB), then the 2MB proof would
represent a De Bruijn factor of 10000.

Perhaps you can give an overview of what this is doing, and how you know
that your Metamath theorem statement accurately reflects what the C
function is supposed to do.

Norm

Norman Megill

unread,
Sep 14, 2016, 8:17:36 AM9/14/16
to meta...@googlegroups.com
On 9/14/16 6:41 AM, Jens-Wolfhard Schicke-Uffmann wrote:
> On Wed, Sep 14, 2016 at 03:00:57AM -0700, 'fl' via Metamath wrote:

>> The results of your partial proofs can be considered as
>> axioms for your larger ones, or equivalently
>> you introduce them as premisses.
> Do I understand you correctly? You mean I prove by some external means
> theorems, like |- ( ; 1 6 x. ; 1 6 ) = ; ; 2 5 6 and import them as
> axioms? But this would mean trusting other tools beside metamath with
> behaving correctly, no?

There is no reason to call unproved results axioms. Calling them
theorems, even with empty proofs, is much better since the verification
tools tell you what hasn't been proved. (I think this is what you are
already doing.) And there isn't any pre-defined size limit for the .mm
file, so there's no reason that such empty proofs can't be filled in later.

The main issue with metamath.exe (if you or others want to use that
tool) is that the proof is too large to handle conveniently. If things
like |- ( ; 1 6 x. ; 1 6 ) = ; ; 2 5 6 are done internally in the giant
proof, they could be broken out as lemmas. What is the main thing that
makes the proof so large, and could it be broken up into more
conveniently sized lemmas (even if there are a hundred of them)?

A quick test of whether a proof is "too large" for metamath.exe
convenience is "save proof xxx /compressed /time", which tells how much
time it takes to uncompress and recompress the proof with metamath.exe's
(not terribly efficient) compression algorithm. For set.mm
contributions, if it takes more than 5 or 10 seconds, I usually ask
contributors to break up the proof. This also helps make the web page
size manageable so we won't have 100MB web pages. Of course for a
stand-alone project you can do what you want.

Norm

Norman Megill

unread,
Sep 14, 2016, 8:24:49 AM9/14/16
to meta...@googlegroups.com
On 9/14/16 7:29 AM, Norman Megill wrote:
> If this is an encoding of framebuffer.c (2KB), then the 2MB proof would
> represent a De Bruijn factor of 10000.

I meant 1000. Although it might be more meaningful to calculate it from
the assembler code kernel7.asm (11KB), which would reduce it to 200.

Norm

Jens-Wolfhard Schicke-Uffmann

unread,
Sep 14, 2016, 8:32:53 AM9/14/16
to meta...@googlegroups.com
On Wed, Sep 14, 2016 at 07:29:19AM -0400, Norman Megill wrote:
> For others who want to try this, system-invariant.mm seems to be the
> master that reads in the various included files including set.mm.
Sorry for not explaining it better in my first mail, I was a bit
exhausted from a city council meeting.

> I read it in metamath.exe, which verifies the proof fine (with
> warnings for some unproved theorems);
... just for the record: Those unproved theorems are not used in
lem-function-exchange_mail but are open theorems for other
C functions.

> The big proof lem-function-exchange_mail chokes MM-PA, which hasn't
> completed after several minutes. It works with an uncompressed
> proof - do you know how many steps it would have? Does your Igor
> tool work directly with compressed proofs?
It builds a proof graph in memory and unifies common subgoals. So
it essentially works on compressed proofs, but not on the concrete
compressed representation.

> If this is an encoding of framebuffer.c (2KB), then the 2MB proof
> would represent a De Bruijn factor of 10000.
Yes, it is the proof of the framebuffer.c function exchange_mail.
Unfortunately, it is not yet an encoding of it, because it also
needs lem-function-exchange_mail.igor to come into existence.

> Perhaps you can give an overview of what this is doing, and how you
> know that your Metamath theorem statement accurately reflects what
> the C function is supposed to do.
The function in question communicates with the Raspberry Pi video
hardware and exchanges one message (called "mail" in the Pi
documentation).

As I whish to avoid axiomatizing all the specific hardware of the Pi,
the theorem does not talk about the concrete purpose of the function,
only about facts between the system state before (`x`), while (`y`) and
after (`z`) the function has been run:
* z e. kernel-code-constant, i.e. all the bytes which comprise
the compiled code are as they were initially
* ( z reg ; 1 3 ) = ( x reg ; 1 3 ), i.e. the stack pointer is
as before the function ran
* ( z reg ; 1 5 ) = ( ( x reg ; 1 5 ) + 4 ), i.e. the instruction
pointer is 4 bytes after the position it was when the CPU executed
the blx instruction which moved control into the function; which is
to say the function returned to where it was supposed to
* on all paths, such a `z` can be reached in less than
`; ; ; ; ; 5 0 0 0 0 0` `step`s, i.e. ARM instruction executions
(from the //! runtime comment in framebuffer.c)
* given any `ph` (with free `x`), and any `ps` (with free `y`) such that
`A. y ( A. a e. cpu-mem ( a >_ ( x reg ; 1 3 ) -> ( y mem a ) = ( x mem a ) ) -> ( ph -> ps ) )`
i.e. given that the memory above the initial stack pointer is still
unchanged relative to `x` then `ps` follows from `ph`
then
`ps` in all intermediate states (`y`) during the path from `x` to `z`
(including `x` and `z`)

This is essentially describing all the things the function will not
do (from //! independence in framebuffer.c) and allows me to call
the function from different context and carry through this call any
arbitrary invariants needed in said contexts (as long as the invariant
condition is enough to keep the invariant true, of course)

I don't "know" that this theorem is a correct and sufficient description
of what the C function does. But ultimately I'd like to show besides
other things:

`|- A. p e. ( path ` step ) ( ( p ` 0 ) = 0 -> A. i e. NN0 ( p ` i ) e. kernel-code-constant )`
i.e.
In all execution paths starting from when the image is loaded, the
bytes comprising the code will never change.

I know that the lem-function-exchange_mail lemma is a right one when
it becomes part of that global proof.


Regards,
Drahflow
signature.asc

Jens-Wolfhard Schicke-Uffmann

unread,
Sep 15, 2016, 6:55:30 AM9/15/16
to meta...@googlegroups.com
Hi,

On Wed, Sep 14, 2016 at 08:17:25AM -0400, Norman Megill wrote:
> The main issue with metamath.exe (if you or others want to use that
> tool) is that the proof is too large to handle conveniently. If
> things like |- ( ; 1 6 x. ; 1 6 ) = ; ; 2 5 6 are done internally in
> the giant proof, they could be broken out as lemmas. What is the
> main thing that makes the proof so large, and could it be broken up
> into more conveniently sized lemmas (even if there are a hundred of
> them)?
The most commonly used lemmata are:
1042 decltc
* 1065 jca4
- 1118 eqeq1d
1138 id
1139 mpcom
* 1298 adantl3
* 1309 animpass
* 1363 adantl2
1378 mpbird
1424 ax-mp
1595 mpbir
* 1682 adantl16
- 1689 imbi2d
- 1810 anim2i
- 2025 anbi1d
2297 hban
2396 syl
- 2495 anbi2d
- 2520 anbi12d
* 2564 adantl4
* 2580 adantl8
* 3190 adantl
- 3621 anbi2d4
4187 a1i
* 4871 jca
* 5185 simpl


The *-marked are typically used for antecedent selection / reordering / etc.
The --marked are typically used in term rewriting.

It appears many steps are currently spent managing antecedents, e.g.
the goal is

|- ( ph /\ ( ph' /\ .... ( ph^16 /\ ph^17 ) ) -> ps )

and some automation deduced ( ph^15 /\ ph^16 ) -> ps

Then a bazillion steps are spend getting there. This is already somewhat
ameliorated with adantl8 and similar lemmata, but I think there is a lot
of potential for optimization.

Regards,
Drahflow
signature.asc

fl

unread,
Sep 15, 2016, 7:44:58 AM9/15/16
to Metamath
The question really is how (or whether) to save the auto-generated
proofs.

I suppose the answer is "you don't". First because they don't even exist.
Those kinds of softwares, for many of them, use semantic method: they
know that a proof exists without even trying to look for it -- as it has been
pointed out by Norm recently.

Currently Igor writes them and they transparently become part of
the large proof.


What is the goal of Igor by the way. I had understood it was a programming language.
 

But this would mean trusting other tools beside metamath with
behaving correctly, no?


Yes. But I don't see what the problem.is  At least at a philosophical level.
Obviously if the softwares you use are buggy you can't trust them but most
of them are made by very reliable mathematicians and engineers.

-- 
FL

Jens-Wolfhard Schicke-Uffmann

unread,
Sep 15, 2016, 8:49:07 AM9/15/16
to 'fl' via Metamath
On Thu, Sep 15, 2016 at 04:44:58AM -0700, 'fl' via Metamath wrote:
> Currently Igor writes them and they transparently become part of
> the large proof.
> What is the goal of Igor by the way. I had understood it was a programming
> language.
The goal of Igor is to have a proof assistant for set.mm which is
powerful enough to proof correctness of non-trivial programs.

> But this would mean trusting other tools beside metamath with
> behaving correctly, no?
> Yes. But I don't see what the problem.is  At least at a philosophical level.
> Obviously if the softwares you use are buggy you can't trust them but most
> of them are made by very reliable mathematicians and engineers.
If a proof is split over multiple software packages, then a bug in any
of them can lead to an incorrect proof. To maximise the probability of a
proof being correct, it would be good to rely on as few software as
possible for verification (and if possible check the proof using
multiple verifiers).


Regards,
Drahflow
signature.asc

David A. Wheeler

unread,
Sep 15, 2016, 7:01:14 PM9/15/16
to meta...@googlegroups.com, Jens-Wolfhard Schicke-Uffmann

>today I finished proving correctness of the first C function of a
>larger
>project against an axiomatisation of a subset of binary ARM assembly,
>run as a bare-metal image on a Rasberry Pi. This project is the reason
>why I made igor.


This is amazing stuff. I have been thinking about metamath for math, not for programs. I'm very interested to see where you take this.


--- David A.Wheeler

Jens-Wolfhard Schicke-Uffmann

unread,
Sep 9, 2019, 3:41:43 AM9/9/19
to metamath
Hi,

On Tue, Sep 13, 2016 at 08:12:58PM +0200, Jens-Wolfhard Schicke-Uffmann wrote:
> today I finished proving correctness of the first C function of a larger
> project against an axiomatisation of a subset of binary ARM assembly,
> run as a bare-metal image on a Rasberry Pi. This project is the reason
> why I made igor.
>
> You can find the abridged, relevant files at
> http://drahflow.name/2016-09-13-large-proof.tbz2
>
> The proof of lem-function-exchange_mail is potentially
> the largest currently existing metamath proof and might be
> an interesting test case for verifiers in its own right.
It _was_ the largest (showing correctness of a 102 instruction function).
After resuming said project a few months back, I found that the
proof assistant infrastructure would not scale up very well, and instead
rewrote most of it to tackle a function of 287 instructions.

http://drahflow.name/2019-09-09-lem-function-init_local_framebuffer.tbz2
% metamath system-invariant.mm
MM> verify proof lem-function-init_local_framebuffer
... 17 seconds and 1.3 GB RAM later ...
lem-function-init_local_framebuffer
MM>

Files:
* arm.mm - Axiomatization of a (very small) subset of ARM instructions
* kernel7.img.mm - Metamath-yfied firmware image
* kernel7.invariant.mm - Assembler-generated (very simple) invariant definition
* kernel7.mm - Lemmata for the C functions
* lem-function-init_local_framebuffer.igor
- Proof script input to igor
* system-invariant.ey - Project specific automations for igor
* system-invariant.mm - Root metamath file, includes the others
* set.mm - Pretty old, but compatible set.mm

This is all work in progress, so don't expect particularly good presentation.
Also, I am sure the proof could be optimized in many ways.
According to igor's statistics, the proof has 1775932 steps (after
proof compression), of which ~1100000 should be non-syntax steps of some
sort. It took around 60 minutes (and ~10 GB RAM) to build this proof
from lem-function-init_local_framebuffer.igor

Most relevant (to me) is the fact that this proof includes a function
call to exchange_mail, i.e. the other function already shown. The
invariant from the calling function is successfully carried through all
steps of the called function, without having to explicitly show this
for each instruction.

I invite you to also check lem-function-init_local_framebuffer.igor, to
get an idea of what automations are already available, some nice things
(if you're used to step-by-step style):

* Everything is deduction style, shuffling of the antecedents is fully
automated.
* Rewrite support
> rewrite ( ; ; ; ; ; ; ; 1 6 7 7 7 2 1 7 mod ; ; ; ; 6 5 5 3 6 ) = 1
The left side is replaced in all antecedents and in the goal
conclusion with the right side.
(And the arithmetics for this case are fully automated, too.)
* Instantiating universal quantifiers for concrete instances
> instantiate A. a e. cpu-mem ( ( a >_ ( x reg ; 1 3 ) /\ -. a e. ( { n e. NN0 | ( n >_ ; ; ; ; 3 7 4 3 6 /\ n < ; ; ; ; 3 7 4 4 0 ) } u. ( { n e. NN0 | ( n >_ ; ; ; ; 3 7 4 4 0 /\ n < ; ; ; ; 3 7 4 4 4 ) } u. { n e. NN0 | ( n >_ ; ; ; ; 3 7 4 4 4 /\ n < ; ; ; ; 3 7 4 4 8 ) } ) ) ) -> ( u mem a ) = ( x mem a ) ) for a
(Including the required renaming, where needed.)


Regards,
Drahflow
signature.asc
Reply all
Reply to author
Forward
0 new messages