Metamath -> FOL translation

82 views

Skip to first unread message

Mario Carneiro

unread,

Aug 4, 2016, 3:51:37 AM8/4/16

to metamath

Hi All,

I'm back from CICM 2016 with some very good conversations and a possible collaboration with the KWARC group (https://kwarc.info). When I left, Dennis Müller was half way through a Metamath importer into MMT, and I'd like to get Metamath exports working soon. I gave two talks, which I will hopefully be able to get on Youtube soon. In the meantime, you can enjoy the slides at:

http://us2.metamath.org:88/ocat/pnt/pnt.pptx
http://us2.metamath.org:88/ocat/pnt/pnt.pptx.pdf
http://us2.metamath.org:88/ocat/model/model.pptx
http://us2.metamath.org:88/ocat/model/model.pptx.pdf

In one of my talks, I discuss an embedding of Metamath in FOL, using the model as the FOL universe. It was (justifiably) pointed out that this is a "deep" embedding, in which the forall maps to a binary function, rather unlike the standard notion from FOL which is a binder. I'm working on a new paper that explains an alternative "shallow" embedding, and I'll try to explain the basics here.

Before I get into the details, let me say that the encoding is already implemented in mmj2 and works on all of set.mm, just run the macro RunMacro,toFol,* to see FOL translations of all Assrts in the database. I will give examples as well later on.

The basic idea is already present in the "Models for Metamath" paper (http://us2.metamath.org:88/ocat/model/model.pdf page 14-15), when I translate the Metamath ZFC axioms to the FOL ZFC equivalents. Within the standard model of set.mm, the forall uses the expression f[x->m], which changes one of the free variables from the valuation to a quantified variable. In a sense, this is the only way an expression can "introspect" on the other possible values of a wff or class variable. This means we can replace the function phi(f) which ranges over all f: V -> M with a more limited expression phi(x) where x: M. In general, phi may depend on more variables, but the common link is that phi can only depend on variables which surround it in binders like A. x ph or { x | ph }.

This leads to the following procedure: Given a theorem, a wff/class variable ph is determined to depend on x if it appears inside a binding expression that binds x. (For now, just take it as given that A. x ph and { x | ph }.are binding expressions, and others defined in terms of these are also binding expressions.) This analysis is performed across the whole theorem, both hypotheses and conclusion, to determine the arity of ph. For example, exlimiv:

$d x ps & |- ( ph -> ps ) => |- ( E. x ph -> ps )

Here, ph depends on x because it shows up inside the binder E. x ph. Thus after marking dependencies we get:

$d x ps & |- ( ph(x) -> ps ) => |- ( E. x ph(x) -> ps )

If a disjoint variable condition is given, this counteracts any dependency as determined from this method. (In this case, ps is already determined to not depend on x because it doesn't appear around any binders, so the $d makes no difference. What this means in practice is that any occurrences of x in ps are considered free variables, since nothing is available to bind them.) Thus:

|- ( ph(x) -> ps ) => |- ( E. x ph(x) -> ps )

For the next step, we find any free variables in each step and close them at the meta-level. Obviously a free x appearing in a step is a free variable, but also any variables that appear in the dependencies of a wff/class variable, such as the ph(x) in the hypothesis here. I use dependent type theory notation for this: the hypothesis should be proven for an arbitrary x, so the hypothesis is translated to a Pi binder over all x in the model. This is done independently for each hypothesis. Thus:

( {x: set} |- ( ph(x) -> ps ) ) => |- ( E. x ph(x) -> ps )

(The ({x} body) notation is the Pi binder, and ([x] body) is the lambda binder.) Finally, we close the whole theorem over all wff/class metavariables, with the appropriate types. In this case, ph has type set -> prop, and ps has type prop. Thus:

{ph: set -> prop} {ps: prop} ( {x: set} |- ( ph(x) -> ps ) ) => |- ( E. x ph(x) -> ps )

This is our final result. Actually, the notation is a bit different as output by mmj2:

exlimiv: {ph ps} ({x} |- wi (ph x) ps) -> |- wi (wex [x] ph x) ps

The types are elided, and "wi" is a constant of type prop -> prop -> prop which is axiomatically defined. In this manner all syntax axioms are translated. A syntax axiom which contains only class or wff variables is translated into a curried function in the obvious way, where "wff" becomes prop and "class" becomes (set -> prop). But if there is a set variable in the syntax axiom this becomes more complicated, because the variable could potentially bind any of the other variables. For each wff/class variable A and each set variable x in the definition, we determine if occurrences of x in A are bound, free, or both. For example, in

A. x e. A ph <-> A. x ( x e. A -> ph )

occurrences of x in ph or A are bound (although I think it would be nicer if occurrences of x in A are free), and in

X_ x e. A B = { f | ( f Fn A /\ A. x e. A ( f ` x ) e. B ) }

occurrences of x in B are bound, and occurrences of x in A are both bound (from the forall) and free (in the first conjunct).

For each variable that appears free in A, A is given a dependency, just like we did for theorem statements. Thus A. x ( x e. A -> ph ) becomes A. x ( x e. A(x) -> ph(x) ). These parameters are then passed to the syntax axiom as higher-order functions, so that in this case wral has type (set -> class) -> (set -> prop) -> prop. If there are any free variables remaining, such as in

{ f | ( f Fn A(x) /\ A. x e. A(x) ( f ` x ) e. B(x) ) }

(note that the first x is free), we must pass x itself as an extra parameter, so cixp would get type set -> (set -> class) -> (set -> class) -> class. (Clearly it is an unfortunate side effect of our df-ixp that the type turns out this way, but we are going for a 100% translation here, with no special-casing. If we allowed $d x A in the definition we could have the simpler type "class -> (set -> class) -> class".) This function would get used as "X_ x e. A B" => "cixp x ([x] A) ([x] B)" where A and B are expressions potentially containing x.

Here are a few more selected translations:

ax-1: {ph ps} |- wi ph (wi ps ph)
ax-2: {ph ps ch} |- wi (wi ph (wi ps ch)) (wi (wi ph ps) (wi ph ch))
ax-3: {ph ps} |- wi (wi (wn ph) (wn ps)) (wi ps ph)
ax-mp: {ph ps} |- ph -> |- wi ph ps -> |- ps
ax-5: {ph ps} |- wi (wal [x] wi (ph x) (ps x)) (wi (wal [x] ph x) (wal [x] ps x))
ax-6: {ph} |- wi (wn (wal [x] ph x)) (wal [x] wn (wal [x] ph x))
ax-7: {ph} |- wi (wal [x] wal ([y] ph x y)) (wal [y] wal ([x] ph x y))
ax-gen: {ph} ({x} |- ph x) -> |- wal ([x] ph x)
ax-8: |- wi (wceq (cv x) (cv y)) (wi (wceq (cv x) (cv z)) (wceq (cv y) (cv z)))
ax-11: {ph x y} |- wi (wceq (cv x) (cv y)) (wi (wal [y] ph x y) (wal [x] wi (wceq (cv x) (cv y)) (ph x y)))
ax-13: |- wi (wceq (cv x) (cv y)) (wi (wcel (cv x) (cv z)) (wcel (cv y) (cv z)))
ax-14: |- wi (wceq (cv x) (cv y)) (wi (wcel (cv z) (cv x)) (wcel (cv z) (cv y)))
ax-17: {ph} |- wi ph (wal [x] ph)
ax-12: |- wi (wn (wceq (cv x) (cv y))) (wi (wceq (cv y) (cv z)) (wal [x] wceq (cv y) (cv z)))
ax-10: |- wi (wal [x] wceq (cv x) (cv y)) (wal [y] wceq (cv y) (cv x))
ax-9: |- wn (wal [x] wn (wceq (cv x) (cv y)))
ax-4: {ph x} |- wi (wal [x] ph x) (ph x)
ax-16: {ph x} |- wi (wal [x] wceq (cv x) (cv y)) (wi (ph x) (wal [x] ph x))

hbal: {ph} ({x y} |- wi (ph x y) (wal [x] ph x y)) -> ({x} |- wi (wal [y] ph x y) (wal [x] wal ([y] ph x y)))
hba1: {ph} |- wi (wal [x] ph x) (wal [x] wal ([x] ph x))

ax-sep: {ph} |- wex ([y] wal ([x] wb (wcel (cv x) (cv y)) (wa (wcel (cv x) (cv z)) (ph x))))
ax-rep: {ph} |- wi (wal [w] wex ([y] wal ([z] wi (wal [y] ph y z w) (wceq (cv z) (cv y))))) (wex [y] wal ([z] wb (wcel (cv z) (cv y)) (wex [w] wa (wcel (cv w) (cv x)) (wal [y] ph y z w))))

Still to be done: outputting definitions, and outputting proof terms.

One thing which is not addressed here is DV conditions between sets. This translation effectively treats all set variables as distinct. More precisely, the translation can optionally be preceded by a renaming step that merges any two set variables that lack a disjointness condition. Since this can significantly change the FOL meaning of the expression, we treat different renamings as completely different theorems, and only translate the all-distinct version by default.

Getting Metamath and HOL/FOL communities to see eye to eye on this is an interesting challenge. I've had some success already in explaining the semantics of Metamath statements to type theorists by means of this translation, and I'm curious to see what the Metamathers find hardest to understand in the above exposition.