How to tell if proof steps are uninteresting.


Philip White

Jun 3, 2022, 3:47:16 PM
to meta...@googlegroups.com
This line in the book confuses me:

> By default Metamath does not show $f hypotheses and everything
> branching off of them in the proof tree when the proof is displayed;
> this makes the proof look more like an ordinary mathematical proof,
> which does not normally incorporate the explicit construction of
> expressions.

I would expect $f hypotheses to be the leaves of the tree, since it
never takes any extra steps to prove a $f hypothesis. The output of
"show proof some_assertion /all" seems to agree, if I'm interpreting
the indentation correctly as an upside-down tree.

The context is that I'm working on making the debugger in my UI take
large steps so that you don't have to wade through all the superfluous
steps of a proof. Is understanding this one sentence sufficient to
achieve the same behavior as metamath.exe's "essential" mode view?

Thanks,
Philip

Benoit

Jun 3, 2022, 5:45:39 PM
to Metamath
The divide between ALL and ESSENTIAL is that the latter only shows the steps with typecode '|-' (and the final step, whatever its typecode is).  The other steps, those constructing the formulas and classes, are "syntactic", that is, their typecodes are wff, set, class (and indeed, their leaves are $f-statements).
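In code, the ALL/ESSENTIAL split described above amounts to a simple typecode filter. A minimal Python sketch, assuming proof steps are (typecode, expression) pairs (the names here are illustrative, not metamath.exe's actual internals):

```python
# Sketch of the ALL vs ESSENTIAL split: ESSENTIAL keeps the steps with
# typecode '|-', plus the final step whatever its typecode is.
LOGICAL_TYPECODES = {"|-"}  # assumption: set.mm's single logical typecode

def essential_steps(steps):
    """Filter a proof (list of (typecode, expr) pairs) down to the
    essential steps: logical typecodes, plus the final step."""
    if not steps:
        return []
    kept = [s for s in steps[:-1] if s[0] in LOGICAL_TYPECODES]
    kept.append(steps[-1])  # the final step is always shown
    return kept

# A toy proof: the 'wff' steps build formulas, the '|-' steps derive them.
proof = [
    ("wff", "ph"),
    ("wff", "( ph -> ph )"),
    ("|-", "( ph -> ph )"),
    ("|-", "ph"),
]
print(essential_steps(proof))  # → [('|-', '( ph -> ph )'), ('|-', 'ph')]
```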

To see an extreme example, which is kind of abusing the system, see http://us2.metamath.org/mpeuni/bj-0.html and http://us2.metamath.org/mpeuni/bj-1.html (see also the head comment of that section, and compare the results of >show proof xxx/ESSENTIAL and /ALL on these).

A clear exposition of the notion of syntactic versus essential (which extends beyond set.mm) is presented in Mario's article "Models for Metamath" (the part about weakly grammatical databases, so you do not have to read through the (harder) part about models).

Benoît

Philip White

Jun 3, 2022, 7:47:39 PM
to meta...@googlegroups.com
> The divide between ALL and ESSENTIAL is that the latter only shows
> the steps with typecode '|-' (and the final step, whatever its
> typecode is). The other steps, those constructing the formulas and
> classes, are "syntactic", that is, their typecodes are wff, set,
> class (and indeed, their leaves are $f-statements).

Ohhh, yeah, that makes complete sense. I was expecting that metamath
somehow had an internal concept of what is merely syntactic and what is
essential, but I guess it's just by convention. In other words, if I
were to make a file with different typecodes, then ESSENTIAL mode would
not work so well? (that's easy to verify I guess)

Probably what I should do is give the user control over the typecodes
they want to consider essential.

> To see an extreme example, which is kind of abusing the system, see
> http://us2.metamath.org/mpeuni/bj-0.html and
> http://us2.metamath.org/mpeuni/bj-1.html (see also the head comment
> of that section, and compare the results of >show proof xxx/ESSENTIAL
> and /ALL on these).

Woah, that's real weird.

> A clear exposition of the notion of syntactic versus essential (which
> extends beyond set.mm) is presented in Mario's article "Models for
> Metamath" (the part about weakly grammatical databases, so you do not
> have to read through the (harder) part about models).

Oh, cool; I'll have to read that. Thanks for the tip.

- Philip

Mario Carneiro

Jun 3, 2022, 9:23:25 PM
to metamath
On Fri, Jun 3, 2022 at 7:47 PM 'Philip White' via Metamath <meta...@googlegroups.com> wrote:
> The divide between ALL and ESSENTIAL is that the latter only shows
> the steps with typecode '|-' (and the final step, whatever its
> typecode is). The other steps, those constructing the formulas and
> classes, are "syntactic", that is, their typecodes are wff, set,
> class (and indeed, their leaves are $f-statements).

> Ohhh, yeah, that makes complete sense. I was expecting that metamath
> somehow had an internal concept of what is merely syntactic and what is
> essential, but I guess it's just by convention. In other words, if I
> were to make a file with different typecodes, then ESSENTIAL mode would
> not work so well? (that's easy to verify I guess)
>
> Probably what I should do is give the user control over the typecodes
> they want to consider essential.

One of the things that came out of the research which resulted in the "Models" paper was the "syntax" $j command. In set.mm you can find the following declarations:

$( $j
  syntax 'wff';
  syntax '|-' as 'wff';
  syntax 'setvar';
  syntax 'class';
$)


These correspond to the "Syn" function from the paper. In short, there is one of these syntax lines for every legal typecode (that can appear as the initial constant on any $e $f $a $p statement), and the ones that have an 'as' are the logical typecodes (and the thing after the 'as' is the typecode by which to parse the contents) while those that don't have 'as' are grammatical typecodes. So if you wanted to generalize the "is it '|-'?" check, you would instead check if it is a logical typecode. This does require $j parsing however, which is optional for metamath verifiers (since this was intended to be a backward-compatible addition).
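These declarations are simple enough to extract mechanically. A hedged Python sketch (the `classify` helper and its regex are my own, assuming the exact $j quoting style shown above):

```python
import re

# Parse the $j 'syntax' declarations quoted above. Lines with "as"
# declare logical typecodes (Syn maps them to a grammatical typecode);
# lines without "as" declare grammatical typecodes.
JDECL = """
syntax 'wff';
syntax '|-' as 'wff';
syntax 'setvar';
syntax 'class';
"""

def classify(text):
    syn = {}          # logical typecode -> grammatical typecode (Syn)
    grammatical = set()
    for m in re.finditer(r"syntax\s+'([^']+)'(?:\s+as\s+'([^']+)')?\s*;", text):
        tc, as_tc = m.groups()
        if as_tc:
            syn[tc] = as_tc      # logical typecode, e.g. Syn('|-') = 'wff'
        else:
            grammatical.add(tc)  # grammatical typecode
    return syn, grammatical

syn, grammatical = classify(JDECL)
print(syn)          # → {'|-': 'wff'}
print(grammatical)  # the set {'wff', 'setvar', 'class'} (order may vary)
```

With this in hand, the generalized "is it essential?" check is membership of the step's typecode in `syn`, rather than a hard-coded comparison with '|-'.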

Benoit

Jun 4, 2022, 9:36:27 AM
to Metamath
Mario: I had been wondering about something for a while: in Models for Metamath, you define "weakly grammatical" as a property and "grammatical" as an extra structure (the function Syn is part of the data).  One could define a system \Gamma to be *uniquely grammatical* if there exists a unique Syn: TC \to VT such that (\Gamma, Syn) is grammatical.  It is not hard to see that set.mm-like systems are uniquely grammatical.  In general, I haven't found non-trivial necessary or sufficient conditions to be uniquely grammatical.  Have you thought about these questions?

It looks like if you know beforehand that a system is uniquely grammatical, these $j commands are not necessary: upon reading the first assertion with a non-VT typecode, say $a T expr $.,  the system tries to prove $p S expr $= ? $. for all variable types S, and it will succeed for a unique one, which is then Syn(T).  (This assumes that a typecode which first appears in an assertion does not appear later in an $f-statement.)
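The inference step described here can be sketched abstractly, assuming a hypothetical parsing oracle `can_parse` (a real implementation would instead attempt the proof of $p S expr $= ? $. for each variable type S):

```python
def infer_syn(tc, expr, variable_typecodes, can_parse):
    """Infer Syn(tc) from the first assertion '$a tc expr $.' by trying
    to parse expr at every variable typecode; in a uniquely grammatical
    system exactly one attempt succeeds."""
    candidates = [vt for vt in variable_typecodes if can_parse(vt, expr)]
    if len(candidates) != 1:
        raise ValueError(f"Syn({tc!r}) not uniquely determined: {candidates}")
    return candidates[0]

# Toy oracle: in a set.mm-like system, '( ph -> ps )' parses only as wff.
oracle = lambda vt, expr: vt == "wff" and expr == "( ph -> ps )"
print(infer_syn("|-", "( ph -> ps )", ["wff", "setvar", "class"], oracle))
# → wff
```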

Benoît

Mario Carneiro

Jun 4, 2022, 10:19:13 PM
to metamath
On Sat, Jun 4, 2022 at 9:36 AM Benoit <benoit...@gmail.com> wrote:
> Mario: I had been wondering about something for a while: in Models for Metamath, you define "weakly grammatical" as a property and "grammatical" as an extra structure (the function Syn is part of the data).  One could define a system \Gamma to be *uniquely grammatical* if there exists a unique Syn: TC \to VT such that (\Gamma, Syn) is grammatical.  It is not hard to see that set.mm-like systems are uniquely grammatical.  In general, I haven't found non-trivial necessary or sufficient conditions to be uniquely grammatical.  Have you thought about these questions?

Well, it's easy enough to determine which typecodes are syntax typecodes since they are the typecodes of variables, but I don't think it is very simple to figure out the mapping '|-' -> 'wff' for the logical typecodes. Especially if there are no axioms (yet) the problem is underdetermined.
 
> It looks like if you know beforehand that a system is uniquely grammatical, these $j commands are not necessary: upon reading the first assertion with a non-VT typecode, say $a T expr $.,  the system tries to prove $p S expr $= ? $. for all variable types S, and it will succeed for a unique one, which is then Syn(T).  (This assumes that a typecode which first appears in an assertion does not appear later in an $f-statement.)

More to the point, it is my opinion that these kinds of things "feel" like they should be declarations. This prevents syntax errors like using ')' as a typecode, because you know what the legal typecodes are, and it allows you to parse expressions without guessing the typecodes.

In fact, the example where we truncate a grammatical database before the first axiom is actually true in general: It could be that ')' is a typecode of set.mm, and we simply have not introduced the first axiom using it. That is, the definition of "grammatical database" includes not just the data "Syn" but also "TC". (In the paper TC is inferred from the existence of axioms, but in the $j syntax approach TC is declared in advance of use, and I think this is the right approach as mentioned in the previous paragraph. In MM0 these are declared using the "sort" command.)

Even if we take the paper definition of TC, there is another example of a grammatical database that is not uniquely grammatical:

$c A B |- * $. $v a b $.
va $f A a $.
vb $f B b $.
as $a A * $.
bs $a B * $.
ax $a |- * $.

Here we have two variable typecodes A and B, and one logical typecode |-, and the question is: is Syn(|-) = A or Syn(|-) = B? It parses either way. This is actually an unambiguous formal system, because we require only that there are no two ways to derive the proof of the *same* syntax theorem, and "A *" and "B *" are different theorems.

Mario

Benoit

Jun 5, 2022, 7:16:08 AM
to Metamath
Yes, it's that kind of example, where you basically duplicate a VT, that discouraged me from looking further at necessary and sufficient conditions for being uniquely grammatical.  It seems that such a condition would be basically a reformulation, saying that there are no two indistinguishable VTs (plus the equivalent of your variable-free "ax" to tie them up).  So as you seem to confirm, not much interest in looking further.

I guess there is a divergence between the abstract study of systems and the language design considerations (e.g., MM, MM0), for at least two reasons:
* some declarations, even though redundant, may serve as "error detecting codes", like $v and $c statements, or some $j statements;
* when studying the systems, it is generally convenient to ignore the order of statements, whereas, if you want to minimize the number of required passes, you should take it into consideration in language design;
plus probably performance considerations, since e.g. determining unique grammaticality and the correct Syn may take time.

Benoît

Mario Carneiro

Jun 5, 2022, 7:45:40 AM
to metamath
The VTs don't have to be indistinguishable. They can actually be completely different, as long as they happen to have at least one common parsable string. Even set.mm has this situation: "class x" and "setvar x" are both grammatical. So for example if we added a new typecode like so:

$c INHABITED $.
ax-inh.1 $e |- x e. y $.
ax-inh.2 $a INHABITED y $.

then it would be ambiguous whether INHABITED is supposed to take a setvar or a class.


Benoit

Jun 5, 2022, 8:19:25 AM
to Metamath
I hadn't thought of this, that's interesting.  This is possible because of
  cv $a class x $.
which permits a kind of overloading of any predicate on classes.
This allows setvar and class to share "common parsable strings", as you write.  Forbidding that may be "aesthetically desirable" but it would probably be too cumbersome for practical developments.

Mario Carneiro

Jun 5, 2022, 8:35:01 AM
to metamath
On Sun, Jun 5, 2022 at 8:19 AM Benoit <benoit...@gmail.com> wrote:
> Forbidding that may be "aesthetically desirable" but it would probably be too cumbersome for practical developments.

I'm not sure I would call it desirable at all. It is very common for a string to simultaneously parse at multiple nonterminals of a CFG: you can't really do precedences without it. The whole idea is that the nonterminal to parse at is part of the input; requiring that every string parse at no more than one nonterminal means you can't factor the grammar or apply many other language-preserving transformations, because the identity of the nonterminals matters. In MM0 the "coercion" notation command is explicitly intended to declare A -> B no-syntax functions analogous to "cv".
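The point about multiple nonterminals can be seen with a tiny recursive-descent parser for an assumed two-level grammar (expr -> expr '+' term | term; term -> 'a' | 'b'), where term coerces into expr much like cv coerces setvar into class; the grammar and helper names below are purely illustrative:

```python
def parse_term(toks, i):
    """term -> 'a' | 'b'; returns the next token index, or None."""
    if i < len(toks) and toks[i] in ("a", "b"):
        return i + 1
    return None

def parse_expr(toks, i):
    """expr -> term ('+' term)*  (the left-recursion-free form of
    expr -> expr '+' term | term); returns the next index, or None."""
    j = parse_term(toks, i)
    if j is None:
        return None
    while j < len(toks) and toks[j] == "+":
        k = parse_term(toks, j + 1)
        if k is None:
            return None
        j = k
    return j

def parses_as(nonterminal, s):
    """Does the whole string parse at the given nonterminal?"""
    toks = s.split()
    return nonterminal(toks, 0) == len(toks)

print(parses_as(parse_term, "a"))      # → True
print(parses_as(parse_expr, "a"))      # → True (same string, two nonterminals)
print(parses_as(parse_expr, "a + b"))  # → True
print(parses_as(parse_term, "a + b"))  # → False
```

The string "a" is accepted at both nonterminals; which one is intended has to come from the context, i.e. it is part of the input, just as the typecode is in Metamath.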

 

Benoit

Jun 8, 2022, 4:39:16 PM
to Metamath
Coming back to your example from set.mm where you introduce
  ${ $e |- x e. y $.  $a INHAB y $. $}
one could argue that this is bad practice and that one should have introduced instead
  ${ $e |- x e. A $.  $a INHAB A $. $}
which is the "correct" level of generality.  With that new axiom, the system remains uniquely grammatical.  What I'm trying to do is find a criterion that would make more precise the fact that "reasonable",  or "currently encountered" databases are uniquely grammatical.  Or one could require that the types under coercions form a set of rooted trees, so that if there are no "partially duplicate" types (as in the example above with $a A * $. etc), then, even if the system is grammatical but not uniquely grammatical, there is a "most stringent" Syn function making it grammatical (in the example above with INHAB, this means taking Syn(INHAB)=setvar).

I have not practiced CFGs much, so I'll have to think a bit more about your latest remark.

Finally: I'll be attending on Zoom next Monday (what a committee!).

Benoît

Mario Carneiro

Jun 8, 2022, 7:48:03 PM
to metamath
On Wed, Jun 8, 2022 at 4:39 PM Benoit <benoit...@gmail.com> wrote:
> Coming back to your example from set.mm where you introduce
>   ${ $e |- x e. y $.  $a INHAB y $. $}
> one could argue that this is bad practice and that one should have introduced instead
>   ${ $e |- x e. A $.  $a INHAB A $. $}
> which is the "correct" level of generality.  With that new axiom, the system remains uniquely grammatical.

Sure, but it's also semantically different. Maybe it was actually supposed to be a predicate that only applies to set variables. Using set.mm for this example is somewhat limiting since there just aren't that many syntactic categories (and introducing new ones is discouraged anyway) - obviously this "INHAB" typecode would be rejected in review.

> What I'm trying to do is find a criterion that would make more precise the fact that "reasonable",  or "currently encountered" databases are uniquely grammatical.

I suspect that in practice the most common reason why currently encountered databases are uniquely grammatical is that the very first theorem is often (the analogue of) idi, which has the form "|- ph => |- ph" where "ph" is a wff metavariable, so it essentially specifies the Syn(|-) = wff mapping right there. Some other good examples are hol.mm (which introduces the necessary axioms "for mmj2 compatibility"):

  $c wff $.  $( Not used; for mmj2 compatibility $)

  $( Internal axiom for mmj2 use. $)
  wffMMJ2 $a wff A |= B $.

  $( Internal axiom for mmj2 use. $)
  wffMMJ2t $a wff A : al $.

as well as dtt.mm (which uses multiple typecodes for precedence parsing). But I just can't shake the feeling that these are uniquely grammatical "by coincidence": clearly hol.mm has those axioms because mmj2 was acting up in some way without the ability to do grammar parsing, so that might be one explanation for why you find this property in the wild. Note that in hol.mm the idi theorem(s) are spelled

  ${ $e |- R |= A $. $p |- R |= A $. $}
  ${ $e |- A : al $.  $p |- A : al $. $}

which is to say, they don't ever use the wff axioms (there isn't even a variable of type wff, which means that the paper definition would not consider this a variable typecode and that would disqualify it as a grammatical database). Looking at this definition, there is really nothing at all that connects the wff typecode to anything that is going on in the rest of the file. You could introduce a wff2 typecode which is similarly unused and then it would obviously not be a uniquely grammatical database even ignoring the wff e. VT situation.

> Or one could require that the types under coercions form a set of rooted trees, so that if there are no "partially duplicate" types (as in the example above with $a A * $. etc), then, even if the system is grammatical but not uniquely grammatical, there is a "most stringent" Syn function making it grammatical (in the example above with INHAB, this means taking Syn(INHAB)=setvar).

The drawback of choosing the "most stringent" Syn function is that it is not monotonic under adding axioms. With the "INHAB x" example, we might first decide Syn(INHAB)=setvar and then the second axiom is about "INHAB {x}" and now we think that Syn(INHAB)=class. This is not good if we are trying to build a model incrementally; we would instead want to say Syn(INHAB)>=setvar and then Syn(INHAB)>=class to reflect our increasingly precise information, but now this becomes difficult to model since we don't know what type Syn(INHAB) is but we still have to give it some semantics (which is consistent with any coercions that may be introduced in the future).

Mario