New marker: "this database is unambiguous"


Mario Carneiro

Mar 27, 2016, 6:08:06 PM
to metamath
Hi all,

After the "Models for Metamath" paper (http://us2.metamath.org:88/ocat/model/model.pdf), it has become clear to me that the information provided by a standard Metamath file is insufficient to derive all the properties we want from grammatical parsing. To that end, I would like to introduce a new annotation, to be placed in set.mm and other unambiguous grammars:

$( $j
  syntax "|-" as "wff";
  unambiguous;
$)

This is an example of a $j comment, which has not previously been used but will follow roughly the same layout as the $t comment. (In the future these may be merged.) It is intended to denote "extra" commands, which are optionally understood by verifiers and can be safely ignored. In particular, unlike the $t comment, a $j statement headed by an unknown keyword should be ignored, and is not a parse error or warning.

There are two statements mentioned here, which I will explain via a third which is not as useful on its own.

* grammar;

This indicates that the containing .mm file describes a grammatical formal system (see paper), which enables stricter checks on the expressions found in the file. Every expression must start either with a variable typecode (that is, one used in some $f statement) or with a typecode explicitly declared by "syntax".

* syntax "|-" as "wff";

This is the analogue of Syn(|-) = wff in the paper. It indicates that |- is a valid typecode, and any expression in the file that starts with |- should be parsed as a wff (and should have a valid parse). This command automatically enables "grammar;" because it is not useful otherwise.

* unambiguous;

Knowing that a system is grammatical is not as useful as knowing that it is unambiguous, and many parsing methods make this assumption. This keyword also supersedes "grammar;" and indicates that the formal system is unambiguous (see paper). This is a bare assertion, which should probably be taken at face value by the verifier in order to enable any optimizations that come with this fact. Of course, if the verifier can falsify this later, it should give an error.

New errors:
* Under "grammar;": some $e, $a, or $p statement uses a typecode that has not been declared by an $f statement or a "syntax" statement.
* Under "grammar;" or "unambiguous;": some $e, $a, or $p statement could not be parsed according to its syntax typecode.
* Under "unambiguous;": some $e, $a, or $p statement does not have a unique parse, or the grammar was analyzed and found to be ambiguous.

These $j errors are actual errors (at the same severity level as verification errors), but detecting them is optional for a verifier (by the very nature of appearing in a comment that can be optionally ignored).
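To make the "ignore what you don't understand" behavior concrete, here is a rough Python sketch (not part of the proposal; the helper name and the set of known keywords are invented for illustration) of how a tool might extract commands from $j comments:

import re

KNOWN_KEYWORDS = {"syntax", "grammar", "unambiguous"}

def extract_j_commands(mm_text):
    # Yield the semicolon-terminated commands found in $( $j ... $) comments.
    for comment in re.findall(r'\$\(\s*\$j(.*?)\$\)', mm_text, re.DOTALL):
        for command in comment.split(';'):
            tokens = command.split()
            if not tokens:
                continue
            if tokens[0] in KNOWN_KEYWORDS:
                yield tokens       # understood: hand off for further checking
            # unknown keyword: silently skipped, not an error or warning

Run on the $j comment above, this yields ['syntax', '"|-"', 'as', '"wff"'] and ['unambiguous'], and any command headed by an unknown keyword is simply dropped, as required.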

I would like to add this particular comment to set.mm, just after the $c |- $. line (or maybe after the $f statements in that section). For now, no one will be able to read it, but I would like to bank on the existence of a statement like this in future verifiers. In particular, it needs to come right at the start, not in the giant $t comment, because it will be used when parsing later $e $a $p statements.

Mario

Norman Megill

Mar 27, 2016, 9:56:26 PM
to Metamath
I will go along with Mario's proposal, particularly since it has no impact on existing verifiers.

For the longer term, I have been wondering what might be the best future direction for the Metamath language in light of Mario's paper.  What if we require, as part of the spec, that systems it encodes have the property of being unambiguous (as defined in the paper)?  I don't really care that it be able to handle miu.mm or other bizarre systems that have no real-life applications and may be best handled by modeling within ZFC.

Is it possible to enhance or rewrite the spec in a simple way to accomplish this, using language similar to that of the existing spec that a novice programmer writing a Metamath verifier, or an intelligent layperson reading the spec, could understand?

If the only way to do this involves a significant spec complication, maybe it's best to leave the language proper alone, since its simplicity (reflected in the simplicity of the proof verifiers written so far) is an attractive feature for a lot of people.

But we can still plant a seed to start thinking about the long-term direction of the language, or a next generation replacement with something of similar simplicity, where the property of being grammatical/unambiguous is inherent (if we think that is important).

There is no hurry for this and it need not even happen on my watch. No matter what form it ultimately takes (and hopefully simplicity and transparency will continue to be emphasized), the language is merely a carrier for the proofs we create.  Those provide the real value of Metamath.  The nice thing about them is that they are eternal and will live on (in a possibly translated form) through any evolution of the language itself.

Norm

Mario Carneiro

Mar 28, 2016, 12:26:45 AM
to metamath
On Sun, Mar 27, 2016 at 9:56 PM, Norman Megill <n...@alum.mit.edu> wrote:
I will go along with Mario's proposal, particularly since it has no impact on existing verifiers.

For the longer term, I have been wondering what might be the best future direction for the Metamath language in light of Mario's paper.  What if we require, as part of the spec, that systems it encodes have the property of being unambiguous (as defined in the paper)?  I don't really care that it be able to handle miu.mm or other bizarre systems that have no real-life applications and may be best handled by modeling within ZFC.

Is it possible to enhance or rewrite the spec in a simple way to accomplish this, using language similar to that of the existing spec that a novice programmer writing a Metamath verifier, or an intelligent layperson reading the spec, could understand?

The easy part would be adding a Metamath keyword corresponding to "syntax ...;" and making "grammar;" required (and implicit in any database). However, "unambiguous;" is not really appropriate as a base keyword, because it can't be verified. (Testing unambiguity of a grammar is undecidable, so it would be impossible to verify the correctness of a database.) Here it is only being used as an indication of intent, to make it easier for verifiers to optimize.
 
If the only way to do this involves a significant spec complication, maybe it's best to leave the language proper alone, since its simplicity (reflected in the simplicity of the proof verifiers written so far) is an attractive feature for a lot of people.

The general idea of declaring typecodes is something that is simple enough to have made it into the original spec, and it certainly doesn't add much complexity. In this proposal, I'm letting variable typecodes be implicitly declared by their usage in $f statements, but you could also explicitly declare them if preferred. (In keeping with the restrictions on the Syn function in the paper, a "syntax;" command or its equivalent keyword should either not be allowed for variable typecodes, or should be required to map to itself on a variable typecode. It should also always map to a variable typecode, which may not be checkable at the statement itself if variable typecodes are not predeclared.)
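To make those restrictions concrete, here is a purely illustrative Python check (the names are invented and this is not part of the proposal) over a finished typecode table:

def check_syn(syn, var_typecodes):
    # syn maps each typecode to its syntax typecode, e.g. for set.mm:
    # {"set": "set", "class": "class", "wff": "wff", "|-": "wff"}
    for c, t in syn.items():
        if c in var_typecodes and t != c:
            raise ValueError("Syn(%s) must be %s for a variable typecode" % (c, c))
        if t not in var_typecodes:
            raise ValueError("Syn(%s) = %s is not a variable typecode" % (c, t))

check_syn({"set": "set", "class": "class", "wff": "wff", "|-": "wff"},
          {"set", "class", "wff"})   # set.mm passes

As noted, the second check can only be performed once all variable typecodes are known, which is one argument for predeclaring them.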
 
But we can still plant a seed to start thinking about the long-term direction of the language, or a next generation replacement with something of similar simplicity, where the property of being grammatical/unambiguous is inherent (if we think that is important).

Another direction you could take if the language were redone is to require unambiguity by using trees in the basic representation; the whole trick of using unambiguous grammars is basically a hack that allows the Metamath string rewriting system to be used as if it were a tree substitution system. Most other formal systems (and programming languages) use syntax trees for internal representation, and make parsing decidable by building a precedence grammar parser into the verifier.

Actually, there are several stronger variants of "unambiguous;" that could be explicitly checked, such as declaring the specific parser class - for example "unambiguous LR(0);" might be a hint to the verifier that an LR(0) parser should be sufficient to parse the grammar (which can easily be verified). But most verifiers would only support a few classes, whatever they happened to implement, so supporting them all would be a big commitment to ask of verifiers.

The big advantage of using string substitution is that it is really simple to implement, and I think we are agreed that this is something we want to keep. Many verifiers don't even bother with parsing, since it is only necessary for proof authoring. It is conceivable to use a more complicated language with more automatic trickery (precedence rules, left/right associativity, tokenizing words and non-words based on the notation rather than just whitespace), but this places a greater burden on verifiers and so is likely to encourage having only one "canonical" verifier. Other languages take this approach, but it weakens assurances that the kernel is correct, and I much prefer the decentralized approach, with a Metamath verifier being no more complex than an average undergrad comp sci project.
 
Mario

Mario Carneiro

Mar 28, 2016, 2:06:05 AM
to metamath
Here is a specific suggestion for adding to the spec:

Add a typecode statement, $y. For set.mm, it would look like this:

$y set class wff $.
$y |- $= wff $.

There are two forms, $y c1 c2 ... cn $. and $y c1 $= c2 $. The first form $y c1 c2 ... cn $. is shorthand for $y c1 $. $y c2 $. ... $y cn $. (similar to $c statements), and $y c1 $. is shorthand for $y c1 $= c1 $.

The statement $y c1 $= c2 $. declares a typecode c1. If c1 = c2 then this is a variable typecode, otherwise it is a non-variable typecode (and c2 must be a previously declared variable typecode). Both c1 and c2 must be previously declared constants. [Another possibility is to have this supersede $c statements, that is expand "$y c1 $= c2 $." to "$c c1 $. $y c1 $= c2 $." so that you don't have to declare it twice.]

Any $e $a $p statement must start with a declared typecode, and any $f statement must start with a declared variable typecode. [Since I don't want to require verifiers to do parsing, the information in a $y statement can otherwise be ignored.]
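For concreteness, here is a rough Python sketch of the checks I have in mind (the function names are invented, and this is not meant to be normative):

def declare_typecode(typecodes, c1, c2):
    # Process "$y c1 $= c2 $."; "$y c1 $." is declare_typecode(typecodes, c1, c1).
    if c1 != c2 and typecodes.get(c2) != c2:
        raise ValueError(c2 + " must be a previously declared variable typecode")
    typecodes[c1] = c2   # c1 is a variable typecode iff typecodes[c1] == c1

def check_statement(kind, tokens, typecodes):
    # kind is one of '$f', '$e', '$a', '$p'; tokens is the math expression.
    head = tokens[0]
    if kind == '$f' and typecodes.get(head) != head:
        raise ValueError("$f must start with a declared variable typecode")
    if kind in ('$e', '$a', '$p') and head not in typecodes:
        raise ValueError(kind + " must start with a declared typecode")
    # the remaining tokens are ignored: no parsing is required

For set.mm the resulting table would be {'set': 'set', 'class': 'class', 'wff': 'wff', '|-': 'wff'}.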

Mario Carneiro

Mar 28, 2016, 2:20:40 AM
to metamath
While we're on the subject, it might be a good idea to completely separate typecode constants from other constants. That is, a typecode constant is not allowed to appear in any position except the first in a statement. Defining "set" as a class, as in "class set"; "|- set = { x | x = x }" is something that is technically valid today, but probably should not be (and I know that mmj2 actually banks on this disjointness for one of its algorithms, which uses shorthand like "class class e. class" internally to represent "class A e. B" after erasing variable names; this would break if there were actually a class constant called "class"). This would encourage the idea of having $y c1 $= c2 $. statements accept an *undefined* symbol for c1 (not previously declared via $c or $v), so that people don't get the idea that a typecode constant can be used where regular constants normally appear.

Mario

fl

Mar 28, 2016, 8:15:05 AM
to Metamath
 
After the "Models for Metamath" paper (http://us2.metamath.org:88/ocat/model/model.pdf), it has become clear to me that the information provided by a standard Metamath file is insufficient to derive all the properties we want from grammatical parsing. 


I'm not sure I understand the whole problem, but I have always been a bit annoyed by everything related to parsing in set.mm.

The first time the issue came to the table, it was because O'Cat wanted to parse set.mm for mmj2. He needed a start symbol and had other concerns like that. set.mm was modified, and I think that was not a good thing, because it was clearer before. It added considerations that are without significance for logic. I think we must stay as close as possible to the mathematical definition of a logic. That is the strength of Metamath, and all those considerations pollute the simplicity of the original design.

I also think that all those considerations can be solved outside set.mm, either by designing mmj2 and the other tools so that their particular needs are handled by the tools themselves, or informally, by checking ambiguity and the other problems at the human level.

We must stick to the mathematical definition of a logic and add nothing else. The other proof checkers are boring because they add so many additional features that nobody can understand them any longer.

-- 
FL 

Mario Carneiro

Mar 28, 2016, 8:53:54 AM
to metamath
On Mon, Mar 28, 2016 at 8:15 AM, 'fl' via Metamath <meta...@googlegroups.com> wrote:
 
After the "Models for Metamath" paper (http://us2.metamath.org:88/ocat/model/model.pdf), it has become clear to me that the information provided by a standard Metamath file is insufficient to derive all the properties we want from grammatical parsing. 


I'm not sure I understand the whole problem, but I have always been a bit annoyed by everything related to parsing in set.mm.

In a nutshell, we don't define our typecodes. This leads to an issue in which theorems cannot be checked for well-formedness because we don't record the association between the theorem typecode "|-" and the corresponding syntactical typecode "wff". A statement like "|- _V" is not well-formed because the expression "_V" is not a wff (even if it is a class expression).

This is not a problem for "one-shot" verifiers, which only attempt to verify an already fully formed MM file and don't bother with proof generation, useful error handling or proof assistant tools. If you want to support any of these activities, parsing becomes important, and without recording the above association, you can't even begin. mmj2 actually has "|-" => "wff" hardcoded into the program, which causes problems on other databases where there is no "wff" typecode or it is called something else. This is really a property of the database, and forcing the tool to guess is not appropriate.
 
The first time the issue came to the table, it was because O'Cat wanted to parse set.mm for mmj2. He needed a start symbol and had other concerns like that. set.mm was modified, and I think that was not a good thing, because it was clearer before. It added considerations that are without significance for logic. I think we must stay as close as possible to the mathematical definition of a logic. That is the strength of Metamath, and all those considerations pollute the simplicity of the original design.

Norm, could you expound on what exactly O'Cat wanted and you added to the spec when he was working on this?
 
I also think that all those considerations can be solved outside set.mm, either by designing mmj2 and the other tools so that their particular needs are handled by the tools themselves, or informally, by checking ambiguity and the other problems at the human level.

I agree with you that, whenever possible, the verifier should take responsibility for this rather than forcing the MM file to contain more data. But sometimes the data is just not there in the file, and cannot be inferred reliably. In these cases one has to take a good look at why this is the case, and figure out where, "morally", the data is supposed to reside. Here, the identity of the typecodes is something that should be declared in the file, even though it can almost completely be inferred from usage: the typecodes are inferable, but the Syn function is not. (Note that every MM file I have seen is a grammatical formal system, including miu.mm, so even the more ambitious / backwards-incompatible $y proposal is compatible with all public databases, and the Syn function can conceivably be viewed as an omission from the original design.)
 
We must stick to the mathematical definition of a logic and add nothing else. The other proof checkers are boring because they add so many additional features that nobody can understand them any longer.

Please, define logic. The answer to this very question, a suitable definition of logic compatible with Metamath, is what drove me to write the paper in the first place.

Frankly, Metamath's design does not reflect "traditional" logic well. Usually "logic" means "first order logic" or "higher order logic" or "lambda calculus" or some extension of these, and all of them take capture-avoiding substitution as a primitive concept. And so you can't blame "the other guys" for building their proof checkers to also have this built in, since a direct reading of any logic textbook would lead you to such a foundation. Instead, we chose a particular model based on string substitution, and we built our own logic on top.

We should strive to include the features we want from a logic, eliminating "pathological" cases as much as possible, provided that we do not make the verifier's job more complicated. In my view the $y statement does this; the checks it imposes are quite trivial to implement, and in particular, *do not require the verifier to parse anything* - this is an explicit design goal.

Mario

fl

Mar 28, 2016, 1:04:17 PM
to Metamath

Please, define logic. 

A collection C of constants, a collection V of variables, a collection F of well-formed formulas inductively defined from C and V, a collection A of axioms that is a subset of F, and a collection of inference rules whose elements are also built from F. Then a definition of a proof: a list of well-formed formulas, each either belonging to A or derived from earlier formulas using the inference rules.

Distinct variables and free variables are useful devices but are not mandatory, as Norm explains somewhere.

At some point we can introduce metavariables for added convenience.

-- 
FL

Norman Megill

Mar 28, 2016, 7:19:27 PM
to meta...@googlegroups.com
On 3/28/16 8:53 AM, Mario Carneiro wrote:
>
>
> On Mon, Mar 28, 2016 at 8:15 AM, 'fl' via Metamath
> <meta...@googlegroups.com <mailto:meta...@googlegroups.com>> wrote:
> The first time the issue came to the table, it was because O'Cat
> wanted to parse set.mm <http://set.mm> for mmj2.
> He needed a start symbol and hd other concerns like that. set.mm
> <http://set.mm> was modified
> and I think it was not a good thing. Because it was clearer before.
> It added considerations that are without signficance
> for logic. I think we must be as close as possible to the
> mathematical definition of a logic. This is the strenght of metamath
> and all those considerations pollute the simplicity of the original
> design.
>
>
> Norm, could you expound on what exactly O'Cat wanted and you added to
> the spec when he was working on this?

My memory may be faulty, but I roughly recall two things.

First was the problem of overloaded syntax wel/wcel, weq/wceq, and
wsb/wsbc, which created a problem (ambiguous parsing) for mmj2. This
was solved by making wel, weq, and wsb syntax "theorems" derived from
their counterparts.

Second (and this may not have been for mmj2, not sure) was to require
that labels and math tokens not share the name space: if we have a math
token named "exp" we can not have a theorem named "exp". There were 2
or 3 requests for this, for the Ghilbert translator and for at least one
independent verifier.

One thing O'Cat had trouble with was the empty substitution required in
miu.mm, but I think he solved it somehow with no spec changes.

The change referenced in footnote 5 p. 94 of the Metamath book, "Each
variable in a $e, $a, or $p statement must exist in an active $f
statement," was requested by a verifier writer. We've always done that
in set.mm and other .mm's, but it wasn't guaranteed by the spec.

Norm

Norman Megill

Mar 28, 2016, 7:34:17 PM
to meta...@googlegroups.com
On 3/28/16 2:06 AM, Mario Carneiro wrote:
> Here is a specific suggestion for adding to the spec:
>
> Add a typecode statement, $y. For set.mm <http://set.mm>, it would look
> like this:
>
> $y set class wff $.
> $y |- $= wff $.
>
> There are two forms, $y c1 c2 ... cn $. and $y c1 $= c2 $. The first
> form $y c1 c2 ... cn $. is shorthand for $y c1 $. $y c2 $. ... $y cn $.
> (similar to $c statements), and $y c1 $. is shorthand for $y c1 $= c1 $.
>
> The statement $y c1 $= c2 $. declares a typecode c1. If c1 = c2 then
> this is a variable typecode, otherwise it is a non-variable typecode
> (and c2 must be a previously declared variable typecode). Both c1 and c2
> must be previously declared constants. [Another possibility is to have
> this supersede $c statements, that is expand "$y c1 $= c2 $." to "$c c1
> $. $y c1 $= c2 $." so that you don't have to declare it twice.]
>
> Any $e $a $p statement must start with a declared typecode, and any $f
> statement must start with a declared variable typecode. [Since I don't
> want to require verifiers to do parsing, the information in a $y
> statement can otherwise be ignored.]

We can consider this for a future Metamath 2.0 language (that may or may
not be different in other respects too). It probably wouldn't be a good
idea just to update the existing spec with this requirement since a new
keyword will cause all existing verifiers to break. Your earlier
proposal of a comment tagged with $j would probably be a better solution
for the near term.

Ideally what I had in mind was making non-grammatical .mm files
illegal and adding additional requirements to existing, unmodified .mm
files that ensure they are grammatical. (Would there ever be any use
for a non-grammatical .mm?) This would allow any existing verifier to
continue to work, with the understanding that they would do an
incomplete verification w.r.t. the updated spec. We could have a test
.mm file that would pass old verifiers but produce an error on new ones,
to serve as a way to determine that an old verifier is incomplete.
(metamath.exe might remain in the incomplete category for some time,
depending on the difficulty of updating it.)

The set of variable typecodes can be collected from $f statements. The
type for the Syn function would be identified by the unique typecode
that doesn't appear in a $f statement. What it maps to is still a
problem, so it appears we would have to add something to existing .mm
files. However, we can add the rather weird statement

syn $a |- wff $.

to set.mm without violating the existing spec, and existing verifiers
should all accept this without error. "syn" is an arbitrary (not
hard-coded) label. In the updated spec, the syn statement would be
identified by the unique $a statement having a typecode as the second
token, and otherwise typecodes would be allowed only as the first token
in any statement. I'm not sure what problems could be caused by someone
referencing this $a in a proof, but if it is a problem the new spec
could prohibit it.

This is somewhat of a hack, but it would meet the new requirement and
allow 100% backwards compatibility, unless I'm overlooking something.

Another way to achieve this without the syn statement is to require that
the typecode for the first $f be the Syn value. Then we probably
wouldn't have to touch any .mm files. Would this be a better solution?

-----

For a Metamath 2.0 that starts from scratch, one idea I've considered off
and on in the past is to turn the internal "language" of the Metamath
Solitaire applet into an actual language. It might have even been the
Metamath language if I had written the applet first.

All expressions would be Polish notation in the language itself and
mapped to human-friendly form for display. Polish notation is the big
obstacle I see for editing the source file directly and probably the
main reason I haven't pursued this approach. People have mentioned that
the S-expressions in Ghilbert make it hard to read, and Polish is worse.
(Actually, reverse Polish - RPN - might be computationally more
efficient since it requires only a forward scan, but that's even harder
to read.)

A way to address Polish readability and ease of editing is to do
everything via a special editor, in the same way that we now work with
proofs indirectly and rarely edit them by hand.

The internal language in mm.java is roughly as follows.

The syntax axioms list the name, return type, number of arguments, and
human display format.

// Implication
tmpConnective = new Connective("wi", "wff", 2, "( $1 -> $2 )");
tmpConnective.setArgtype(0, "wff");
tmpConnective.setArgtype(1, "wff");

// Universal quantifier
tmpConnective = new Connective("wal", "wff", 2, "A. $1 $2");
tmpConnective.setArgtype(0, "var");
tmpConnective.setArgtype(1, "wff");

// Convert variable to class (invisible notation)
tmpConnective = new Connective("cv", "class", 1, "$1");
tmpConnective.setArgtype(0, "var");
connectiveVec.add(tmpConnective);


The axioms are specified in Polish notation. For ax-ext, you can see
that wel and weq are really set.mm's wcel and wceq, with cv used to
convert set variables to classes. So there is no syntax overloading.

// ax-3 $a |- ( ( -. P -> -. Q ) -> ( Q -> P ) ) $.
tmpAxiom = new Axiom("ax-3", "wi wi wn $1 wn $2 wi $2 $1",
"Axiom of contraposition (propositional calculus)");

// maj $e |- ( P -> Q ) $.
// min $e |- P $.
// ax-mp $a |- Q $.
Axiom rule_mpAxiom = new Axiom("ax-mp", "$2",
"Inference rule of modus ponens (propositional calculus)");
rule_mpAxiom.addHyp("$1");
rule_mpAxiom.addHyp("wi $1 $2");

// ax-ext $a |- ( A. x ( x e. y <-> x e. z ) -> y = z ) $.
tmpAxiom = new Axiom("ax-ext",
"wi wal $1 wb wel cv $1 cv $2 wel cv $1 cv $3 weq cv $2 cv $3",
"Axiom of extensionality (set theory)");
tmpAxiom.addDistinct("$1 $2");
tmpAxiom.addDistinct("$1 $3");

Proofs are just an RPN list of axioms, like Metamath's "normal" proofs
but without syntax subproofs. Instead, the unification algorithm
figures out the (unique) most general substitutions. The propagation of
distinct variable requirements is computed as part of the unification
algorithm.
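As a rough illustration (this is generic textbook first-order
unification, not the actual code in mm.java), the core step looks
something like this in Python, with terms written as ("var", name) or
(connective, arg1, arg2, ...):

def walk(t, s):
    # Follow variable bindings already recorded in substitution s.
    while t[0] == "var" and t in s:
        t = s[t]
    return t

def unify(a, b, s=None):
    # Return a most general substitution making a and b equal, or None.
    # (Occurs check omitted for brevity.)
    s = dict(s or {})
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if a[0] == "var":
        s[a] = b
        return s
    if b[0] == "var":
        s[b] = a
        return s
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

For ax-mp, unifying the hypothesis pattern ("wi", ("var", "$1"),
("var", "$2")) against an already-proved step such as
("wi", ("wn", ("var", "P")), ("var", "Q")) determines $1 and $2, in
place of an explicit syntax subproof.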

Variable names for dummy variables are drawn from a pool of unused
variables. The final theorem is always shown in most general form by
the applet; in a proof language, the program would make sure that the
desired theorem is a substitution instance of this most general form,
and perhaps back substitute into proof steps for display purposes.

The second main reason I haven't pursued this approach is I fear that
proof verification with the unification algorithm may be much slower
than with the Metamath language. I don't really know, though. There
has also been work on efficient unification algorithms that might be
exploitable.

Norm

Marnix Klooster

Mar 29, 2016, 1:11:09 AM
to Metamath
Hi Norm, others,

I might be completely off base here, and I don't have the time to really study the proposals in this thread, but the goal here sounds the same as the one I had in the thread 'Automated Metamath ambiguity checking' (https://groups.google.com/d/msg/metamath/DrXCGlyj6_w/pHGsht_VBwAJ), which I started but also did not have the time to follow up on (sorry about that).

Basically, my proposal there was that a statement a b c ... is unambiguous if there exists a unique Metamath proof of IS_TOPLEVEL a b c ... .  And that proof _is_ then the parse tree of that statement.  (And a database is then unambiguous if all its statements are.)

That would require a couple of additional statements ( $c IS_TOPLEVEL $. , $a IS_TOPLEVEL wff ph $. , $a IS_TOPLEVEL class A $. , etc.), be completely compatible, and require a single identifier to be known to any _parser_ (like mmj2), but no verifier needs to know about it.

And if the parser is told about (or detects) IS_TOPLEVEL, then it can assume the database to be unambiguous, and complain if it encounters a statement for which there are multiple proofs for IS_TOPLEVEL a b c ... , or none.

I would much rather see that than the indeed weird $a |- wff $. , which looks like a contracted form of my ${ $f wff ph $. $a IS_TOPLEVEL |- ph $. $} . Better to be explicit, don't you agree?

Finally, I think that parsing and unambiguity defined in Metamath terms ('there exists a unique Metamath proof of...') is better than one based on external concepts like context-free parsers (as is done in the 'Models for Metamath' paper, if I see correctly).

Just my 2 eurocents...

Groetjes,
 <><
Marnix







--
Marnix Klooster
marnix....@gmail.com

Mario Carneiro

Mar 29, 2016, 4:44:52 PM
to metamath
On Mon, Mar 28, 2016 at 7:34 PM, Norman Megill <n...@alum.mit.edu> wrote:
On 3/28/16 2:06 AM, Mario Carneiro wrote:
Here is a specific suggestion for adding to the spec:

Add a typecode statement, $y. For set.mm <http://set.mm>, it would look
like this:

$y set class wff $.
$y |- $= wff $.

There are two forms, $y c1 c2 ... cn $. and $y c1 $= c2 $. The first
form $y c1 c2 ... cn $. is shorthand for $y c1 $. $y c2 $. ... $y cn $.
(similar to $c statements), and $y c1 $. is shorthand for $y c1 $= c1 $.

The statement $y c1 $= c2 $. declares a typecode c1. If c1 = c2 then
this is a variable typecode, otherwise it is a non-variable typecode
(and c2 must be a previously declared variable typecode). Both c1 and c2
must be previously declared constants. [Another possibility is to have
this supersede $c statements, that is expand "$y c1 $= c2 $." to "$c c1
$. $y c1 $= c2 $." so that you don't have to declare it twice.]

Any $e $a $p statement must start with a declared typecode, and any $f
statement must start with a declared variable typecode. [Since I don't
want to require verifiers to do parsing, the information in a $y
statement can otherwise be ignored.]

We can consider this for a future Metamath 2.0 language (that may or may not be different in other respects too).  It probably wouldn't be a good idea just to update the existing spec with this requirement since a new keyword will cause all existing verifiers to break.  Your earlier proposal of a comment tagged with $j would probably be a better solution for the near term.

I agree. This part was just a response to "how would this proposal look like as a base metamath statement" and is obviously not backwards-compatible like the original suggestion.

Ideally what I had in mind was was making non-grammatical .mm files illegal and adding additional requirements to existing, unmodified .mm files that ensure they are grammatical.  (Would there ever be any use for a non-grammatical .mm?)  This would allow any existing verifier to continue to work, with the understanding that they would do an incomplete verification w.r.t. the updated spec.  We could have a test .mm file that would pass old verifiers but produce an error on new ones, to serve as a way to determine that an old verifier is incomplete. (metamath.exe might remain in the incomplete category for some time, depending on the difficulty of updating it.)

The set of variable typecodes can be collected from $f statements.  The type for the Syn function would be identified by the unique typecode that doesn't appear in a $f statement.  What it maps to is still a problem, so it appears we would have to add something to existing .mm files.  However, we can add the rather weird statement

syn $a |- wff $.

to set.mm without violating the existing spec, and existing verifiers should all accept this without error.  "syn" is an arbitrary (not hard-coded) label.  In updated spec, the syn statement would be identified by the unique $a statement having a typecode as the second token, and otherwise typecodes would be allowed only as the first token in any statement.  I'm not sure what problems could be caused by someone referencing this $a in a proof, but if it is a problem the new spec could prohibit it.

This is somewhat of a hack, but it would meet the new requirement and allow 100% backwards compatibility, unless I'm overlooking something.

Well, that's something. My first reaction is to reject it as breaking people's understanding of what an $a statement means, but you won't be able to use it even if you wanted to. For example, even idi applied to syn wouldn't work, because that would require proving "wff wff", which isn't true.

This statement would cause problems if "|-" here were a variable typecode. For example, if we had $a wff set $. we could prove |- ( set <-> set ) and all sorts of other things. However, by assumption this trick should only be used for non-variable typecodes, so it should work, odd as it seems.

However, if your goal is to backwards-compatibly update the checks in the verifier, I think the $j statement has better semantics for this, and it scales to other future improvements as well, unlike this idea. (That is, the Metamath language is not extensible, but the $j "language" is, assuming the extensions add more keywords rather than changing existing ones. For the latter we may need to introduce versioning, but - you guessed it - we could always add a keyword like "version 2;" later if we need it.)

By the way, you may have noticed that I started with letting variable typecodes be implicit but switched to explicit for the $y proposal. The reason is that it makes it simpler to express the constraints on the mapping: $y c1 $= c2 $. where c1 =/= c2 *requires* c1 not to be a variable typecode and *requires* c2 to be a variable typecode. If the variables for a given typecode haven't been declared yet, it is harder to determine this, since we don't know whether something that looks like a constant or non-variable typecode now could later be a variable typecode. It also allows eliminating typecodes from the $c statement - currently they have to be there because an $f won't let you skip the $c definition for the typecode.

This proposal raises typecodes and variable typecodes to be first-class citizens, and the syntax should reflect that. I would amend my original proposal with "syntax "set";" lines to explicitly indicate variable typecodes for this reason.
 
Another way to achieve this without the syn statement is to require that the typecode for the first $f be the Syn value.  Then we probably wouldn't have to touch any .mm files.  Would this be a better solution?

Note that in general, the Syn function can have many values, one for each non-variable typecode, and these can point to different variable typecodes. hol.mm actually has two non-variable typecodes in principle, but due to issues with mmj2's assumption that there is only one, I simulated it inside the language.
 
For a Metamath 2.0 that starts from scratch, one idea I've considered of and on in the past is to turn the internal "language" of the Metamath Solitaire applet into an actual language.  It might have even been the Metamath language if I had written the applet first.

All expressions would be Polish notation in the language itself and mapped to human-friendly form for display.  Polish notation is the big obstacle I see for editing the source file directly and probably the main reason I haven't pursued this approach.  People have mentioned that the S-expressions in Ghilbert make it hard to read, and Polish is worse.  (Actually for efficiency, reverse Polish - RPN - might be computationally more efficient since it requires only a forward scan, but that's even harder to read.)

A way to address Polish readability and ease of editing is to do everything via a special editor, in the same way that we now work with proofs indirectly and rarely edit them by hand.

Yep, this is what the rest of the world does, although other people call these "abstract syntax trees" (ASTs) and don't put them in the file (the translation is between the input file and the internal representation, not the internal representation to the display). But this means that verifiers have to do parsing.

An MM file is in this weird middle ground between human-readable and computer-processable data. Our approach to fast and simple verification requires that an MM file be as simple as possible to process, so if we used RPN for the math representation, that should logically be in the file as well. But then we remember that humans also read and write the MM file, and so we are stuck.
 
A common approach used elsewhere is to separate these two. Keep a human-readable format, let's call that .mmi, and a computer-readable form, .mmo . Verifiers will be able to easily read and verify .mmo files, and in there you will find RPN math and any other computer optimizations (hell, it could even just be a binary file, and the compression format could be tweaked to take advantage of this). "Advanced" verifiers will translate between the two formats. The relationship here is similar to that between a "source" and "object" file in a compiler, except that there is still some more verification to be done on the object file (with no third output file format).

Source control software usually only tracks the source file, and lets users generate their own object file. I'm torn on whether to keep both in source control, since we want users to be able to skip the .mmi format if they want, or write their own for a third party verifier. But anyway, this is a completely different conversation...

On Tue, Mar 29, 2016 at 1:10 AM, Marnix Klooster <marnix....@gmail.com> wrote:
Finally, I think that parsing and unambiguity defined in Metamath terms ('there exists a unique Metamath proof of...') is better than one based on external concepts like context-free parsers (as is done in the 'Models for Metamath' paper, if I see correctly).

Sorry about that. This is an artifact of my intended readership, who are regular mathematicians - so I attempt to deconstruct everything Metamath and put it in informal math terms.

For a Metamath-ematician, of course this is less than ideal. Nevertheless, there are many other ways to express the constraint, although none that use exclusively Metamath-provable notions, because it is a meta-theoretic property. (I will remind you that "there exists a unique proof of x" is not expressible in Metamath directly.)

To start with, the easy definition: A statement x e. EX_C is said to be uniquely provable, if there is a unique axiom <D,H,A> that applies to yield x, with a unique substitution sigma to the variables in H u. {A}, and the substituted hypotheses sigma(H) are all uniquely provable (this is a recursive definition).

This definition leaves much to be desired, if we want to capture the notion of "unambiguous formal system". First, the presence of theorems will usually break "unique provability", which is why the above notion only works with direct-from-axioms proofs (you could extrapolate a direct-from-axioms proof given a standard proof). Second, theorems almost always have non-unique proofs: for example idi can be proven by applying ax-mp to id, which allows you to extend any proof of "|- ph" for some ph by six axioms.

This is why "unambiguous formal system" is defined in terms of "grammatical formal system", because we need to have a notion of parsing a theorem, not by its proof, but by the syntax proof of its expression. However, you don't need the Syn function for the definition: A grammatical formal system is unambiguous iff every provable x e. EX_C with Type(x) e. VT (an expression starting with a variable typecode) is uniquely provable.

In other words, syntax proofs are unique, so for any theorem T, Syn(T) will be "wff ..." which is a syntax expression, hence uniquely provable. (The definition of grammatical formal system, on the other hand, ensures that this expression is provable in the first place.)

Basically, my proposal there was that a statement a b c ... is unambiguous if there exists a unique Metamath proof of IS_TOPLEVEL a b c ... .  And that proof _is_ then the parse tree of that statement.  (And a database is then unambiguous if all its statements are.)

With respect to the current proposal, I wouldn't like this approach because it requires you to reinterpret a typecode constant (first position) as a constant (second position), which I am trying to avoid. You could get around this by making "copies" of all the typecodes for usage in IS_TOPLEVEL: that is, map "a b c..." to "IS_TOPLEVEL <a> b c ..." and have "<wff>" be a new constant instead of reusing "wff".

You would also want to remove any syntax theorems like weq, because they would break the "unique proof of" property, as I mentioned above, unless you are careful about direct-from-axioms proofs.

But other than that, yes I think this is isomorphic to the approach with Syn, with one exception. Suppose that for each typecode c, there is a unique axiom $a IS_TOPLEVEL <c> v $. where v is a variable with type "t" (that is, defined with $f t v $.). Then we can define Syn(c) = t, and since syntax proofs act like CFGs, the two definitions will behave the same in this case.

The problem is what happens when the restrictions in the paper are not satisfied. You are guaranteed Syn : TC -> VT in any case with IS_TOPLEVEL (since the input is a variable, whose type must be a variable typecode, and the output is a <bracketed> typecode), but what if Syn is not a retraction (that is, Syn(c) =/= c for some c e. VT)? Then the parser will try to parse sets like classes, wffs like sets, and so on. Who knows what could happen, but it won't be checking what you want, that's for sure. So I think we are safe to assume this.

The other possibility is that there are two statements $a IS_TOPLEVEL <c> v1 $. $a IS_TOPLEVEL <c> v2 $. where v1 and v2 have types t1, t2 respectively. If t1 = t2, then we immediately have a breakdown of unambiguity since the axioms are identical. If not, then it gets more interesting. Syn cannot be a function in this case, since we have Syn(c) = t1 and Syn(c) = t2 at the same time, but it might nevertheless be the case that there are unique proofs of "IS_TOPLEVEL<c> a b..." because for whatever reason, the expressions that can be validly parsed as t1 and as t2 are disjoint.

For example, in hol.mm there are two types of provable assertion, a typing assertion "|- A : al" and a deduction "|- R |= S". In fact, the type "wff" in hol.mm is basically the same as IS_TOPLEVEL, since it is only used to form parse trees for these two at the top level (neither ":" nor "|=" can appear in a term, not counting the ":" inside kl). There are no wff variables, so "wff" is actually a non-variable typecode.

With the $y proposal, we would have wff as a variable typecode, because the important distinction is not the presence of variables but its context-free behavior. It would be declared as $y var type term wff $. $y |- $= wff $. .

It's actually striking how similar smm's statements (https://github.com/sorear/smm-webui/blob/master/public/assets/set-config.mm) are to the current proposal:

$( $t
   -smm-syntactic-categories 'set' 'class' 'wff';
   -smm-logical-categories '|-' 'wff';
$)

vs.

$( $j
   syntax 'set' 'class' 'wff';
   syntax '|-' as 'wff';
$)


Mario

Jens-Wolfhard Schicke-Uffmann

Mar 29, 2016, 8:26:29 PM
to meta...@googlegroups.com
On Mon, Mar 28, 2016 at 08:53:53AM -0400, Mario Carneiro wrote:
> This is not a problem for "one-shot" verifiers, which only attempt to verify an
> already fully formed MM file and don't bother with proof generation, useful
> error handling or proof assistant tools. If you want to support any of these
> activities, parsing becomes important, and without recording the above
> association, you can't even begin. mmj2 actually has "|-" => "wff" hardcoded
> into the program, which causes problems on other databases where there is no
> "wff" typecode or it is called something else. This is really a property of the
> database, and forcing the tool to guess is not appropriate.
I highly value the property of Metamath that the proof kernel does not
understand the target logic language at all (and is much simpler
for it). When I looked around for theorem checkers for
my current hobby project, I knew my search was over when I read
$c ( $.

Hence I prefer the $( $j ... $) solution very much, because it makes
it crystal clear that this is like a tangential remark, maybe
interesting for some tools (e.g. mmj2 and igor), but obviously
irrelevant for a formal understanding of the mathematical proofs
contained in set.mm.

For igor, I actually don't care much about whether $j exists at all,
it is (becoming) a set.mm assistant, not a general Metamath assistant,
so it is full of semi-hardcoded theorem names and such stuff, anyway.

> I agree with you that whenever possible the verifier should take responsibility
> for this as much as possible, rather than forcing the MM file to contain more
> data. But sometimes the data is just not there in the file, and cannot be
> inferred reliably. In these cases one has to take a good look at why this is
> the case, and figure out where, "morally", the data is supposed to reside. In
> this case, the identity of the typecodes is something that should be declared
> in the file, even though it can almost completely be inferred from usage.
Why do you want all typecodes declared? Most can (and should) be inferred from
the existence of $f declarations. Why isn't it sufficient to declare the
relationship between |- and wff, and maybe add some hints about which parser
should be used?

On the latter topic, I propose putting both "unambiguous" and "parse-lalr"
in set.mm (inside some $j), the former making a promise about the
database, the latter giving a "proof" (checkable by a lalr parser
generator of your choice) or being ignored by a tool which does not know
about an "lalr" parsing algorithm. If all keywords of the second kind are
prefixed with "parse-", we can be rather liberal in naming our newest
algorithms (like the lalr++ grammar class you discussed lately).

> We should strive to include the features we want from a logic, eliminating
> "pathological" cases as much as possible, provided that we do not make the
> verifier's job more complicated. In my view the $y statement does this; the
> checks it imposes are quite trivial to implement, and in particular, *do not
> require the verifier to parse anything* - this is an explicit design goal.
I think if something does not concern a verifier, it belongs between $( and $).
A Metamath theorem can be 100% verified without $y. Hence it should not
be a full statement (which _at least_ increases parser complexity for
the Metamath language).


Regards,
Jens

Mario Carneiro

Mar 29, 2016, 9:47:00 PM
to metamath
On Tue, Mar 29, 2016 at 8:26 PM, Jens-Wolfhard Schicke-Uffmann <drah...@gmx.de> wrote:
Why do you want all typecodes declared? Most can (and should) be inferred from
the existence of $f declarations. Why isn't it sufficient to declare the
relationship between |- and wff, and maybe put some hints to what parser
should be used.

Because the checking of the $y statement (or "syntax" statement) *itself* requires information about which typecodes are variable typecodes. Furthermore, explicit declaration of variable typecodes allows for variable typecodes with no variables, which is a bit peculiar given the name; but really there is no contradiction, because no property of variable typecodes requires that there be a variable of that type. Instead, we need that all variables have a type which is a variable typecode, and we have a property (unique parses) that must be true of all variable typecodes. I mentioned that hol.mm actually separates these two: the types "var type term" have variables, the types "var type term wff" have unique parses, and the whole list of typecodes is "var type term wff |-".

In hindsight, I should have called them "syntax typecodes".
 
On the latter topic, I propose to have "unambiguous" and "parse-lalr" both
put in set.mm (inside some $j), the former making a promise about the
database, the latter giving a "proof" (checkable by a lalr-parser
generator of your choice), or ignored by a tool which does not know
about a "lalr"-parsing algorithm. If all keywords of the second kind are
prefixed with "parse-", we can be rather liberal with naming our newest
algorithms (like your lalr++ grammar class, you discussed lately).

Actually, I would use "unambiguous 'lalr n';" for that. The possible grammar classes and the relations between them would need to be built into any verifiers that care to prove it, but this notation would indicate "this grammar is unambiguous, because it can be parsed by an LALR(n) parser". For those that don't know what an LALR(n) parser is, it would just be treated as an "unambiguous;" notation, while those that do would construct one and use the fact that the construction succeeded as a proof of unambiguity. Systems like mmj2, with several built-in parsers, would select the appropriate parser (or none at all) depending on this field. The relations would need to be known by parsers so that a parser selector would know that an LR(1) parser is valid for "unambiguous 'lalr 1';".
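To illustrate (again just a sketch; the subsumption table and the names are invented), a parser selector might do something like:

SUBSUMES = {                 # a grammar in the key class can be parsed by any value class
    'lalr 1': ['lalr 1', 'lr 1'],
    'lr 0':   ['lr 0', 'lalr 1', 'lr 1'],
}

def choose_parser(hint, implemented):
    # Return the first implemented class able to handle the hinted class, else None.
    for cls in SUBSUMES.get(hint, [hint]):
        if cls in implemented:
            return cls
    return None              # unknown or unsupported class: just trust "unambiguous;"

choose_parser('lalr 1', {'lr 1'})   # 'lr 1'
choose_parser('klr', {'lr 1'})      # None: fall back to the bare assertion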

FYI, I have taken to calling my grammar class "KLR" for "LR(0) with Komposite rules" (CLR sounds like canonical LR). (It is not clear to me that KLR(n) exists.) We would want "unambiguous 'klr';" in set.mm, though, since you showed that set.mm is not LALR(n). (I don't know any more general parse class that is tight enough to prove unambiguity.)

I think if something does not concern a verifier, it belongs between $( and $).
A Metamath theorem can be 100% verified, without $y. Hence it should not
be a full statement (which _at least_ increases parser complexity for
the Metamath language).

The problem is: what does it mean for a Metamath database to be verified? The $y statement introduces more ways for a database to be ill-formed, so a database that 100% verifies without $y might not with it. For example, the following currently valid database is invalid no matter what $y statements are used:

$c x $.
ax $a x x $.

(This violates the separation of typecodes and constants.) Or this one:

$c x y $.
ax $a x y $.

(There is no valid parse for "T y" where T is some syntax typecode.)

The latter problem can be detected even by a no-parsing verifier, because there are no syntax typecodes. But what do we call the errors in a Metamath file that are detected by a parser but not by a verifier? They are still an indication of ill-formedness of the database; they just take more work to find.

Perhaps a more appropriate Metamathy solution is to require parses for all $e and $a statements ($p statements can derive parses from their proofs). That way even a no-parsing verifier can prove that all expressions have parses. My prejudice against this solution is probably just because I am used to the current setup.

On the other side of this is a .mm file intended for parsing verifiers, where syntax proofs are omitted entirely. (Actually, this is not so much a parser as the still-more-powerful unifying verifier.)

Mario

fl

Mar 30, 2016, 9:45:26 AM
to Metamath
 
(Would there ever be any use for a non-grammatical .mm?)


The formalization of Hofstadter's famous example?

--
FL

Mario Carneiro

Mar 30, 2016, 9:54:29 AM
to metamath
The MIU system is a grammatical formal system, in the sense of the paper. It is not, however, unambiguous. Grammaticality is achieved by the addition of the wxy axiom:

 $( The empty sequence is a wff. $)
 we   $a wff $.
 $( "M" after any wff is a wff. $)
 wM   $a wff x M $.
 $( "I" after any wff is a wff. $)
 wI   $a wff x I $.
 $( "U" after any wff is a wff. $)
 wU   $a wff x U $.
 $( If "x" and "y" are wffs, so is "x y".  This allows the conclusion of
    ` IV ` to be provable as a wff before substitutions into it, for those
    parsers requiring it.  (Added per suggestion of Mel O'Cat 4-Dec-04.) $)
 wxy  $a wff x y $.

(In fact, in the presence of wxy, the others can be simplified to just "wff M", "wff I", "wff U".) Of course in this system there are multiple parses for the same string, for example "wxy we we" vs "we" are two derivations of the empty string. Removing wxy yields unambiguous parses for all wffs, but then it is not a grammatical formal system when we include the axioms:


 ${
   IIa  $e |- M x $.
   $( Given any theorem starting with "M", it remains a theorem
      if the part after the "M" is added again after it. $)
   II   $a |- M x x $.
 $}

This one is not parseable given the rules for wffs without wxy, because the only possible derivations are of the form "C C C ..." or "x C C C ..." where C is one of the constants M,I,U and x is a wff variable.

Mario


Mario Carneiro

Mar 30, 2016, 10:24:46 AM
to metamath
Note that this generalizes: Given a formal system such that all the variable typecode axioms have no hypotheses or DV conditions (i.e. weakly grammatical), the database can be augmented with a new variable typecode and some silly string-forming rules for it, and the new database will be grammatical. Thus, in a precise sense we can "upgrade" any weakly grammatical formal system to a grammatical one.

More concretely, add to the database:

$c string $.
sx $f string x $.
sy $f string y $.

s0 $a string $.
sxy $a string x y $.

sc1 $a string c1 $.
sc2 $a string c2 $.
...

The list at the end runs over all constant symbols in the database (including typecodes and "string"), and at least one of each type of variable (except string).

Set Syn(T) = string for every non-variable typecode T. Then, since every expression "string a b c ..." is provable (for any string of variables and constants), the Syn image "string a b c ..." of every axiom and hypothesis expression "T a b c ..." is provable.

---

I made two assumptions above. One is that $e statements are always non-variable typecode expressions, or at least are provable syntax expressions (although a provable syntax expression is always unnecessary in an axiom or theorem). Perhaps this should have been added to the definition of "weakly grammatical".

The second is the explicit one, that the formal system is weakly grammatical. What if it's not? For that matter, why do all variables need to range over context-free syntactical categories? What would break if there was a variable $f |- T $. ?

The point is that a proof of, for example, hbth, in this notation "|- ( T -> A. x T )" with no hypotheses, would require the _parser_ to invent a proof of T. We could easily put a difficult or impossible theorem there and the parser would need to prove the theorem to verify, and disprove the theorem to give an error. This proof would be recorded in the "syntax proof" section of the result.

Mario

Norman Megill

Mar 30, 2016, 2:26:59 PM
to meta...@googlegroups.com
On 3/29/16 8:26 PM, Jens-Wolfhard Schicke-Uffmann wrote:
> On Mon, Mar 28, 2016 at 08:53:53AM -0400, Mario Carneiro wrote:
>> This is not a problem for "one-shot" verifiers, which only attempt to verify an
>> already fully formed MM file and don't bother with proof generation, useful
>> error handling or proof assistant tools. If you want to support any of these
>> activities, parsing becomes important, and without recording the above
>> association, you can't even begin. mmj2 actually has "|-" => "wff" hardcoded
>> into the program, which causes problems on other databases where there is no
>> "wff" typecode or it is called something else. This is really a property of the
>> database, and forcing the tool to guess is not appropriate.
> I value the property of Metamath that the proof kernel does not
> understand the target logic language at all (and is much simpler
> for it) highly. When I looked around for theorem checkers for
> my current hobby project, I knew my search was over, when I read
> $c ( $.
>
> Hence I prefer the $( $j ... $) solution very much, because it makes
> it crystal clear that this is like a tangential remark, maybe
> interesting for some tools (e.g. mmj2 and igor), but obviously
> irrelevant for a formal understanding of the mathematical proofs
> contained in set.mm.

I agree and don't think changing the present language is necessary or
desirable at this point. It would break a lot of verifiers and other
programs. My various suggestions for changing the language were
intended mainly to help us think about what might be useful in a future
generation of the language. Except for a probable change prohibiting
local $v's that we've discussed, I don't anticipate any further spec
changes for this generation of the language proper. (Enriching the
markup is always fine in principle and serves a secondary purpose in
guiding us when determining what is important to put in the language
itself for the next generation.)

Norm

Norman Megill

unread,
Mar 30, 2016, 2:35:49 PM3/30/16
to meta...@googlegroups.com
On 3/30/16 2:26 PM, Norman Megill wrote:
> Except for a probable change prohibiting
> local $v's that we've discussed, I don't anticipate any further spec
> changes for this generation of the language proper.

Actually I meant prohibiting a change in variable type by local (scoped)
$v's. I think they can still be useful for rare cases where, say, a lemma
requires many more dummy variables than are declared globally, since adding
the extra ones globally would place an unnecessary bookkeeping burden on all
subsequent statements.

Not sure how this would impact Mario's mmamm.m; he mentioned it doesn't
handle local $v's.

Norm

Mario Carneiro

unread,
Mar 30, 2016, 11:29:16 PM3/30/16
to metamath
Let me say a bit about that. I don't really understand why local $v's are a thing, because they don't fit the usual semantics. The way I see it, everything which is scoped by ${ $} is scoped because it is used for building the extended frame, which is pared down to the frame for each statement; that means $d, $f and $e statements. On the other hand, $v's are never used in a frame. Rather, they are used for defining the *language* (the set of legal tokens in math expressions). The only use of scoping a $v is if you wanted to make it a $c later. But then the situation is asymmetric, because you can't scope a $c. mmamm treats $v statements as if they always have global scope, which causes duplicate $v errors when a token is declared as a local $v and later reintroduced, as is done in set.mm.
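
For reference, the set.mm pattern that trips it up looks roughly like this (hypothetical labels):

${
  $v ps $.   $( a local dummy variable $)
  wpsa $f wff ps $.
  $( ... a lemma using ps ... $)
$}
${
  $v ps $.   $( the same token reintroduced in a later scope $)
  wpsb $f wff ps $.
  $( ... another lemma using ps ... $)
$}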

Mario

Mario Carneiro

unread,
Mar 31, 2016, 3:02:19 AM3/31/16
to metamath
FYI, the $j comments have been added to set.mm, see https://github.com/metamath/set.mm/commit/570be52 . It may be a while before anyone reads it, though.

Mario

fl

unread,
Mar 31, 2016, 2:55:47 PM3/31/16
to Metamath


> This $j comment need not be read by verifiers, but is useful for parsers like mmj2.

I can't see how it can be used by parsers, because parsers have this peculiarity: if something can be parsed, they parse it; if it can't be parsed, they don't.

-- 
FL

Marnix Klooster

unread,
Mar 31, 2016, 3:18:45 PM3/31/16
to Metamath
Hi Mario,

Here is a response to just a few parts of one of your mails in this thread.  The parts I snipped out are those that I either agree with, am not really interested in or have no opinion about, or that deserve a separate thread.

On Tue, Mar 29, 2016 at 10:44 PM, Mario Carneiro <di....@gmail.com> wrote:

[snip]
 
On Tue, Mar 29, 2016 at 1:10 AM, Marnix Klooster <marnix....@gmail.com> wrote:
Finally, I think that parsing and unambiguity defined in Metamath terms ('there exists a unique Metamath proof of...') are better than definitions based on external concepts like context-free parsers (as is done in the 'Models for Metamath' paper, if I see correctly).

Sorry about that. This is an artifact of my intended readership, which are regular mathematicians - so I attempt to deconstruct everything Metamath and put it in informal math terms.

For a Metamath-ematician, of course this is less than ideal. Nevertheless, there are many other ways to express the constraint, although none that use exclusively Metamath-provable notions, because it is a meta-theoretic property. (I will remind you that "there exists a unique proof of x" is not expressible in Metamath directly.)

Agreed, it is not expressible _in_ Metamath, but it uses only Metamath concepts, and does not introduce a concept that is foreign to Metamath, like context-free grammars.

[snip]
 
Basically, my proposal there was that a statement a b c ... is unambiguous if there exists a unique Metamath proof of IS_TOPLEVEL a b c ... .  And that proof _is_ then the parse tree of that statement.  (And a database is then unambiguous if all its statements are.)

With respect to the current proposal, I wouldn't like this approach because it requires you to reinterpret a typecode constant (first position) as a constant (second position), which I am trying to avoid. You could get around this by making "copies" of all the typecodes for usage in IS_TOPLEVEL: that is, map "a b c..." to "IS_TOPLEVEL <a> b c ..." and have "<wff>" be a new constant instead of reusing "wff".
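
A hypothetical sketch of that workaround, on top of a set.mm-like database with a wff variable ph (labels made up):

$c IS_TOPLEVEL <wff> <|-> $.
tlw $a IS_TOPLEVEL <wff> ph $.   $( in effect, Syn(wff) = wff $)
tlp $a IS_TOPLEVEL <|-> ph $.    $( in effect, Syn(|-) = wff $)

$( A theorem "|- ( ph -> ph )" is then checked by asking for a unique proof
   of "IS_TOPLEVEL <|-> ( ph -> ph )". $)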

To make sure I understand you correctly: you'd like to keep separate 'typecode constants', which are those which only appear in the first position of statements (specifically in $f 'syntactical typing' statements), and the other ('formula'?) constants which never appear in the first position.

I hadn't realized this goal until you now made it explicit.  And this 'separation of constants' could be a sound principle.  (Basically, instead of the variables/constants distinction, this would advocate a variables / 'formula constants' / 'type constants' distinction, which in theory could even be retrofitted into the Metamath spec without affecting set.mm, hol.mm, etc.  I think.)

However, I would argue that a big part of parsing is assigning syntactical types to formulas and sub-formulas.  So the fact that these two kinds of constant come together (only) in parsing rules is kind of expected.  Also, when parsing one expresses meta-statements, so the fact that one has IS_TOPLEVEL |- ... (which is a statement about |- ...) is a direct result of encoding that meta-statement as a Metamath statement.
 
You would also want to remove any syntax theorems like weq, because they would break the "unique proof of" property, as I mentioned above, unless you are careful about direct-from-axioms proofs.

I would argue (as I wrote in the earlier thread) that syntax proofs should only use $a and $f statements (and not weq $p).  That way we _force_ direct-from-axiom proofs.
 
But other than that, yes I think this is isomorphic to the approach with Syn, with one exception. Suppose that for each typecode c, there is a unique axiom $a IS_TOPLEVEL <c> v $. where v is a variable with type "t" (that is, defined with $f t v $.). Then we can define Syn(c) = t, and since syntax proofs act like CFGs, the two definitions will behave the same in this case.

The problem is what happens when the restrictions in the paper are not satisfied. You are guaranteed Syn : TC -> VT in any case with IS_TOPLEVEL (since the input is a variable, whose type must be a variable typecode, and the output is a <bracketed> typecode), but what if Syn is not a retraction (that is, we don't have Syn(c) = c for every c e. VT)? Then the parser will try to parse sets like classes, wffs like sets,.... Who knows what could happen, but it won't be checking what you want, that's for sure. So I think we are safe to assume this.

The other possibility is that there are two statements $a IS_TOPLEVEL <c> v1 $. $a IS_TOPLEVEL <c> v2 $. where v1 and v2 have types t1, t2 respectively. If t1 = t2, then we immediately have a breakdown of unambiguity since the axioms are identical. If not, then it gets more interesting. Syn cannot be a function in this case, since we have Syn(c) = t1 and Syn(c) = t2 at the same time, but it might nevertheless be the case that there are unique proofs of "IS_TOPLEVEL <c> a b..." because for whatever reason, the expressions that can be validly parsed as t1 and as t2 are disjoint.

For example, in hol.mm there are two types of provable assertion, a typing assertion "|- A : al" and a deduction "|- R |= S". In fact, the type "wff" in hol.mm is basically the same as IS_TOPLEVEL, since it is only used to form parse trees for these two at the top level (neither ":" nor "|=" can appear in a term, not counting the ":" inside kl). There are no wff variables, so "wff" is actually a non-variable typecode.

With the $y proposal, we would have wff as a variable typecode, because the important distinction is not the presence of variables but its context-free behavior. It would be declared as $y var type term wff $. $y |- $= wff $. .

It's actually striking how similar smm's statements (https://github.com/sorear/smm-webui/blob/master/public/assets/set-config.mm) are to the current proposal:

$( $t
   -smm-syntactic-categories 'set' 'class' 'wff';
   -smm-logical-categories '|-' 'wff';
$)

vs.

$( $j
   syntax 'set' 'class' 'wff';
   syntax '|-' as 'wff';
$)


I don't fully understand this part. You seem to be saying that my IS_TOPLEVEL proposal with only direct-from-axiom proofs, and with typecode constants in non-first positions, will not work on some .mm files, some of which are realistic: those with weird (I would say incorrect) IS_TOPLEVEL parse rules, like duplicates; and those (like hol.mm) which already have a form of parsing built in (like typing assertions).

Is that what you are saying here?

Thanks!

Groetjes,
 <><
Marnix

--
Marnix Klooster
marnix....@gmail.com

Mario Carneiro

unread,
Mar 31, 2016, 4:02:27 PM3/31/16
to metamath
On Thu, Mar 31, 2016 at 3:18 PM, Marnix Klooster <marnix....@gmail.com> wrote:
To make sure I understand you correctly: you'd like to keep separate 'typecode constants', which are those which only appear in the first position of statements (specifically in $f 'syntactical typing' statements),

And also in the first position of $e $a $p statements (this includes "logical typecodes" / "non-variable typecodes" like |- ).
 
and the other ('formula'?) constants which never appear in the first position.

I hadn't realized this goal until you now made it explicit.  And this 'separation of constants' could a sound principle.  (Basically instead of the variables/constants distinction, this would advocate a variables / 'formula constants' / 'type constants' distinction, which in theory even could be retrofitted into the Metamath spec without affecting set.mm, hol.mm, etc.  I think.)

Yep, that's exactly right. And in Metamath 2.0 the $y / $c / $v distinction would make that quite clear.
 
However, I would argue that a big part of parsing is assigning syntactical types to formulas and sub-formulas.  So the fact that these two kinds of constant come together (only) in parsing rules is kind of expected.  Also, when parsing one expresses meta-statements, so the fact that one has IS_TOPLEVEL |- ... (which is a statement about |- ...) is a direct result of encoding that meta-statement as a Metamath statement.

In a sense, you get the same thing with "wff a b c..." as corresponding to "|- a b c...". Really the only difference is you don't need IS_TOPLEVEL; the mapping from |- to wff is recorded by $y or "syntax;".

You would also want to remove any syntax theorems like weq, because they would break the "unique proof of" property, as I mentioned above, unless you are careful about direct-from-axioms proofs.

I would argue (as I wrote in the earlier thread) that syntax proofs should only use $a and $f statements (and not weq $p).  That way we _force_ direct-from-axiom proofs.

This is the most convenient approach for theoretical work, but I don't want to completely shut the door on syntax theorems because it can actually lead to a savings in proof lengths. The best solution would be a parser that starts with a "pure parse" using only axioms, and then looks for subtrees to reduce using syntax theorems.
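
For instance (set.mm-style labels, quoted from memory, so treat the details as approximate):

$( The syntax theorem weq, proved by the pure parse of "x = y"
   (five steps: vx cv vy cv wceq): $)
weq $p wff x = y $= vx cv vy cv wceq $.
$( In later syntax proofs, any "x = y" subtree can then be a single weq step
   instead of repeating that five-step subtree. $)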
 
 
But other than that, yes I think this is isomorphic to the approach with Syn, with one exception. Suppose that for each typecode c, there is a unique axiom $a IS_TOPLEVEL <c> v $. where v is a variable with type "t" (that is, defined with $f t v $.). Then we can define Syn(c) = t, and since syntax proofs act like CFGs, the two definitions will behave the same in this case.

The problem is what happens when the restrictions in the paper are not satisfied. You are guaranteed Syn : TC -> VT in any case with IS_TOPLEVEL (since the input is a variable, whose type must be a variable typecode, and the output is a <bracketed> typecode), but what if Syn is not a retraction (that is, we don't have Syn(c) = c for every c e. VT)? Then the parser will try to parse sets like classes, wffs like sets,.... Who knows what could happen, but it won't be checking what you want, that's for sure. So I think we are safe to assume this.

The other possibility is that there are two statements $a IS_TOPLEVEL <c> v1 $. $a IS_TOPLEVEL <c> v2 $. where v1 and v2 have types t1, t2 respectively. If t1 = t2, then we immediately have a breakdown of unambiguity since the axioms are identical. If not, then it gets more interesting. Syn cannot be a function in this case, since we have Syn(c) = t1 and Syn(c) = t2 at the same time, but it might nevertheless be the case that there are unique proofs of "IS_TOPLEVEL <c> a b..." because for whatever reason, the expressions that can be validly parsed as t1 and as t2 are disjoint.

For example, in hol.mm there are two types of provable assertion, a typing assertion "|- A : al" and a deduction "|- R |= S". In fact, the type "wff" in hol.mm is basically the same as IS_TOPLEVEL, since it is only used to form parse trees for these two at the top level (neither ":" nor "|=" can appear in a term, not counting the ":" inside kl). There are no wff variables, so "wff" is actually a non-variable typecode.

With the $y proposal, we would have wff as a variable typecode, because the important distinction is not the presence of variables but its context-free behavior. It would be declared as $y var type term wff $. $y |- $= wff $. .

It's actually striking how similar smm's statements (https://github.com/sorear/smm-webui/blob/master/public/assets/set-config.mm) are to the current proposal:

$( $t
   -smm-syntactic-categories 'set' 'class' 'wff';
   -smm-logical-categories '|-' 'wff';
$)

vs.

$( $j
   syntax 'set' 'class' 'wff';
   syntax '|-' as 'wff';
$)


I don't fully understand this part. You seem to be saying that my IS_TOPLEVEL proposal with only direct-from-axiom proofs, and with typecode constants in non-first positions, will not work on some .mm files, some of which are realistic: those with weird (I would say incorrect) IS_TOPLEVEL parse rules, like duplicates; and those (like hol.mm) which already have a form of parsing built in (like typing assertions).

The first part yes, the second part no. The fact that hol.mm has "typing assertions" in the logic is separate from our notion of syntax proof; since typing assertions are not context free, they have to be proven manually just like other |- assertions. What makes hol.mm interesting is the fact that it has a typecode "wff" which is not used at all in verification (never shows up in proofs), and has no variables, but is used to describe the syntax for the two possible provable assertions: a deduction "|- A |= B" or a typing assertion "|- A : al".
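
Schematically, the two wff-forming rules of hol.mm look like this (hypothetical labels; R, S, A are term variables and al is a type variable):

wded $a wff R |= S $.   $( a deduction $)
wtyp $a wff A : al $.   $( a typing assertion $)

$( "wff" never shows up in a proof and has no variables of its own; its only
   job is to say that every provable |- statement has one of these two
   shapes. $)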

I had to make some guesses since you were not precise about what constraints exist on IS_TOPLEVEL axioms, which led me to be overly conservative and consider some "weird" databases. Here are some possible varieties of weirdness:

1) Syn is not a retraction (Syn(t) = T =/= t)

$c IS_TOPLEVEL t T $. $v v V $.
vt $f t v $. VT $f T V $.
lt $a IS_TOPLEVEL t V $.
lT $a IS_TOPLEVEL T v $.


2) Syn is not a function (Syn(|-) is not well-defined)

$c IS_TOPLEVEL t T |- $. $v v V $.
vt $f t v $. VT $f T V $.
lt $a IS_TOPLEVEL |- V $.
lT $a IS_TOPLEVEL |- v $.

 
3) IS_TOPLEVEL is not a pure typecode function (the IS_TOPLEVEL axiom has more than just a typecode and variable)

$c IS_TOPLEVEL t T |- : $. $v v V $.
vt $f t v $. VT $f T V $.
lt $a IS_TOPLEVEL |- v : V $.

If all three of the above cases are "wrong" by your definition, then the Syn function approach is the same as IS_TOPLEVEL, with only a notational difference.

In the third case, it can still be mapped to a conformant version, by introducing a new typecode:

$c IS_TOPLEVEL t T |- |-v : $. $v v V p $.
vt $f t v $. VT $f T V $. vp $f |-v p $.
pt $a |-v v : V $.
lt $a IS_TOPLEVEL |- p $.


Mario