I will go along with Mario's proposal, particularly since it has no impact on existing verifiers.
For the longer term, I have been wondering what might be the best future direction for the Metamath language in light of Mario's paper. What if we require, as part of the spec, that the systems it encodes have the property of being unambiguous (as defined in the paper)? I don't really care whether it can handle miu.mm or other bizarre systems that have no real-life applications and may be best handled by modeling them within ZFC.
Is it possible to enhance or rewrite the spec in a simple way to accomplish this, using language similar to that of the existing spec that a novice programmer writing a Metamath verifier, or an intelligent layperson reading the spec, could understand?
If the only way to do this involves a significant spec complication, maybe it's best to leave the language proper alone, since its simplicity (reflected in the simplicity of the proof verifiers written so far) is an attractive feature for a lot of people.
But we can still plant a seed to start thinking about the long-term direction of the language, or a next-generation replacement of similar simplicity, where the property of being grammatical/unambiguous is inherent (if we think that is important).
After the "Models for Metamath" paper (http://us2.metamath.org:88/ocat/model/model.pdf), it has become clear to me that the information provided by a standard Metamath file is insufficient to derive all the properties we want from grammatical parsing.
After the "Models for Metamath" paper (http://us2.metamath.org:88/ocat/model/model.pdf), it has become clear to me that the information provided by a standard Metamath file is insufficient to derive all the properties we want from grammatical parsing.I'm not sur I'm understanding the whole problem but I have always been a bit annoyed with all what is related to parsing in set.mm
The first time the issue came to the table, it was because O'Cat wanted to parse set.mm for mmj2. He needed a start symbol and had other concerns like that. set.mm was modified, and I think it was not a good thing, because it was clearer before. It added considerations that have no significance for logic. I think we must be as close as possible to the mathematical definition of a logic. This is the strength of Metamath, and all those considerations pollute the simplicity of the original design.
I also think that all those considerations can be solved outside set.mm, either by designing mmj2 and the other tools in a way where their particular needs are solved by the tools themselves, or informally, by checking for ambiguity and other problems at the human level.
We must stick to the mathematical definition of a logic and add nothing else. The other proof checkers are boring because they want to add a lot of additional features, in such a way that nobody can understand them any longer.
Please, define logic.
On 3/28/16 2:06 AM, Mario Carneiro wrote:
Here is a specific suggestion for adding to the spec:
Add a typecode statement, $y. For set.mm, it would look like this:
$y set class wff $.
$y |- $= wff $.
There are two forms, $y c1 c2 ... cn $. and $y c1 $= c2 $. The first
form $y c1 c2 ... cn $. is shorthand for $y c1 $. $y c2 $. ... $y cn $.
(similar to $c statements), and $y c1 $. is shorthand for $y c1 $= c1 $.
The statement $y c1 $= c2 $. declares a typecode c1. If c1 = c2 then
this is a variable typecode, otherwise it is a non-variable typecode
(and c2 must be a previously declared variable typecode). Both c1 and c2
must be previously declared constants. [Another possibility is to have
this supersede $c statements, that is expand "$y c1 $= c2 $." to "$c c1
$. $y c1 $= c2 $." so that you don't have to declare it twice.]
Any $e $a $p statement must start with a declared typecode, and any $f
statement must start with a declared variable typecode. [Since I don't
want to require verifiers to do parsing, the information in a $y
statement can otherwise be ignored.]
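To make the proposal concrete, here is a minimal sketch of a database fragment satisfying these rules as I read them (the fragment is hypothetical, loosely modeled on set.mm):

  $c wff |- ( ) -> $.
  $y wff $.
  $( wff is a variable typecode; this is shorthand for the c1 = c2 form. $)
  $y |- $= wff $.
  $( |- is a non-variable typecode mapping to the variable typecode wff. $)
  $v ph ps $.
  wph $f wff ph $.
  wps $f wff ps $.
  $( Legal: both f-statements start with the variable typecode wff. $)
  wi $a wff ( ph -> ps ) $.
  ax-1 $a |- ( ph -> ( ps -> ph ) ) $.
  $( Legal: both a-statements start with a declared typecode. $)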
We can consider this for a future Metamath 2.0 language (which may or may not be different in other respects too). It probably wouldn't be a good idea just to update the existing spec with this requirement, since a new keyword would cause all existing verifiers to break. Your earlier proposal of a comment tagged with $j would probably be a better solution for the near term.
Ideally, what I had in mind was making non-grammatical .mm files illegal, i.e., adding requirements that existing, unmodified .mm files satisfy and that ensure they are grammatical. (Would there ever be any use for a non-grammatical .mm?) This would allow any existing verifier to continue to work, with the understanding that it would do an incomplete verification w.r.t. the updated spec. We could have a test .mm file that would pass old verifiers but produce an error on new ones, to serve as a way to determine that an old verifier is incomplete. (metamath.exe might remain in the incomplete category for some time, depending on the difficulty of updating it.)
The set of variable typecodes can be collected from $f statements. The type for the Syn function would be identified by the unique typecode that doesn't appear in a $f statement. What it maps to is still a problem, so it appears we would have to add something to existing .mm files. However, we can add the rather weird statement
syn $a |- wff $.
to set.mm without violating the existing spec, and existing verifiers should all accept this without error. "syn" is an arbitrary (not hard-coded) label. In the updated spec, the syn statement would be identified as the unique $a statement having a typecode as its second token; otherwise, typecodes would be allowed only as the first token of any statement. I'm not sure what problems could be caused by someone referencing this $a in a proof, but if it is a problem, the new spec could prohibit it.
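For instance (a hypothetical illustration of the concern), the current spec would not stop someone from writing

  weird $p |- wff $= syn $.

which "proves" the meaningless assertion "|- wff" by a direct reference to syn. The new spec could prohibit exactly this kind of reference.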
This is somewhat of a hack, but it would meet the new requirement and allow 100% backwards compatibility, unless I'm overlooking something.
Another way to achieve this without the syn statement is to require that the typecode for the first $f be the Syn value. Then we probably wouldn't have to touch any .mm files. Would this be a better solution?
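For example (a sketch assuming a set.mm-like database), if the first $f statements are

  wph $f wff ph $.
  vx $f set x $.
  cA $f class A $.

then wff, set, and class are collected as the variable typecodes, |- is identified as the typecode with no $f, and the convention would read Syn(|-) = wff off the first $f statement.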
For a Metamath 2.0 that starts from scratch, one idea I've considered off and on in the past is to turn the internal "language" of the Metamath Solitaire applet into an actual language. It might even have been the Metamath language if I had written the applet first.
All expressions would be Polish notation in the language itself and mapped to human-friendly form for display. Polish notation is the big obstacle I see for editing the source file directly, and probably the main reason I haven't pursued this approach. People have mentioned that the S-expressions in Ghilbert make it hard to read, and Polish is worse. (Actually, reverse Polish - RPN - might be computationally more efficient, since it requires only a forward scan, but that's even harder to read.)
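To illustrate the readability cost, here is the same set.mm-style formula in the three notations:

  infix:  ( ( ph -> ps ) -> ch )
  Polish: -> -> ph ps ch
  RPN:    ph ps -> ch ->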
A way to address Polish readability and ease of editing is to do everything via a special editor, in the same way that we now work with proofs indirectly and rarely edit them by hand.
Finally, I think that parsing and unambiguity defined in Metamath terms ('there exists a unique Metamath proof of...') is better than one based on external concepts like context-free parsers (as is done in the 'Models for Metamath' paper, if I see correctly).
Basically, my proposal there was that a statement a b c ... is unambiguous if there exists a unique Metamath proof of IS_TOPLEVEL a b c ... . And that proof _is_ then the parse tree of that statement. (And a database is then unambiguous if all its statements are.)
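In set.mm terms this might look like the following sketch (the constant and the label tl are hypothetical):

  $c IS_TOPLEVEL $.
  tl $a IS_TOPLEVEL |- ph $.

The statement "IS_TOPLEVEL |- ( ph -> ps )" then has the unique proof "wph wps wi tl", and that proof is precisely the parse tree of "|- ( ph -> ps )".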
Why do you want all typecodes declared? Most can (and should) be inferred from the existence of $f declarations. Why isn't it sufficient to declare the relationship between |- and wff, and maybe put in some hints about which parser should be used?
On the latter topic, I propose to put both "unambiguous" and "parse-lalr" in set.mm (inside some $j), the former making a promise about the database, the latter giving a "proof" (checkable by an LALR parser generator of your choice) or ignored by a tool which does not know about an LALR parsing algorithm. If all keywords of the second kind are prefixed with "parse-", we can be rather liberal in naming our newest algorithms (like your LALR++ grammar class that you discussed lately).
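As a sketch (the keyword names are only suggestions), the comment in set.mm might look like:

  $( $j
    unambiguous;
    parse-lalr;
  $)

A verifier skips the whole comment, a tool that knows an LALR algorithm can check the parse-lalr "proof", and any other tool can still rely on the unambiguous promise.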
I think if something does not concern a verifier, it belongs between $( and $).
A Metamath theorem can be 100% verified without $y. Hence $y should not be a full statement (which _at least_ increases parser complexity for the Metamath language).
> (Would there ever be any use for a non-grammatical .mm?)
--
Norm
> This $j comment need not be read by verifiers, but is useful for parsers like mmj2.

I can't see how it can be used by parsers, because parsers have this peculiarity: if it can be parsed, they parse it; if it can't be parsed, they don't.
On Tue, Mar 29, 2016 at 1:10 AM, Marnix Klooster <marnix....@gmail.com> wrote:

> Finally, I think that parsing and unambiguity defined in Metamath terms ('there exists a unique Metamath proof of...') is better than one based on external concepts like context-free parsers (as is done in the 'Models for Metamath' paper, if I see correctly).

Sorry about that. This is an artifact of my intended readership, which is regular mathematicians - so I attempt to deconstruct everything Metamath and put it in informal math terms. For a Metamath-ematician, of course, this is less than ideal. Nevertheless, there are many other ways to express the constraint, although none that use exclusively Metamath-provable notions, because it is a meta-theoretic property. (I will remind you that "there exists a unique proof of x" is not expressible in Metamath directly.)
> Basically, my proposal there was that a statement a b c ... is unambiguous if there exists a unique Metamath proof of IS_TOPLEVEL a b c ... . And that proof _is_ then the parse tree of that statement. (And a database is then unambiguous if all its statements are.)

With respect to the current proposal, I wouldn't like this approach because it requires you to reinterpret a typecode constant (first position) as a constant (second position), which I am trying to avoid. You could get around this by making "copies" of all the typecodes for usage in IS_TOPLEVEL: that is, map "a b c ..." to "IS_TOPLEVEL <a> b c ..." and have "<wff>" be a new constant instead of reusing "wff".
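As a hypothetical sketch of the "copies" idea in set.mm terms:

  $c IS_TOPLEVEL <|-> $.
  tl $a IS_TOPLEVEL <|-> ph $.

The statement "|- ( ph -> ps )" maps to "IS_TOPLEVEL <|-> ( ph -> ps )", and since <|-> is a fresh constant, the typecode |- itself never occurs outside the first position.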
You would also want to remove any syntax theorems like weq, because they would break the "unique proof of" property, as I mentioned above, unless you are careful about direct-from-axioms proofs. (Concretely, in set.mm the syntax theorem weq gives "wff x = y" the proof "vx vy weq" in addition to the direct-from-axioms proof "vx cv vy cv wceq", so uniqueness fails unless proofs are restricted to axioms.)
But other than that, yes, I think this is isomorphic to the approach with Syn, with one exception. Suppose that for each typecode c, there is a unique axiom $a IS_TOPLEVEL <c> v $. where v is a variable with type t (that is, defined with $f t v $.). Then we can define Syn(c) = t, and since syntax proofs act like CFGs, the two definitions will behave the same in this case.

The problem is what happens when the restrictions in the paper are not satisfied. You are guaranteed Syn : TC -> VT in any case with IS_TOPLEVEL (since the input is a variable, whose type must be a variable typecode, and the output is a <bracketed> typecode), but what if Syn is not a retraction (that is, if Syn(c) = c fails for some c e. VT)? Then the parser will try to parse sets like classes, wffs like sets, and so on. Who knows what could happen, but it won't be checking what you want, that's for sure. So I think we are safe to assume this.

The other possibility is that there are two statements $a IS_TOPLEVEL <c> v1 $. and $a IS_TOPLEVEL <c> v2 $. where v1 and v2 have types t1, t2 respectively. If t1 = t2, then we immediately have a breakdown of unambiguity, since the axioms are identical. If not, then it gets more interesting. Syn cannot be a function in this case, since we have Syn(c) = t1 and Syn(c) = t2 at the same time, but it might nevertheless be the case that there are unique proofs of "IS_TOPLEVEL <c> a b ...", because for whatever reason the expressions that can be validly parsed as t1 and as t2 are disjoint.
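For a concrete (hypothetical) instance of this second possibility:

  tl1 $a IS_TOPLEVEL <|-> ph $.
  tl2 $a IS_TOPLEVEL <|-> A $.

with ph a wff variable and A a class variable. Then Syn(|-) would be wff and class at once, so Syn is not a function; but if no expression parses both as a wff and as a class, the proofs of "IS_TOPLEVEL <|-> ..." could still be unique.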
For example, in hol.mm there are two types of provable assertion: a typing assertion "|- A : al" and a deduction "|- R |= S". In fact, the type "wff" in hol.mm is basically the same as IS_TOPLEVEL, since it is only used to form parse trees for these two at the top level (neither ":" nor "|=" can appear in a term, not counting the ":" inside kl). There are no wff variables, so "wff" is actually a non-variable typecode.

With the $y proposal, we would have wff as a variable typecode, because the important distinction is not the presence of variables but its context-free behavior. It would be declared as "$y var type term wff $. $y |- $= wff $.".

It's actually striking how similar smm's statements (https://github.com/sorear/smm-webui/blob/master/public/assets/set-config.mm) are to the current proposal:

  $( $t
    -smm-syntactic-categories 'set' 'class' 'wff';
    -smm-logical-categories '|-' 'wff';
  $)

vs.

  $( $j
    syntax 'set' 'class' 'wff';
    syntax '|-' as 'wff';
  $)
To make sure I understand you correctly: you'd like to keep separate the 'typecode constants', which are those that appear only in the first position of statements (specifically in $f 'syntactical typing' statements), from the other ('formula'?) constants, which never appear in the first position.

I hadn't realized this goal until you now made it explicit, and this 'separation of constants' could be a sound principle. (Basically, instead of the variables/constants distinction, this would advocate a variables / 'formula constants' / 'type constants' distinction, which in theory could even be retrofitted into the Metamath spec without affecting set.mm, hol.mm, etc., I think.)

However, I would argue that a big part of parsing is assigning syntactical types to formulas and sub-formulas, so the fact that these two kinds of constants come together (only) in parsing rules is kind of expected. Also, when parsing, one expresses meta-statements, so the fact that one has IS_TOPLEVEL |- ... (which is a statement about |- ...) is a direct result of encoding that meta-statement as a Metamath statement.
> You would also want to remove any syntax theorems like weq, because they would break the "unique proof of" property, as I mentioned above, unless you are careful about direct-from-axioms proofs.

I would argue (as I wrote in the earlier thread) that syntax proofs should only use $a and $f statements (and not weq $p). That way we _force_ direct-from-axiom proofs.
> But other than that, yes I think this is isomorphic to the approach with Syn, with one exception. [...] The problem is what happens when the restrictions in the paper are not satisfied. [...] The other possibility is that there are two statements $a IS_TOPLEVEL <c> v1 $. and $a IS_TOPLEVEL <c> v2 $. where v1 and v2 have types t1, t2 respectively. [...]
>
> For example, in hol.mm there are two types of provable assertion [...] With the $y proposal, we would have wff as a variable typecode [...] It's actually striking how similar smm's statements are to the current proposal [...]

I don't fully understand this part. You seem to be saying that my IS_TOPLEVEL proposal with only direct-from-axiom proofs, and with typecode constants in non-first positions, will not work on some .mm files, some of which are realistic: those with weird (I would say incorrect) IS_TOPLEVEL parse rules, like duplicates; and those (like hol.mm) which already have a form of parsing built in (like typing assertions).