On Tue, Sep 01, 2015 at 10:39:02PM +0200, Marnix Klooster wrote:
> Hi all,
>
> I quickly want to float an idea around checking that a Metamath database's
> grammar has no ambiguity. I'm curious whether to you all this is helpful,
> or obvious, or enlightening, or confusing, or irrelevant.
>
> *
>
> For my own future reference, if I understood correctly the issue of
> ambiguity was raised by Mel O'Cat in "Unification/Proof Assistant Problem
> #2" found in the mmj2 distribution (see, e.g.,
>
https://github.com/digama0/mmj2/blob/master/doc/UnificationProblem2.txt or
>
http://us.metamath.org/ocat/mmj2/doc/UnificationProblem2.txt). In essence,
> an ambiguous grammar forces a proof assistant to let the user influence
> syntax decisions: "Do you want this x to represent a set or a class?" The
> result (I think) was the change of weq from an axiom to a theorem: see
>
http://us.metamath.org/mpeuni/weq.html.
No. If the user can influence syntax decisions, that is fine. What is not
fine is that with an ambiguous grammar, it is possible that there is *no*
globally consistent assignment of "derivations" to terms.
The e. issue was, strictly speaking, fine, as the two "senses" of e. were
equivalent (theorem cleljust). There was a syntactic ambiguity, but no logical
inconsistency. However, it still creates a problem for tooling.
> This topic was also discussed in this group in 2010; see
>
https://groups.google.com/forum/#!topic/metamath/iunsTW6fGic. In that
> thread, Alexander Smirnov wrote, "A proof of the absence of syntax
> ambiguity clearly should be a part of any proof of the soundness of
set.mm."
> I'm not sure I understand why, but it does seem sensible to have an
> unambiguous syntax.
If there were no parentheses, then you could prove:
tru T.
wtru wtru wfal wa orci T. \/ T. /\ F.
wtru wtru wa wfal simpri F.
(I've put the $f hyps after the $es for clarity here, so this isn't quiiite a
valid
set.mm proof string).
Why? Because the grammatical tree hasn't stayed constant. Because Metamath
follows the classical approach of treating formal logic as a game of strings,
rather than the more frequent now approach of a game of trees, it's vulnerable
to inconsistencies created by grammar ambiguity.
https://en.wikipedia.org/wiki/Well-formed_formula#Usage_of_the_terminology
is the best discussion I've found of the shift, although it's way too short.
(When we formalize FOL in
set.mm, I have a strong preference for the tree game,
but this is primarily because it allows infinitary logic to be built on the
same foundation; however Metamath databases are presently limited to finite
formulas so this does not matter for the meta level.)
> *any* Metamath database) has an unambiguous grammar, as follows. I'll use
>
set.mm here, but this should work for any database.
First, because metamath does not work at the grammatical level, it is not
coherent to talk about the grammar of an unconstrained metamath database.
miu.mm in particular is not based on a grammar. (You can run parsers over it
if you want, but this is strictly GIGO; Hofstadter did not describe MIU with a
grammar, so it doesn't have one).
mmj2 thinks partially at the level of grammars, so it's not usable on
miu.mm.
smm currently has optional support for grammars; the smm proof assistant, when
it exists, is likely to be grammar-mode-only because I don't like the behavior
of string unification much.
Ambiguity of context-free grammars is equivalent to the halting problem.
(There is a fairly simple reduction to the
https://en.wikipedia.org/wiki/Post_correspondence_problem , and a much more
complicated reduction to halting in the linked article.)
Your proposal checks a finite number of strings for multiple derivations, while
to "prove" unambiguity by that method it would be necessary to attempt to parse
all *aleph-null* strings of tokens to verify that each of them has at most one
parse.
This, however, is an unnecessary nitpick. It is no more necessary to prove
that an ambiguity cannot be derived from the
set.mm syntax rules than it is to
prove that a contradiction cannot be derived from the
set.mm logical axioms.
What is important is that an _undetected_ ambiguity cannot exist - it is fine
for there to be potentially ambiguous strings as long as all strings that exist
in the database are checked for single parses by the verifier.
> is unambiguous, we simply verify that there is *exactly one* proof of
> IS_TOPLEVEL
> a b c ... from all earlier active $a and $f statements.
This is equivalent to first replacing |- with wff, then checking for exactly
one proof.
> As a random example, let's take
>
> euorv $p |- ( ( -. ph /\ E! x ps ) -> E! x ( ph \/ ps ) ) $= ? $.
>
> The only (I assume) proof of IS_TOPLEVEL |- ( ( -. ph /\ E! x ps ) -> E! x
> ( ph \/ ps ) ) is
>
> wph wn wps vx weu wa wph wps wo vx weu wi turnstile.TOP
>
> which we could render as an S-expression as
>
> (turnstile.TOP (wi (wa (wn wph) (weu wps vx)) (weu (wo wph wps) vx)))
>
> which we could render graphically as
>
> (|- (-> (/\ (-. ph) (E! ps x)) (E! (\/ ph ps) x)))
>
> So this strongly suggests that the unique proof of IS_TOPLEVEL a b c ...
> *is* the parse tree of a b c ....
Yes, this is how Mario and I interpret parse trees in. I'm not sold on IS_TOPLEVEL; though
what smm is doing currently (
https://github.com/sorear/smm-webui/blob/master/public/assets/set-config.mm )
is not really better.
> In essence, this is just like the 'detailed syntax breakdown' that the
> Metamath program generates in the HTML pages for $a statements. But now
> using a database-independent mechanism, based a few database-specific
> IS_TOPLEVEL axioms.
Curious fact: metamath.exe uses the IMPROVE ALL code to implement said syntax
breakdowns, and can thus handle some types of non-context-free grammars, while
smm and mmj2 use more restricted mechanisms (Packrat parsers and Earley
parsers).
> I don't think I'll soon have the time to actually check that all of current
>
set.mm is unambiguous by the above definition, but this should be a simple
> thing to do using, e.g., smm. And the Metamath program itself could
> perform this check; perhaps with a built-in additional math constant $^
> instead of a convention like IS_TOPLEVEL?
The smm verifier already does this:
smm (master %)]$ cp .../
set.mm/set.mm .
smm (master %)]$ cat ../smm-webui/public/assets/
set-config.mm >>
set.mm
smm (master %)]$ sed -i '' -E -e 's/$/ /; s/ ([][])_ / \1 /g; s/ ([][])_ / \1 /g; s/ $//'
set.mm
smm (master %)]$ babel-node misc/verify.js
set.mm | head -n 40
parse 865 ms
set.mm:33603:6:error: A math symbol may not be redeclared as a constant
| $c »[« $. $( Underlined left bracket $)
set.mm:23050:6:info: Previous definition
| $c »[« $. $( Left bracket $)
set.mm:33604:6:error: A math symbol may not be redeclared as a constant
| $c »]« $. $( Underlined right bracket $)
set.mm:23052:6:info: Previous definition
| $c »]« $. $( Right bracket $)
scoper 382 ms
set.mm:265577:5:warning: This token not used in the file; typesetting will be ignored
| »"[_"« as "<IMG SRC='_ulbrack.gif' WIDTH=6 HEIGHT=19 TITLE='[_' ALIGN=TOP>";
set.mm:265578:14:warning: This token not used in the file; typesetting will be ignored
| althtmldef »"[_"« as '<U>[</U>'; /* [ */
set.mm:265579:12:warning: This token not used in the file; typesetting will be ignored
| latexdef »"[_"« as "\underline{[}";
set.mm:265581:5:warning: This token not used in the file; typesetting will be ignored
| »"]_"« as "<IMG SRC='_urbrack.gif' WIDTH=5 HEIGHT=19 TITLE=']_' ALIGN=TOP>";
set.mm:265582:14:warning: This token not used in the file; typesetting will be ignored
| althtmldef »"]_"« as '<U>]</U>'; /* ] */
set.mm:265583:12:warning: This token not used in the file; typesetting will be ignored
| latexdef »"]_"« as "\underline{]}";
metadata 53 ms
set.mm:33618:18:error: (in df-csb) Grammatical ambiguity detected; from here can be parsed wceq or wsbc
| df-csb $a |- »[« A / x ] B = { y | [ A / x ] y e. B } $.
set.mm:33627:16:error: (in csb2) Grammatical ambiguity detected; from here can be parsed wceq or wsbc
| csb2 $p |- »[« A / x ] B = { y | E. x ( x = A /\ y e. B ) } $=
set.mm:33636:29:error: (in csbeq1) Grammatical ambiguity detected; from here can be parsed wceq or wsbc
| csbeq1 $p |- ( A = B -> »[« A / x ] C = [ B / x ] C ) $=
set.mm:33647:31:error: (in cbvcsbv) Grammatical ambiguity detected; from here can be parsed wceq or wsbc
| cbvcsbv $p |- ( A e. V -> »[« A / x ] B = [ A / y ] C ) $=
set.mm:33656:27:error: (in csbeq1d) Grammatical ambiguity detected; from here can be parsed wceq or wsbc
| csbeq1d $p |- ( ph -> »[« A / x ] C = [ B / x ] C ) $=
set.mm:33664:17:error: (in csbid) Grammatical ambiguity detected; from here can be parsed wceq or wsbc
| csbid $p |- »[« x / x ] A = A $=
set.mm:33681:30:error: (in csbcog) Grammatical ambiguity detected; from here can be parsed wceq or wsbc
| csbcog $p |- ( A e. V -> »[« A / y ] [ y / x ] B = [ A / x ] B ) $=
set.mm:33691:49:error: (in csbexg) Grammatical ambiguity detected; from here can be parsed wcel or wsbc
| csbexg $p |- ( ( A e. V /\ A. x B e. W ) -> »[« A / x ] B e. _V ) $=
set.mm:33704:17:error: (in csbex) Grammatical ambiguity detected; from here can be parsed wcel or wsbc
(note that csbeq1a is *not* flagged; its statement remains unambiguous even in
the modified grammar.)
However, it's not quite enough, because to actually catch the cases that can
lead to logical inconsistency (like my example above), you also need to make
sure that nothing changes intepretation in the *middle of a proof*.
Equivalently, it's possible to change the grammar to an inherently unambiguous
form (such as Polish notation or parse trees) and then perform verification in
that scheme. I intend to explore the use of parse tree based verification when
I next revisit smm (adding the ability to inspect and/or create proofs, rather
than just verifying them).
I suspect that for a seriously optimized verifier in a strongly typed
imperative language, parse trees + hash consing may be the fastest way to go
for verification, in addition to automatically preventing intermediate steps
from actualizing an ambiguity; things are less clear with Javascript because
native strings outperform most of what you can throw together at the user level
for reasonable parameter ranges.
That is to say, I beleive that using a tree-based verifier, together with
checking unique readibility for all $a, $p, and $e statements, is sufficient to
prevent any real inconsistency from being created by an ambiguous grammar.
This is a marked strategic difference from mmj2, which Mario has in the past
attempted to add an unambiguity-prover to. (Such a prover would necessarily be
incomplete, but problems that are undecidable in general quite often have easy
instances.)
This is certainly a weakness in the Metamath system. Declaring definitional
extensions to be outside of the system simplifies "the system", but only by
gerrymandering it; if you want to compare Metamath to an LCF kernel, you should
compare said LCF kernel to the combined system of the proof verifier, a unique
derivation checker, and a definitional soundness checker. I have some vague
ideas for an LCF-style verifier in smm which does all derivations and
justification theorems "outside" and then passes a collection of proof trees to
a kernel implementing something much closer to the "Metamath formal system".
-sorear