Automated Metamath ambiguity checking

86 views

Skip to first unread message

Marnix Klooster

unread,

Sep 1, 2015, 4:39:22 PM9/1/15

to Metamath

Hi all,

I quickly want to float an idea around checking that a Metamath database's grammar has no ambiguity. I'm curious whether to you all this is helpful, or obvious, or enlightening, or confusing, or irrelevant.

For my own future reference, if I understood correctly the issue of ambiguity was raised by Mel O'Cat in "Unification/Proof Assistant Problem #2" found in the mmj2 distribution (see, e.g., https://github.com/digama0/mmj2/blob/master/doc/UnificationProblem2.txt or http://us.metamath.org/ocat/mmj2/doc/UnificationProblem2.txt). In essence, an ambiguous grammar forces a proof assistant to let the user influence syntax decisions: "Do you want this x to represent a set or a class?" The result (I think) was the change of weq from an axiom to a theorem: see http://us.metamath.org/mpeuni/weq.html.

This topic was also discussed in this group in 2010; see https://groups.google.com/forum/#!topic/metamath/iunsTW6fGic. In that thread, Alexander Smirnov wrote, "A proof of the absence of syntax ambiguity clearly should be a part of any proof of the soundness of set.mm." I'm not sure I understand why, but it does seem sensible to have an unambiguous syntax.

Then this year there was Mario Carneiro's informal proof that set.mm is unambiguous: see his "Grammar ambiguity in set.mm" at http://us.metamath.org/downloads/grammar-ambiguity.txt.

Now, if I'm right, there is a simple automated way to check that set.mm (or any Metamath database) has an unambiguous grammar, as follows. I'll use set.mm here, but this should work for any database.

First, we add axioms about a new constant IS_TOPLEVEL in the appropriate places:

$( ` IS_TOPLEVEL a b c ... ` roughly means " ` a b c ... ` is a syntactically
   valid top-level expression". $)
$c IS_TOPLEVEL $.

$( [snip] $)

$c wff $.
$c |- $.
${
    $v ph $.
    wff.TOP.1 $f wff ph $.
    wff.TOP $a IS_TOPLEVEL wff ph $.
    turnstile.TOP $a IS_TOPLEVEL |- ph $.
$}

$( [snip] $)

$c set $.
${
    $v x $.
    set.TOP.1 $f set x $.
    set.TOP $a IS_TOPLEVEL set x $.
$}

$( [snip] $)

$c class $.
${
    $v A $.
    class.TOP.1 $f class A $.
    class.TOP $a IS_TOPLEVEL class A $.
$}

$( [snip] $)

And now, to verify that a $a, $p, or $e statement with math string a b c ... is unambiguous, we simply verify that there is exactly one proof of IS_TOPLEVEL a b c ... from all earlier active $a and $f statements.

As a random example, let's take

euorv $p |- ( ( -. ph /\ E! x ps ) -> E! x ( ph \/ ps ) ) $= ? $.

The only (I assume) proof of IS_TOPLEVEL |- ( ( -. ph /\ E! x ps ) -> E! x ( ph \/ ps ) ) is

wph wn wps vx weu wa wph wps wo vx weu wi turnstile.TOP

which we could render as an S-expression as

(turnstile.TOP (wi (wa (wn wph) (weu wps vx)) (weu (wo wph wps) vx)))

which we could render graphically as

(|- (-> (/\ (-. ph) (E! ps x)) (E! (\/ ph ps) x)))

So this strongly suggests that the unique proof of IS_TOPLEVEL a b c ... is the parse tree of a b c ....

In essence, this is just like the 'detailed syntax breakdown' that the Metamath program generates in the HTML pages for $a statements. But now using a database-independent mechanism, based a few database-specific IS_TOPLEVEL axioms.

I don't think I'll soon have the time to actually check that all of current set.mm is unambiguous by the above definition, but this should be a simple thing to do using, e.g., smm. And the Metamath program itself could perform this check; perhaps with a built-in additional math constant $^ instead of a convention like IS_TOPLEVEL?

Comments?

Groetjes,
<><

Marnix

--
Marnix Klooster
marnix....@gmail.com

Mario Carneiro

unread,

Sep 1, 2015, 10:24:19 PM9/1/15

to metamath

I understand that checking that a general CFG is unambiguous is an undecidable problem (https://en.wikipedia.org/wiki/Ambiguous_grammar#Recognizing_ambiguous_grammars). If you pushed the syntax one level further, you could model the proofs of a given string in a Metamath-in-Metamath approach, and under such a framework you could do proofs that a given database of syntax axioms is unambiguous - but it would be a creative proof, one that needs human intervention and would not be completely automatable.

To be clear, I am referring to a scheme like:

df-wordturnstile $a |- "|-" = 1

df-wordwff $a |- "wff" = 2

...
df-grammar $a |- Grammar = ( 1 ... ; 2 0 ) <-- or however many symbols there are

unambiguous $p |- ( W e. ( Word ` Grammar ) -> E* p (p is an RPN string deriving W) )

This would allow you to model any Metamath grammar in the language of set.mm. Under such a structure one could formalize grammar-ambiguity.txt and show that set.mm specifically has an unambiguous grammar. But the proof in grammar-ambiguity.txt is most certainly not general; it only works for set.mm and minor variations of it. Since that writeup was completed I have treated it as a set of rules for future syntax - every new piece of syntax must fall into one of those categories, and must not break any of the intermediate theorems. It is this part that can be mechanized and potentially put into a proof assistant, although it is clearly set.mm-centric.

I believe that the proof, which was written for a year-old version of set.mm, still applies to set.mm, as well as nf.mm and iset.mm, which are based on set.mm. ql.mm and hol.mm have significantly different syntax, so the proof does not apply directly, but after identifying what takes the place of "class" and "wff", one can use a very similar approach to show unambiguity there as well.

Mario

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.
To post to this group, send email to meta...@googlegroups.com.
Visit this group at http://groups.google.com/group/metamath.
For more options, visit https://groups.google.com/d/optout.

Stefan O'Rear

unread,

Sep 2, 2015, 2:10:52 AM9/2/15

to meta...@googlegroups.com

On Tue, Sep 01, 2015 at 10:39:02PM +0200, Marnix Klooster wrote:
> Hi all,
>
> I quickly want to float an idea around checking that a Metamath database's
> grammar has no ambiguity. I'm curious whether to you all this is helpful,
> or obvious, or enlightening, or confusing, or irrelevant.
>
> *
>
> For my own future reference, if I understood correctly the issue of
> ambiguity was raised by Mel O'Cat in "Unification/Proof Assistant Problem
> #2" found in the mmj2 distribution (see, e.g.,
> https://github.com/digama0/mmj2/blob/master/doc/UnificationProblem2.txt or
> http://us.metamath.org/ocat/mmj2/doc/UnificationProblem2.txt). In essence,
> an ambiguous grammar forces a proof assistant to let the user influence
> syntax decisions: "Do you want this x to represent a set or a class?" The
> result (I think) was the change of weq from an axiom to a theorem: see
> http://us.metamath.org/mpeuni/weq.html.

No. If the user can influence syntax decisions, that is fine. What is not
fine is that with an ambiguous grammar, it is possible that there is *no*
globally consistent assignment of "derivations" to terms.

The e. issue was, strictly speaking, fine, as the two "senses" of e. were
equivalent (theorem cleljust). There was a syntactic ambiguity, but no logical
inconsistency. However, it still creates a problem for tooling.

> This topic was also discussed in this group in 2010; see
> https://groups.google.com/forum/#!topic/metamath/iunsTW6fGic. In that
> thread, Alexander Smirnov wrote, "A proof of the absence of syntax
> ambiguity clearly should be a part of any proof of the soundness of set.mm."
> I'm not sure I understand why, but it does seem sensible to have an
> unambiguous syntax.

If there were no parentheses, then you could prove:

tru T.
wtru wtru wfal wa orci T. \/ T. /\ F.
wtru wtru wa wfal simpri F.

(I've put the $f hyps after the $es for clarity here, so this isn't quiiite a
valid set.mm proof string).

Why? Because the grammatical tree hasn't stayed constant. Because Metamath
follows the classical approach of treating formal logic as a game of strings,
rather than the more frequent now approach of a game of trees, it's vulnerable
to inconsistencies created by grammar ambiguity.

https://en.wikipedia.org/wiki/Well-formed_formula#Usage_of_the_terminology
is the best discussion I've found of the shift, although it's way too short.

(When we formalize FOL in set.mm, I have a strong preference for the tree game,
but this is primarily because it allows infinitary logic to be built on the
same foundation; however Metamath databases are presently limited to finite
formulas so this does not matter for the meta level.)

> Then this year there was Mario Carneiro's informal proof that set.mm is
> unambiguous: see his "Grammar ambiguity in set.mm" at
> http://us.metamath.org/downloads/grammar-ambiguity.txt.
>
> *
>
> Now, if I'm right, there is a simple automated way to check that set.mm (or

> *any* Metamath database) has an unambiguous grammar, as follows. I'll use

> set.mm here, but this should work for any database.

First, because metamath does not work at the grammatical level, it is not
coherent to talk about the grammar of an unconstrained metamath database.
miu.mm in particular is not based on a grammar. (You can run parsers over it
if you want, but this is strictly GIGO; Hofstadter did not describe MIU with a
grammar, so it doesn't have one).

mmj2 thinks partially at the level of grammars, so it's not usable on miu.mm.
smm currently has optional support for grammars; the smm proof assistant, when
it exists, is likely to be grammar-mode-only because I don't like the behavior
of string unification much.

Ambiguity of context-free grammars is equivalent to the halting problem.
(There is a fairly simple reduction to the
https://en.wikipedia.org/wiki/Post_correspondence_problem , and a much more
complicated reduction to halting in the linked article.)

Your proposal checks a finite number of strings for multiple derivations, while
to "prove" unambiguity by that method it would be necessary to attempt to parse
all *aleph-null* strings of tokens to verify that each of them has at most one
parse.

This, however, is an unnecessary nitpick. It is no more necessary to prove
that an ambiguity cannot be derived from the set.mm syntax rules than it is to
prove that a contradiction cannot be derived from the set.mm logical axioms.
What is important is that an _undetected_ ambiguity cannot exist - it is fine
for there to be potentially ambiguous strings as long as all strings that exist
in the database are checked for single parses by the verifier.

> is unambiguous, we simply verify that there is *exactly one* proof of

> IS_TOPLEVEL
> a b c ... from all earlier active $a and $f statements.

This is equivalent to first replacing |- with wff, then checking for exactly
one proof.

> As a random example, let's take
>
> euorv $p |- ( ( -. ph /\ E! x ps ) -> E! x ( ph \/ ps ) ) $= ? $.
>
> The only (I assume) proof of IS_TOPLEVEL |- ( ( -. ph /\ E! x ps ) -> E! x
> ( ph \/ ps ) ) is
>
> wph wn wps vx weu wa wph wps wo vx weu wi turnstile.TOP
>
> which we could render as an S-expression as
>
> (turnstile.TOP (wi (wa (wn wph) (weu wps vx)) (weu (wo wph wps) vx)))
>
> which we could render graphically as
>
> (|- (-> (/\ (-. ph) (E! ps x)) (E! (\/ ph ps) x)))
>
> So this strongly suggests that the unique proof of IS_TOPLEVEL a b c ...

> *is* the parse tree of a b c ....

Yes, this is how Mario and I interpret parse trees in. I'm not sold on IS_TOPLEVEL; though
what smm is doing currently (
https://github.com/sorear/smm-webui/blob/master/public/assets/set-config.mm )
is not really better.

> In essence, this is just like the 'detailed syntax breakdown' that the
> Metamath program generates in the HTML pages for $a statements. But now
> using a database-independent mechanism, based a few database-specific
> IS_TOPLEVEL axioms.

Curious fact: metamath.exe uses the IMPROVE ALL code to implement said syntax
breakdowns, and can thus handle some types of non-context-free grammars, while
smm and mmj2 use more restricted mechanisms (Packrat parsers and Earley
parsers).

> I don't think I'll soon have the time to actually check that all of current
> set.mm is unambiguous by the above definition, but this should be a simple
> thing to do using, e.g., smm. And the Metamath program itself could
> perform this check; perhaps with a built-in additional math constant $^
> instead of a convention like IS_TOPLEVEL?

The smm verifier already does this:

smm (master %)]$ cp .../set.mm/set.mm .
smm (master %)]$ cat ../smm-webui/public/assets/set-config.mm >> set.mm
smm (master %)]$ sed -i '' -E -e 's/$/ /; s/ ([][])_ / \1 /g; s/ ([][])_ / \1 /g; s/ $//' set.mm
smm (master %)]$ babel-node misc/verify.js set.mm | head -n 40
parse 865 ms
set.mm:33603:6:error: A math symbol may not be redeclared as a constant
| $c »[« $. $( Underlined left bracket $)
set.mm:23050:6:info: Previous definition
| $c »[« $. $( Left bracket $)
set.mm:33604:6:error: A math symbol may not be redeclared as a constant
| $c »]« $. $( Underlined right bracket $)
set.mm:23052:6:info: Previous definition
| $c »]« $. $( Right bracket $)
scoper 382 ms
set.mm:265577:5:warning: This token not used in the file; typesetting will be ignored
| »"[_"« as "<IMG SRC='_ulbrack.gif' WIDTH=6 HEIGHT=19 TITLE='[_' ALIGN=TOP>";
set.mm:265578:14:warning: This token not used in the file; typesetting will be ignored
| althtmldef »"[_"« as '<U>[</U>'; /* [ */
set.mm:265579:12:warning: This token not used in the file; typesetting will be ignored
| latexdef »"[_"« as "\underline{[}";
set.mm:265581:5:warning: This token not used in the file; typesetting will be ignored
| »"]_"« as "<IMG SRC='_urbrack.gif' WIDTH=5 HEIGHT=19 TITLE=']_' ALIGN=TOP>";
set.mm:265582:14:warning: This token not used in the file; typesetting will be ignored
| althtmldef »"]_"« as '<U>]</U>'; /* ] */
set.mm:265583:12:warning: This token not used in the file; typesetting will be ignored
| latexdef »"]_"« as "\underline{]}";
metadata 53 ms
set.mm:33618:18:error: (in df-csb) Grammatical ambiguity detected; from here can be parsed wceq or wsbc
| df-csb $a |- »[« A / x ] B = { y | [ A / x ] y e. B } $.
set.mm:33627:16:error: (in csb2) Grammatical ambiguity detected; from here can be parsed wceq or wsbc
| csb2 $p |- »[« A / x ] B = { y | E. x ( x = A /\ y e. B ) } $=
set.mm:33636:29:error: (in csbeq1) Grammatical ambiguity detected; from here can be parsed wceq or wsbc
| csbeq1 $p |- ( A = B -> »[« A / x ] C = [ B / x ] C ) $=
set.mm:33647:31:error: (in cbvcsbv) Grammatical ambiguity detected; from here can be parsed wceq or wsbc
| cbvcsbv $p |- ( A e. V -> »[« A / x ] B = [ A / y ] C ) $=
set.mm:33656:27:error: (in csbeq1d) Grammatical ambiguity detected; from here can be parsed wceq or wsbc
| csbeq1d $p |- ( ph -> »[« A / x ] C = [ B / x ] C ) $=
set.mm:33664:17:error: (in csbid) Grammatical ambiguity detected; from here can be parsed wceq or wsbc
| csbid $p |- »[« x / x ] A = A $=
set.mm:33681:30:error: (in csbcog) Grammatical ambiguity detected; from here can be parsed wceq or wsbc
| csbcog $p |- ( A e. V -> »[« A / y ] [ y / x ] B = [ A / x ] B ) $=
set.mm:33691:49:error: (in csbexg) Grammatical ambiguity detected; from here can be parsed wcel or wsbc
| csbexg $p |- ( ( A e. V /\ A. x B e. W ) -> »[« A / x ] B e. _V ) $=
set.mm:33704:17:error: (in csbex) Grammatical ambiguity detected; from here can be parsed wcel or wsbc

(note that csbeq1a is *not* flagged; its statement remains unambiguous even in
the modified grammar.)

However, it's not quite enough, because to actually catch the cases that can
lead to logical inconsistency (like my example above), you also need to make
sure that nothing changes intepretation in the *middle of a proof*.
Equivalently, it's possible to change the grammar to an inherently unambiguous
form (such as Polish notation or parse trees) and then perform verification in
that scheme. I intend to explore the use of parse tree based verification when
I next revisit smm (adding the ability to inspect and/or create proofs, rather
than just verifying them).

I suspect that for a seriously optimized verifier in a strongly typed
imperative language, parse trees + hash consing may be the fastest way to go
for verification, in addition to automatically preventing intermediate steps
from actualizing an ambiguity; things are less clear with Javascript because
native strings outperform most of what you can throw together at the user level
for reasonable parameter ranges.

That is to say, I beleive that using a tree-based verifier, together with
checking unique readibility for all $a, $p, and $e statements, is sufficient to
prevent any real inconsistency from being created by an ambiguous grammar.
This is a marked strategic difference from mmj2, which Mario has in the past
attempted to add an unambiguity-prover to. (Such a prover would necessarily be
incomplete, but problems that are undecidable in general quite often have easy
instances.)

This is certainly a weakness in the Metamath system. Declaring definitional
extensions to be outside of the system simplifies "the system", but only by
gerrymandering it; if you want to compare Metamath to an LCF kernel, you should
compare said LCF kernel to the combined system of the proof verifier, a unique
derivation checker, and a definitional soundness checker. I have some vague
ideas for an LCF-style verifier in smm which does all derivations and
justification theorems "outside" and then passes a collection of proof trees to
a kernel implementing something much closer to the "Metamath formal system".

-sorear

fl

unread,

Sep 4, 2015, 5:07:59 AM9/4/15

to Metamath

In that thread, Alexander Smirnov wrote, "A proof of the absence of syntax ambiguity clearly should be a part of any proof of the soundness of set.mm." I'm not sure I understand why...

Yes it is not clear at all. Until Mario Carneiro required the modification, set.mm

was ambiguous and it didn't seem a problem. But maybe the ambiguity was solved by a rule of precedence. I don't know.

Anyway Carneiro's problem was not a logic one but an editor building one. Unfortunately in my opinion it leads to the introduction of new rules: for instance the rule that requires that there is only one entry point. It clubbers

set.mm with rules that are not in the theorical definition of a logic.

It might have been solved by an extra rule in the editor instead of an extra rule in the logic.

fl

unread,

Sep 4, 2015, 5:13:12 AM9/4/15

to Metamath

It clubbers

It clutters.

Mario Carneiro

unread,

Sep 4, 2015, 5:56:16 AM9/4/15

to metamath

On Fri, Sep 4, 2015 at 5:07 AM, fl <freder...@laposte.net> wrote:

In that thread, Alexander Smirnov wrote, "A proof of the absence of syntax ambiguity clearly should be a part of any proof of the soundness of set.mm." I'm not sure I understand why...

Yes it is not clear at all. Until Mario Carneiro required the modification, set.mm
was ambiguous and it didn't seem a problem. But maybe the ambiguity was solved by a rule of precedence. I don't know.

Anyway Carneiro's problem was not a logic one but an editor building one. Unfortunately in my opinion it leads to the introduction of new rules: for instance the rule that requires that there is only one entry point. It clubbers
set.mm with rules that are not in the theorical definition of a logic.

For the record, the changes to weq and similar things were the result of discussions between Norm and Mel O'Cat, back when he was still maintaining mmj2, which is before my time. I had no part in the decision and have not cast my vote one way or another on the matter.

As Stefan has pointed out, it is possible for an ambiguous grammar to be sound - the obvious example being a non-bracketed "ph /\ ps" conjunction, where the ambiguity in parsing corresponds to distinct but logically equivalent parse trees. Nevertheless, having a subtle and benign ambiguity means more work for tooling, and remember that Metamath and set.mm were designed to be simple to parse and easy to write tools for, so it would seem reasonable to require an unambiguous grammar when it does not unduly impact other functionality.

It might have been solved by an extra rule in the editor instead of an extra rule in the logic.

Note that Metamath core has no requirements of ambiguous grammars. Grammar unambiguity is simply a desirable feature for a logic, and we strive to maintain it in set.mm using the sufficient (but not necessary) conditions laid out in grammar-ambiguity.txt or Norm's constant-only convention before that. As with many properties of a logic, it is not the sort of thing that can be proven from within the logic, so it is necessary to have informal or formal meta-proofs of this.