I think I've figured out where Mario and I were talking past each other
yesterday. MMOM as currently implemented is something that can parse a totally
arbitrary character sequence into something resembling a metamath database with
a list of errors, and then spit it out again. The imagined use case was a
database-editing program that could load corrupt databases and fix them using
MMOM's facilities. But I now see that this isn't necessary; it would be
quite sufficient to declare that when a corrupt database is parsed, there is
either no MMOM output, or a degraded output that cannot be used for
editing/saving. The interactive repair tool could still work, it would just
have to use the parser output for annotations and then have you edit the file
at the character level.
The question remains of exactly which constraints MMOM should require to hold
at all times. At this point I would definitely like databases being edited
through MMOM to always consist of a sequence of syntactically valid
statements; and I would
also like it to be possible for invalid proofs to exist in an in-memory
database, because requiring all proofs to be valid at all times is probably
unnecessarily constraining for editing programs and unnecessarily
time-consuming to check.
As such, my current proposal follows:
D. Add the obvious missing analyzers (Outline, MathTypesetting, MathParser,
Tracebacks; am I missing any you want?). Write some toys using this (e.g. a
script to convert set.mm to Polish notation).
E. Open a discussion concerning expanded proof representations (which is a
different subject from the gross database representation discussed here).
On Sun, Jun 14, 2015 at 01:35:51AM -0400, Mario Carneiro wrote:
> > D. Add the obvious missing analyzers (Outline, MathTypesetting, MathParser,
> > Tracebacks, I missing any you want?). Write some toys using this (eg: a
> > script
> > to convert set.mm to Polish notation).
> >
>
> This should include anything needed to make the HTML output in principle.
> In particular, I would want theorem numbers matching the HTML output and
> optionally the metamath.exe numbers as well.
This can be done. BTW, one I thought of after posting that was
DefinitionalSoundness; do you have a writeup of the tests that need to be done
for that anywhere?
> Since you are writing in JS, it should not be difficult to use eval() to
> enable a "Turing-complete user script language". The only thing needed then
> is a decent API for accessing, adding, removing steps (and of course a step
> viewer of some sort, probably similar to the .mmp file format / HTML proof
> output).
I'm not sure what the context of the request for a step viewer is. I'd
eventually like to rewrite MPE as client-side javascript using smm, but it
sounds like you have another application in mind?
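As an aside on the user-script idea: since smm is plain JavaScript, the hookup could be little more than the Function constructor with a small API object handed in. Everything named below (runUserScript, api.steps, api.addStep, the proof shape) is invented for illustration, not an existing smm interface:

```javascript
// Hypothetical shape of a user-script hook built on the Function
// constructor (a tamer cousin of eval): the script body only sees the
// API object we pass in.  All names here are invented for this sketch.
function runUserScript(source, proof) {
  const api = {
    steps: () => proof.steps.slice(),          // read-only copy
    addStep: (s) => { proof.steps.push(s); },  // mutation
  };
  // 'use strict' keeps the script from leaking accidental globals.
  const fn = new Function('api', "'use strict';" + source);
  fn(api);
  return proof;
}

const proof = { steps: ['hyp1'] };
runUserScript("api.addStep('qed: ' + api.steps().length);", proof);
// proof.steps is now ['hyp1', 'qed: 1']
```

This is only "Turing-complete user scripts" in the weakest sense; real sandboxing would need more care, but the API-object pattern is probably the right starting point.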
> uses token arrays for step contents and metamath uses arrays for
> everything, which is why your 'evil' example crashed them.) I don't know if
> you have worked on parsing formulas yet; in general it can be a hard
I haven't started the code yet, but weeks ago I worked out in my head a design
for a statement parser that should be relatively simple and extremely
fast (set.mm's prefix unambiguity means that packrat parsers run in linear
time, and packrat parsers are "just code" so I can generate Javascript for them
at runtime to mitigate the cost of table lookups).
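To make the memoization idea concrete, here is a toy packrat-style parser over a token array. The two-alternative grammar at the bottom is a stand-in, not set.mm's real syntax, and no runtime code generation is attempted in this sketch:

```javascript
// Memoized ("packrat") recursive-descent parse over a token array.
// The memo table is keyed by rule name and position, so each
// (rule, position) pair is tried at most once, which is what makes
// the parse linear-time for an unambiguous prefix grammar.
function makeParser(rules) {
  // rules: { name: [alternative, ...] }, where an alternative is an
  // array of symbols; a symbol is a literal token (string) or a
  // reference { ref: ruleName }.
  return function parse(start, toks) {
    const memo = new Map();          // "name:pos" -> {node, next} | null
    function tryRule(name, pos) {
      const key = name + ':' + pos;
      if (memo.has(key)) return memo.get(key);
      let result = null;
      for (const alt of rules[name]) {
        let p = pos, ok = true;
        const kids = [];
        for (const sym of alt) {
          if (typeof sym === 'string') {
            if (toks[p] === sym) p++;
            else { ok = false; break; }
          } else {
            const sub = tryRule(sym.ref, p);
            if (sub) { kids.push(sub.node); p = sub.next; }
            else { ok = false; break; }
          }
        }
        if (ok) { result = { node: { alt, kids }, next: p }; break; }
      }
      memo.set(key, result);
      return result;
    }
    const r = tryRule(start, 0);
    return r && r.next === toks.length ? r.node : null;
  };
}

// Toy grammar: wff := '->' wff wff | 'ph' | 'ps'  (Polish notation)
const parse = makeParser({
  wff: [['->', { ref: 'wff' }, { ref: 'wff' }], ['ph'], ['ps']],
});
```

The compiled-to-JavaScript version would replace the table walk with generated functions, one per rule, but the memoization discipline is the same.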
> problem but somehow mmj2 manages, while I think MM-PA sometimes gets
> confused here and has to ask the user to disambiguate. In addition to
> having better memory management asymptotics, it is also much easier to work
> with the tree representation for automatic proofs (although if possible a
> user script shouldn't need to specify syntax steps).
Yes, the "proof DOM" subsystem should hide the existence of syntax steps.
This, together with the requirement for a parser, limits us to tree-structured
languages, i.e. no miu.mm. I don't think this will be much of an issue in
practice.
What do you want to do about the configuration problem? The statement parser
needs to know that "wff", "class", and "set" are context-free classes, while
"|-" should always be parsed as a "wff". (I'm tentatively planning to add a $(
$smm {JSON-BLOB-HERE} $) comment to set.mm. Not sure what to do when I get to
"fonthtmldef", but that's a way off.)
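For what it's worth, extracting such a comment could look like the sketch below. The comment body and both key names ("syntax", "logic") are invented for illustration, since no such format exists yet:

```javascript
// Hedged sketch: one way a hypothetical $( $smm { ... } $) comment
// could carry parser configuration.  The key names below are made up;
// nothing in the metamath spec defines them.
const dbText = `
$( $smm { "syntax": ["wff", "class", "set"],
          "logic":  { "|-": "wff" } } $)
`;

function readSmmConfig(text) {
  // Find the first $( $smm ... $) comment and parse the JSON body.
  const m = /\$\(\s*\$smm\s*([\s\S]*?)\s*\$\)/.exec(text);
  return m ? JSON.parse(m[1]) : null;
}

const cfg = readSmmConfig(dbText);
// cfg.syntax would list the context-free typecodes; cfg.logic would map
// each logical typecode to the syntactic type its statements parse as.
```

One obvious caveat: the JSON body must not itself contain the two-character sequence "$)", which JSON string escapes would have to work around.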
Hi Norm,

Are there any "tags" that can be put in comments but would not show up in the
HTML, which mmj2 could parse for its own use? There are two features in mmj2
that could benefit from this. One, which already exists, is the
ProofAsstUnifySearchExclude RunParm, which currently takes a list of labels to
be excluded from any unification searches. My list includes the ax* theorems,
the *OLDs, biigb, and others that say "should not be referenced in any proof",
but it would make a lot more sense to store this information directly in the
file, perhaps as a second comment just before the HTML comment, like your TODO
comments. The reason this concerns you is of course that it would involve
modifications to the published version of set.mm, so I'm looking for what
kinds of extended syntax would be acceptable to you.

The other feature, which does not yet exist but would be very useful if
completed, is what I am calling for the moment "preferred theorems". These are
theorems that will always unify immediately with a step if they can match its
consequent. This would be useful for theorems like mulcli, fveq2d, and the
like, where the theorem is essentially the only one ever applied when it is
applicable. It is not needed for zero-hyp theorems, which are already used
whenever applicable; but for theorems with one or more hypotheses there are
usually many candidates, and even if we exclude theorems whose hypotheses
contain variables not in the conclusion, there are lots of theorems like idi,
notnotri, and trud which apply to anything at all, or theorems like a1i that
simplify the step, but probably not in the intended way most of the time.

But again, it makes a lot more sense to annotate these kinds of theorems
directly in the file than to try to keep a separate list in my RunParms file
(a list which doesn't get exported to other users, either). If I get the
"preferred theorems" feature working correctly, it would allow us to unify an
arbitrarily complicated substitution theorem all at once (although only for
one variable at a time, so the justification theorems needed for the
definition checker are still out of reach), or a complicated closure theorem
(although only closure inferences, not deductions, unless we add mulcld-type
theorems, which don't yet exist except for recnd - I've been investigating
which theorems of this type would be the most useful, but I don't have any
specific plans yet).
Mario
On Sun, Jun 14, 2015 at 03:01:48AM -0400, Mario Carneiro wrote:
> On Sun, Jun 14, 2015 at 1:58 AM, Stefan O'Rear <stef...@cox.net> wrote:
> > This can be done. BTW, one I thought of after posting that was
> > DefinitionalSoundness; do you have a writeup of the tests that need to be
> > done
> > for that anywhere?
Question stands.
At some point, it would be nice to have a writeup of the algorithm(s)
you are using and what assumptions you are making, so that (1) a
mathematician can be convinced that the soundness checking is correct
and (2) an independent soundness checker could be written by someone
else.

Actually, I'm not very happy with one particular part of the soundness
checker, namely the free variable prover. The rest of it is fairly simple, but
showing ( ph -> A. x ph ) for all the different syntax constructs requires a
lot of reliance on mmj2's ability to derive proofs. It should still err on the
side of false positives, since if mmj2 is unable to do a certain proof it will
pop the "possibly free" error. With Alexey's help, mmj2 has taken some great
strides in being able to do complete proofs on its own, but it's not yet at
the point where I can just hand it the justification theorem and ask it to
prove it.
Here's an overview of the algorithm; the (error x) markings match the
numbering code I-PA-020x:

1. Suppose the definition after grammar parsing has the structure
"ax: [[...];syn, [...];defn];eq", where "[a,b,c];d" represents a syntax axiom
d with three hypotheses a,b,c. Then eq is required to be wb or wceq (error 1).

2. Find the declaration of 'syn' in the file, and ensure that no axioms
strictly between 'syn' and 'ax' have 'syn' in their parse trees, and that
'defn' also does not have 'syn' in its parse tree (error 2).

3. Collect all the varhyps on the left side of the tree into 'parameters', and
all the varhyps on the right side of the tree but not in 'parameters' into
'dummies'. For each pair of parameters, there is required not to be a DJ
condition on those variables (error 3).

4. For each variable in 'dummies', there is required to be a DJ condition
between each dummy and each parameter (error 4).

5. If there are no dummies, stop processing and pass the test.

6. Look for a justification theorem. Build an assignment map from each dummy
to a variable of the same type which is not present in 'parameters' or
'dummies' and was not previously generated in this loop, and apply it to the
tree defn to produce defn2. Now attempt to prove the assertion
"[[...];defn, [...];defn2];eq". If successful, pass the test.

This is enough for a "technically correct" definition checker, but because
mmj2 is not powerful enough to always prove the justification theorem, we try
an alternative approach if we are not done yet.

7. If we still have not passed the test and there are non-set variables in
'dummies', fail the test (error 5). This is necessary because the next step
has to assume that the variables are set variables.

8. For each dummy x, fail the test (error 6) if provablyBoundIn(defn, x)
returns false, where provablyBoundIn is defined as:

provablyBoundIn(tree: [children...];root, x):
1. If children is empty, return (root = x);
2. If provablyBoundIn(child, x) is true for all children, return true;
3. Let tree2 = tree if tree is a wff expression, and
tree2 = [[y];cv, tree];wcel otherwise (when tree is a class expression),
where y is a set variable not present in tree. Construct the tree
[tree2, [x,tree2];wal];wi and attempt to prove it. Return true iff the proof
succeeds.
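A minimal sketch of provablyBoundIn in JavaScript, to pin down the recursion. The tree shape and the tryProve/freshSetVar callbacks are assumptions of this sketch, not mmj2's actual data structures:

```javascript
// Sketch of provablyBoundIn.  A tree is { root, children, type? } where
// a leaf (children: []) stands for the variable named by its root.
// tryProve(tree) stands in for handing mmj2 an assertion to prove, and
// freshSetVar(tree) for picking a set variable not occurring in tree;
// both are assumed callbacks, not real mmj2 APIs.
function provablyBoundIn(tree, x, tryProve, freshSetVar) {
  // 1. A leaf is bound in x exactly when it *is* x.
  if (tree.children.length === 0) return tree.root === x;
  // 2. If every child is provably bound in x, so is the whole tree.
  if (tree.children.every(c => provablyBoundIn(c, x, tryProve, freshSetVar)))
    return true;
  // 3. Otherwise fall back on the prover: wrap a class expression A as
  // "y e. A" for a fresh set var y, then try ( tree2 -> A. x tree2 ).
  let tree2 = tree;
  if (tree.type === 'class') {
    const y = freshSetVar(tree);
    tree2 = { type: 'wff', root: 'wcel',
              children: [{ type: 'class', root: 'cv', children: [leaf(y)] },
                         tree] };
  }
  const goal = { type: 'wff', root: 'wi',
                 children: [tree2,
                            { type: 'wff', root: 'wal',
                              children: [leaf(x), tree2] }] };
  return tryProve(goal);
}

function leaf(v) { return { root: v, children: [] }; }
```

Note that step 2 alone handles the common case with no prover calls at all; the prover is only consulted for the subtrees where structural recursion fails.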
Your checker should give people a tremendous boost in confidence in Metamath
as a rigorous tool for set theory, especially if there is an unambiguous spec
behind it. As I mentioned before, this has been a "hole" that has made
mathematicians uncomfortable with Metamath.

The latter part of the algorithm makes me a bit worried, as I mentioned, but
my goal with this project was something that could work on an unmodified
version of set.mm and still pass the majority of cases. I feel a bit
constrained between this goal and the other goal of making an algorithm
simple enough to be easily verified as correct by a mathematician; indeed, I
only feel prepared to call the latter part a heuristic rather than a proof.
If we required justification theorems, the latter part would not be needed,
but even then it seems like something that could be autogenerated by the
verifier rather than worked into the Metamath spec itself. In any case it has
caught a lot of bugs, and that is enough to satisfy me.
Ah. Having a proof assistant is still a bit down my TODO so I didn't catch
this at first.
> A while ago I asked Norm for permission to add a $j statement to the
> comment, but I never moved on the idea. I suggest that you take advantage
> of a mechanism like this, and smm and mmj2 can share the format. The syntax
> would look something like the $t comment; a $( $j line followed by a series
> of commands in some format. The RunParms format often requires long lines,
> and so is unsuitable - I would suggest a "keyword param param;" style like
> the $t commands. I'm forwarding our discussion below.
If this is to be the format used, then why have both comments? Just say that
statements within $t with unknown headwords are ignored.
(BTW, the metamath.pdf is not clear on precisely where white space is allowed
in the $t syntax.)
In retrospect, not requiring space before ';' seems oddly inconsistent
with everything else. I think it was just to make the spec conform to
existing practice. Perhaps I should change it to require a space?