MM0 -> MM translation challenges

151 views
Skip to first unread message

Mario Carneiro

unread,
Jun 24, 2020, 12:57:45 AM6/24/20
to metamath
Hi All,

I'm thinking about finally writing the missing MM0 -> MM translator. The MM0 language was always designed to be compatible with metamath, so there is a mostly one to one mapping from MM0 theorems and proofs to MM. But MM0 has some features that do not easily lower, and some intelligence is needed to select the right choice.

Part 1: General translation

First, I will look at translating an arbitrary MM0 database to an MM database, with no particular relation to set.mm.

1. Variables. In MM0, every theorem declares its own variables and the types of those variables. There is no notion of a global variable. This is sometimes used (especially in the more CS applications) with descriptively named variables like "addr" or "env". How should these be translated? One option is to use MM local variables, i.e. variables in local scope. But this still requires adding a lot of (global) $v declarations for every variable name in every scope, and moreover I think this runs into restrictions with a variable having two types (for example, "a" is used as a wff in the logic part of the peano database and then as a nat variable later on).

  * Option 1: Disambiguate all variable names with their sort, i.e. "aW" and "aN". This is simple to implement but not super nice to look at.

  * Option 2: Same as 1 but only disambiguate when a conflict is observed.  This is better but still not great because common variables like "a" will get disambiguated at all their uses.

  * Option 3: Ignore variable names from the source entirely and select from a predefined list of variables of each type, i.e. specify that wff variables should be "ph,ps,ch,th,..." and nat variables should be "a,b,c,..." This requires more configuration work, and it also ignores the names in the source, which may be informative, for example variables named "G" are often used for the context in deduction style theorems, and names like "env" will clearly be clobbered. If we run out of given names we will have to allocate names like "x1, x2, x3" which are even less pleasant.

2. Definitions. In MM0 you can make a definition and there is a primitive step for unfolding them. This is not terribly hard to model in MM using equality proofs, although it requires that every sort have an equality relation on it. In MM0 not all sorts have an equality relation defined on them, and even when there is an explicit equality relation the system does not register this fact explicitly. For example, in peano.mm1 there is an "eq" relation on nat, and enough axioms to prove that every term respects equality (similar to equality in set.mm), but like metamath there is nothing special about this term constructor and without one of the axioms this theorem would not be true. By contrast the definitional equality relation exists on all sorts, has its own rules separate from theorem application, and every MM0 term respects definitional equality regardless of any axioms that exist.

  * Option 1: Represent the definitional equality relation as a new term constructor, one for each sort. Every term constructor gets an axiom saying it respects definitional equality, and then we have to prove automatically for every new definition that it too respects definitional equality. A downside of this is that for types with an equality already we get two equality relations, and what's worse you can't even prove "defeq a b -> a = b" although it is true at the metalevel.

  * Option 2: Add configuration so that you can say that a specified relation should be used in preference to adding a new defeq relation, when one exists on the sort. (For a database like set.mm this is not a problem since every sort has an equality relation except setvar, which is a pure sort and so does not need a defeq relation.) This generally results in more idiomatic MM statements, although it does require identifying all the axioms that the defeq judgment needs, and the system will probably end up reproving a lot of equality theorems that already exist, unless these are also marked.

3. Sort modifiers. MM0 has a number of sort modifiers, which were originally devised to make the translation to MM clearer, so hopefully this should be easy. By default a MM0 sort corresponds to two MM sorts, one for variables and one for terms of that sort. A pure sort is one without any term constructors, so in this case we can skip the term version; and a strict sort is one without any binders, and for this we can skip the variable version. A free sort is one where you can't drop unbound variables; this one is problematic for MM but we can just ignore it in translation, provided the sort itself is nonempty.

4. Delimiters. MM0 has a tokenizer that is slightly more complicated than split(' '). But adding more spaces for disambiguation is totally okay, so I am inclined to solve this by simply adding spaces around token sequences until it matches MM conventions.

5. Notations. MM0 has a precedence grammar that allows you to avoid a lot of parentheses. MM only allows the encoding of pure CFGs.

  * Option 1: MM0's grammar is also a CFG, so use that encoding. This is a strawman because the grammar has thousands of nonterminals (one for each precedence level), and to declare this in MM would need one sort for every precedence level (or at least every precedence level in use, which for peano is probably at least 15-20 levels). We would also need a coercion from each lower precedence level to the next higher level, meaning that terms would be bloated with towers of meaningless coercion functions. (I've actually done this sort of thing in metamath before; dtt.mm contains a precedence grammar so that lambdas and applications don't need unnecessary parentheses. Even there you need at least three levels and it is not at all fun to write manually.)

  * Option 2: Live with the extra parentheses. This is idiomatic MM. But still it leaves open the question of how to determine when to insert parentheses. Every MM sort corresponds to one precedence level, so functions like "+" with type nat -> nat -> nat are not precedence changing and hence require parentheses, while functions like "=": nat -> nat -> wff are potentially precedence changing and require parentheses only if the prec(nat) <= prec(wff). It is not clear how to assign precedences to sorts based on the available precedence information in MM0, though (which is associated with term constructors, not sorts). Perhaps this also needs to be provided as configuration. (In set.mm it's pretty simple: there are only two expression sorts, and prec(wff) < prec(class). In peano.mm0 I'm not really sure what would be appropriate, perhaps prec(wff) < prec(nat) = prec(set).)

  * Option 3: Forget about the analysis and just parenthesize everything. This has the advantage of being easy to implement and also easy to prove unambiguous, compared to option 2.

6. Tactics. This is more of an MM1 feature than MM0, but it will be a shame to lose all the tactic scripts. In particular this information might be useful for using MM1 as a MM IDE by means of continuous translation. I already much prefer MM1 to mmj2 for writing metamath style proofs, and especially if MM1 can work in a semi-translated mode where it is compatible in a limited way with pure metamath theorems, you might be able to use MM1 for set.mm development.

  * Option 1: Live with it, drop the tactics, metamath was never designed for this. This is the more idiomatic option, but it may end up with difficult to maintain MM proofs. I don't think this is a very big problem for the existing peano.mm1 proofs, because they have been written with a mind toward efficient MM output, with the exception that peano.mm1 uses ax_mp a lot more than MM normally would, and the *i "inference form" theorems are mostly absent. But a minimize pass should clean this up. In the future, especially if people other than me start writing tactics, it is likely that much more inefficient proofs will start popping up. (I recognize that I care about proof length a bit more than is rational.)

  * Option 2: Put the proof scripts in comments next to the metamath proof, possibly with some sigils so that they can be automatically rediscovered by tooling. This solves the problem of recovering the tactic that produced the proof, although it is also likely to go out of date if it is a proof script for a set.mm proof and people who don't use MM0 subsequently change the proof without changing the comment. It could be checked in CI but this puts another layer of complexity in MM checking that arguably shouldn't be present if it's actually to be a translation.

  * Option 3: Try to reconstruct a MM1 proof script from a MM proof object. This sounds really hard to me, particularly when it comes to identifying where tactics have been called, as they don't really leave any trace in the term.

  * Option 4: Add comments *inside* a translated MM proof saying where tactics were called. The most advanced version of this might require a change to the MM spec, but we can probably get away with adding comments like "ABAZBABC $( tac1 $) DFAZAABZ" in the compressed proof block.

7. Do blocks. Similarly, the definition of tactics themselves could be translated, although at this point there is really not much to say on the MM side besides "here's some MM1 code, I hope it means something to you because I sure don't know what it is". In MM1 do blocks can also create theorems and this is sort of the pinnacle of automated proof generation; it is very hard to represent the provenance of these proofs in any meaningful way. (For example, in peano_hex.mm1 I have implemented the arithmetic algorithm in hexadecimal, and this requires an addition and multiplication table for all numbers 0-15, so about 256 theorems for each operation. These theorems, things like 7p4e11, are all being generated automatically.)


Part 2: set.mm

I think it is fair to say that for many of the results that are being proven in peano.mm1 and all the knock on CS projects that will be proven with respect to PA, it would be nice for soundness purposes to have a translation that targets not some peano.mm database that is still PA, but rather set.mm. That would allow things like the MM0 formalization and the MMC verification framework to also exist inside set.mm. But this comes with its own, additional challenges.

1. Sorts. peano.mm0 has three sorts: wff, nat, and set. These represent booleans, natural numbers, and the powerset of natural numbers respectively, rather like set.mm's set and class distinction but for finite set theory. The obvious and idiomatic thing is to map wff to wff, nat to NN0, and set to ~P NN0, but this leads to an obvious problem: what it means to map a sort to a set.

Whenever we have a theorem with a nat variable 'a', we need to replace that with a class variable 'A' and an assumption that 'A e. NN0'. Where do we put this assumption?

  * Option 1: Add the assumption in a hypothesis. This is the most faithful representation of the original theorem but it limits its applicability unless we use dedth.
  * Option 2: Change the theorem to deduction form and then add a hypothesis under the new context. This is simple-ish to implement, but it does mean that some theorems will have two contexts which is not great.
  * Option 3: Try to identify if the theorem is already in deduction form and then use that context. More complicated, and also obviously requires the translator to know about how deduction proofs are done in both systems (although for a translation at this level of faithfulness between specifically peano.mm0 + stuff to set.mm + stuff that's not unreasonable).
  * Option 4: Add the assumptions as antecedents on the theorem. I'm not a fan of this option because it requires knowing all the syl*anc theorems and also messes up the statements of theorems.

Also, in many cases we can deduce the typing condition from another typing condition in the MM0 theorem. For example MM0 itself has sets and uses them for explicit typing; if we have (a: nat) and an assumption h: $ G -> a e. (0 .. 16) $ we might want to not generate two typing assumptions on 'a', since 'A e. ( 0 ..^ 16 )' is already sufficient to derive that A e. NN0. We can probably assert this in general for anything that looks like a set in peano.mm0, because if the sort 'set' is interpreted as ~P NN0 then "a e. A" always implies that 'a' is in NN0 because A is syntactically restricted to be a subset of NN0. Of course we have to do the same thing for 'set', and here we might prefer to use "A C_ NN0" instead of "A e. ~P NN0" for being more idiomatic. Again, lots of special case reasoning.

2. Alignment. peano.mm0 defines a lot of basic functions, like addition, multiplication, division, the cantor pairing function, and so on. Some of these have direct analogues, for example we can more or less directly translate "a + b" to "( A + B )", using that + is closed in NN0 for the typing theorem. Others don't: "a - b" in peano.mm0 is truncated subtraction, which is not defined in set.mm (IIRC) so we would have to create a new function for this or else litter the output with things like if ( B <_ A , ( A - B ) , 0 ) that could grow the statement exponentially. It's a shame that this can't be translated to ( A - B ) directly but unless the theorem has a B <_ A guard in it this probably can't be done.

Other functions exist but have more complicated lowerings. For example peano.mm0 has the 'the' operator for definite description, which has the type set -> nat, while set.mm has "iota" instead, which is a binder. So that means that we would want to translate "the A" to "iota_ x e. NN0 x e. A" and "the {x | p}" to "iota_ x e. NN0 p", and this binding mismatch would translate to the theorems as well. This problem can be ignored though if we just want any translation and don't go for maximum fidelity.

---

That covers the issues I can currently see with the translation process. I'm curious to see how people feel about the merits of the different options, what would be the most helpful. Many of the points above also apply to MM -> MM translations, for example type guards and alignment also occur when translating peano.mm to set.mm or hol.mm to set.mm. It would be nice if we had a good story around interpreting MM databases into abstract (set-theoretic?) models and from there into reasonable MM databases like set.mm.

Mario Carneiro

Gino Giotto

unread,
May 25, 2025, 5:10:53 PMMay 25
to Metamath
I'm interested in this. In particular, I would like to use the tactics features of MM1 to create MM proofs. Unfortunately, I'm far from being proficient enough to provide good answers to the challenges expressed above. I watched the tutorial and have MM0 installed in VS Code, but so far I've only used MM1 to create a few test proofs to get an intuition of its functionalities. Having a translator that can go back and forth between MM and MM1 would give me a strong reason to dive deeper into it, helping me better understand the world of MM0/MM1 and potentially produce proofs more quickly.

A general suggestion regarding the above challenges is that the translator would ideally preserve the original MM file. That is, if I translate set.mm into set.mm1, and then translate back from set.mm1 into set.mm, the resulting MM file would preferably be identical to the original, except for the parts that I have manually edited. Not sure if this is feasable tho.

Regarding point 6 of part 1, I would choose option 1: "Live with it, drop the tactics, metamath was never designed for this". I'm interested in using MM1 tactics to create MM proofs, but I'm not interested in preserving the MM1 tactics themselves. A "fringe idea" could be to export those MM1 tactics as a separate Rumm file, but Rumm is currently very primitive and not under active development (despite my efforts), so I don't think it's worth getting too hung up on it.

Perhaps, instead of translating the entire database, one could extract only the portion of set.mm that the user intends to revise. In metamath.exe, there is "write source <output-name> /extract <theorem>" that allows the user to create a MM file containing only the material needed to prove a specified theorem. A similar approach could be used to derive a smaller MM1 file from MM.

It would be really cool to have a "flawless" translation between MM and MM1 because it would essentially be equivalent to having tactics directly in Metamath, which I think would provide a significant gain in the long term (tho, I'm not sure whether my aim aligns with the constraints of the MM0/MM1 language).

Mario Carneiro

unread,
May 25, 2025, 5:56:36 PMMay 25
to meta...@googlegroups.com
Another possibility which I did not mention in that post is to add constraints to MM1 (or specifically mm0-rs) to allow it to operate in "MM mode", where certain things are disallowed to make translation easier. That makes more sense if your goal is to use MM1 to develop set.mm rather than embedding peano.mm1 et al into set.mm.

A less ambitious goal which seems practically doable would be to allow MM1 to import MM (this is another version of MM -> MM0 translation which already exists), and then just produce small scale MM1 -> MM translations like single theorems, using an approach similar to mmj2 or rumm where you get some code blocks to copy into set.mm. If all the theorem statements are written on the MM side then this solves most of the issues about variable naming etc.

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/metamath/4654b1d5-cb5f-4f77-ada6-0c6a166b80a0n%40googlegroups.com.

Gino Giotto

unread,
May 25, 2025, 8:10:25 PMMay 25
to Metamath
>  A less ambitious goal which seems practically doable would be to allow MM1 to import MM (this is another version of MM -> MM0 translation which already exists), and then just produce small scale MM1 -> MM translations like single theorems, using an approach similar to mmj2 or rumm where you get some code blocks to copy into set.mm. If all the theorem statements are written on the MM side then this solves most of the issues about variable naming etc.

This works for me. Only concern: when you mention "another version of MM -> MM0 translation which already exists" I think you are referring to the Haskell translator that converts a MM file into a MM0 or MMU file. I tried that command, but the statements of the produced file are represented as trees, like this:

--| The value of the divisor function at a prime power. (Contributed by Mario Carneiro, 17-May-2016.)
theorem sgmppw {k: setvar} (A2 P N: class):
  $ wi (w3a (wcel A2 cc) (wcel P cprime) (wcel N cn0)) (wceq (co A2 (co P N cexp) csgm) (csu k (co cc0 N cfz) (co (co P A2 ccxp) (cv k) cexp))) $;

The above statement looks really hard to read for me, so I'm ok with the latter proposal if I can work with the original set.mm notation in MM1 (or at least something similar). Looking at the set.mm0 file in the examples it seems it should be possible https://github.com/digama0/mm0/blob/master/examples/set.mm0, but maybe it requires more work?

>  Another possibility which I did not mention in that post is to add constraints to MM1 (or specifically mm0-rs) to allow it to operate in "MM mode", where certain things are disallowed to make translation easier. That makes more sense if your goal is to use MM1 to develop set.mm rather than embedding peano.mm1 et al into set.mm.

What would the user get compared to the other approach? (Others might be interested, though for me, the "less ambitious goal" would be good already.)

Mario Carneiro

unread,
May 25, 2025, 8:18:47 PMMay 25
to meta...@googlegroups.com
On Mon, May 26, 2025 at 2:10 AM Gino Giotto <ginogiott...@gmail.com> wrote:
This works for me. Only concern: when you mention "another version of MM -> MM0 translation which already exists" I think you are referring to the Haskell translator that converts a MM file into a MM0 or MMU file. I tried that command, but the statements of the produced file are represented as trees, like this:

--| The value of the divisor function at a prime power. (Contributed by Mario Carneiro, 17-May-2016.)
theorem sgmppw {k: setvar} (A2 P N: class):
  $ wi (w3a (wcel A2 cc) (wcel P cprime) (wcel N cn0)) (wceq (co A2 (co P N cexp) csgm) (csu k (co cc0 N cfz) (co (co P A2 ccxp) (cv k) cexp))) $;

The above statement looks really hard to read for me, so I'm ok with the latter proposal if I can work with the original set.mm notation in MM1 (or at least something similar). Looking at the set.mm0 file in the examples it seems it should be possible https://github.com/digama0/mm0/blob/master/examples/set.mm0, but maybe it requires more work?

The issue is that set.mm notations are not compatible with the mm0 notation system, and the current translation makes no attempt to translate notations. This can be done in at least some cases automatically, but it seems like a lot of work to tune the heuristics...

>  Another possibility which I did not mention in that post is to add constraints to MM1 (or specifically mm0-rs) to allow it to operate in "MM mode", where certain things are disallowed to make translation easier. That makes more sense if your goal is to use MM1 to develop set.mm rather than embedding peano.mm1 et al into set.mm.

What would the user get compared to the other approach? (Others might be interested, though for me, the "less ambitious goal" would be good already.)

Well in any case you will want to either avoid things that MM is bad at. You also need to store alignment information somewhere no matter how you go about it.

Gino Giotto

unread,
May 25, 2025, 9:29:50 PMMay 25
to meta...@googlegroups.com
The issue is that set.mm notations are not compatible with the mm0 notation system, and the current translation makes no attempt to translate notations. This can be done in at least some cases automatically, but it seems like a lot of work to tune the heuristics...

Hmm, all I can say is that I could manually add the notations that cannot be done automatically if they are not too many, otherwise I don't think I can handle the tree format. I guess others would benefit as well if this would have to be done only once.

I don't know the language very well. Is it just about creating many lines like "infixl wa: $/\$ prec 20;" or "prefix wex: $E.$ prec 30;" and then the statements will be automatically prettified from that information? I'm concerned that things will break with future set.mm updates tho.

Mario Carneiro

unread,
May 25, 2025, 11:47:18 PMMay 25
to meta...@googlegroups.com
You can do that, but ideally this would be done as a change to the exporter, so that the theorems themselves can be stated using the notation, and also so that it stays up to date with set.mm. So the exporter will have to identify whether a notation is prefix or infix or a general notation, and there will be compromises needed on some set.mm notations that are mm0-ambiguous, like wa `( ph /\ ps )` and w3a `( ph /\ ps /\ th )`. In MM0 every notation has to start with a unique token.

--
You received this message because you are subscribed to the Google Groups "Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email to metamath+u...@googlegroups.com.

Gino Giotto

unread,
May 26, 2025, 10:54:53 AMMay 26
to Metamath
> You can do that, but ideally this would be done as a change to the exporter, so that the theorems themselves can be stated using the notation, and also so that it stays up to date with set.mm. So the exporter will have to identify whether a notation is prefix or infix or a general notation, and there will be compromises needed on some set.mm notations that are mm0-ambiguous, like wa `( ph /\ ps )` and w3a `( ph /\ ps /\ th )`. In MM0 every notation has to start with a unique token.

Ok, this sounds good to me. I will probably need one example to see how it works, but after that I should be able to figure out how to carry on with the notation. For w3a, one could disambiguate using a slightly different symbol, not already present in set.mm, like `( ph /\. ps /\. th )`. A problem with this approach is that someone might add a definition in set.mm with the `/\.` symbol in the future and then the translation would break because of a conflict between tokens. One could avoid it by using special hidden symbols, but this is probably against the transparency philosophy of MM0, so I'm sure there are plenty of reasons why this is bad.

Another possibility could be to tell the exporter to automatically replace `(w3a ph ps ch)` with `(wa (wa ph ps) ch)` everywhere before introducing the notation. The original set.mm would not be preserved, but since the theorem being edited in MM1 would verify anyway in the original set.mm, I think it would not be a problem.  

There are also df-op `<. A , B >.` and df-ot `<. A , B , C >.` that have the same obstacle. Dropping df-ot might be ok, but this approach becomes problematic for definitions like df-s1, df-s2, df-s3, df-s4, etc.. which really make statements about words shorter. I think dropping those might not be acceptable.  

I guess the easy ones to automate are class constants. Looking at peano.mm0, I notice that (all?) constants are defined as prefix operators with maximum precedence (e.g. def d1:  nat = $suc 0$; prefix d1:  $1$  prec max;). If that's the case, then it seems feasible to make a complete manual list of problematic definitions and decide how to handle them. The vast majority of set.mm definitions are just class symbols all different between each other.

Jim Kingdon

unread,
May 26, 2025, 12:22:48 PMMay 26
to meta...@googlegroups.com
On 5/26/25 07:54, Gino Giotto wrote:

> For w3a, one could disambiguate using a slightly different symbol, not
> already present in set.mm, like `( ph /\. ps /\. th )`.

That would be my instinct. I suppose if we find ourselves encountering
the situation where set.mm later defines /\. we could build some tooling
around declaring a token as reserved (and therefore not definable) but
I'm not sure I'd go to the trouble unless we find this happening often
enough to be annoying.

> Another possibility could be to tell the exporter to automatically
> replace `(w3a ph ps ch)` with `(wa (wa ph ps) ch)` everywhere before
> introducing the notation. The original set.mm would not be preserved
>
If we are willing to give up round-tripping back to set.mm this might, I
suppose, work better just on the MM0/MM1 side (not that I know enough to
really say).

A hybrid approach would translate to `(wa (wa ph ps) ch)` with some kind
of annotation to enable round tripping but when I think of designing
that sort of annotation mechanism I'm not sure it ends up really
different from the /\. solution.

I haven't been close enough to MM0/MM1 to have a fully fleshed out
opinion about how to handle these issues, but I hope that it isn't
preventing people from doing things with MM0/MM1 and set.mm. At least
from a distance these issues look solvable to me.

> There are also df-op `<. A , B >.` and df-ot `<. A , B , C >.` that
> have the same obstacle. Dropping df-ot might be ok, but this approach
> becomes problematic for definitions like df-s1, df-s2, df-s3, df-s4,
> etc.. which really make statements about words shorter. I think
> dropping those might not be acceptable.

ot is less used than 3an and 3or but I guess my mind first goes to
automatically translating between the metamath notation and something
which is better behaved in the ways we have been discussing. Maybe I'm
just used to the metamath notation but given that it seems like it could
be automatically translated, it doesn't seem like a major obstacle.

Now, I can think of other reasons why 3an and 3or might not be so great
(basically the overhead of having both an and 3an when they do the same
thing), but I don't know if I'd decide it based on notation.


Mario Carneiro

unread,
May 26, 2025, 4:55:28 PMMay 26
to meta...@googlegroups.com
On Mon, May 26, 2025 at 4:54 PM Gino Giotto <ginogiott...@gmail.com> wrote:
> You can do that, but ideally this would be done as a change to the exporter, so that the theorems themselves can be stated using the notation, and also so that it stays up to date with set.mm. So the exporter will have to identify whether a notation is prefix or infix or a general notation, and there will be compromises needed on some set.mm notations that are mm0-ambiguous, like wa `( ph /\ ps )` and w3a `( ph /\ ps /\ th )`. In MM0 every notation has to start with a unique token.

Ok, this sounds good to me. I will probably need one example to see how it works, but after that I should be able to figure out how to carry on with the notation. For w3a, one could disambiguate using a slightly different symbol, not already present in set.mm, like `( ph /\. ps /\. th )`. A problem with this approach is that someone might add a definition in set.mm with the `/\.` symbol in the future and then the translation would break because of a conflict between tokens. One could avoid it by using special hidden symbols, but this is probably against the transparency philosophy of MM0, so I'm sure there are plenty of reasons why this is bad.

The translator already has to do things like this when renaming theorems, since MM allows dots, hyphens, and underscores in labels while only underscores are legal in MM0, leading to potential name clashes. Currently it will resolve overlaps by just appending a sequence number, but it means you need to have the context of what has been translated so far to perform back translation reliably. You could have a more sophisticated scheme that is actually injective, but I think it would make the common case look worse.
 
Another possibility could be to tell the exporter to automatically replace `(w3a ph ps ch)` with `(wa (wa ph ps) ch)` everywhere before introducing the notation. The original set.mm would not be preserved, but since the theorem being edited in MM1 would verify anyway in the original set.mm, I think it would not be a problem.  

This is definitely going to result in more idiomatic results on the MM0 side, at the loss of injectivity.
 
There are also df-op `<. A , B >.` and df-ot `<. A , B , C >.` that have the same obstacle. Dropping df-ot might be ok, but this approach becomes problematic for definitions like df-s1, df-s2, df-s3, df-s4, etc.. which really make statements about words shorter. I think dropping those might not be acceptable.  

Currently, the nearest case of this occurring in MM1 is in lists and (hexa)decimal numbers, and in this case the terms as written are just stacks of "cons" operations like `a : b : c : d : 0`. (Not having to put parentheses in this expression makes it much more palatable than the MM equivalent.) In the case of numbers, there is also a special parser written as a tactic which allows writing `,1234` which gets parsed into `x4 :x xd :x x2` where `a :x b` is the infix operator 16a + b. The same mechanism can parse string constants so you can write `,"hello"` and it becomes `ch x6 x8 ': ch x6 x5 ': ch x6 xc ': ch x6 xc ': ch x6 xf ': s0` which is the encoding of that string as a sequence of ASCII bytes.

That is all to say that if it were a native MM1 issue, I would be inclined to solve it using a custom parser and keep the terms simple. But for representing MM terms faithfully, the easy solution would be to add <"1 ... <"8 constants, or maybe if you want to get clever, <', <", <"', <"" and then <"5, ... <"8. (For the closing bracket, there is no requirement that it be unique, you can use "> for all of them or bracket match or whatever.)
 
I guess the easy ones to automate are class constants. Looking at peano.mm0, I notice that (all?) constants are defined as prefix operators with maximum precedence (e.g. def d1:  nat = $suc 0$; prefix d1:  $1$  prec max;). If that's the case, then it seems feasible to make a complete manual list of problematic definitions and decide how to handle them. The vast majority of set.mm definitions are just class symbols all different between each other.

I don't know whether this is a good idea, but for class constants there is another approach, which is to just not have a notation at all, since unlike metamath, mm0 lets you choose to use notation or not for any constant. So if it's a constant with a name which is a valid label (like Grp) you can just do `def Grp = $ ... $;` and then use `Grp` directly in later statements. The examples with numbers are only there because numbers are not valid labels (in MM0; MM lets you have names including initial numbers and dots leading to things like the amusingly named 0.999...). But in constant symbols in notations you can use almost any sequence of characters except for spaces and dollar signs, which basically coincides with MM token rules.
 


On Mon, May 26, 2025 at 6:22 PM Jim Kingdon <kin...@panix.com> wrote:
A hybrid approach would translate to `(wa (wa ph ps) ch)` with some kind
of annotation to enable round tripping but when I think of designing
that sort of annotation mechanism I'm not sure it ends up really
different from the /\. solution.

You could also not have an annotation and just opportunistically use it when you think it will look better by some kind of analysis. This gets really complicated really fast though, especially since you need to patch up the proofs to make sure they remain coherent if you change your mind about the interpretation of a subterm midway through a proof.
 
I haven't been close enough to MM0/MM1 to have a fully fleshed out
opinion about how to handle these issues, but I hope that it isn't
preventing people from doing things with MM0/MM1 and set.mm. At least
from a distance these issues look solvable to me.

Well, from the timeline on this thread, the real issue is that no one actually has time to do the work. :) Or at least, I don't have time and most others aren't or aren't yet in a position to do it themselves.
 
> There are also df-op `<. A , B >.` and df-ot `<. A , B , C >.` that
> have the same obstacle. Dropping df-ot might be ok, but this approach
> becomes problematic for definitions like df-s1, df-s2, df-s3, df-s4,
> etc.. which really make statements about words shorter. I think
> dropping those might not be acceptable.

ot is less used than 3an and 3or but I guess my mind first goes to
automatically translating between the metamath notation and something
which is better behaved in the ways we have been discussing. Maybe I'm
just used to the metamath notation but given that it seems like it could
be automatically translated, it doesn't seem like a major obstacle.

In MM0 the pair syntax is `a, b`, and this generalizes fairly well to longer tuples as `a, b, c, d` (although note it's right associated unlike metamath, because this behaves better when defining things inductively). This is actually super useful, I make use of it all the time to have the equivalent of df-opr, df-mpo and similar things without having to make N versions of it for every size of ordered N-tuple, like this (which is defining tupled 5-ary function) or this (a 6-ary class-valued function).

Now, I can think of other reasons why 3an and 3or might not be so great
(basically the overhead of having both an and 3an when they do the same
thing), but I don't know if I'd decide it based on notation.

Indeed I think this is one of the failings of metamath, it ties syntax and semantics a bit *too* strongly, meaning that people will make definitions to improve visual display. df-3an in particular I think does not pay for itself, it instead generates a huge amount of combinatorial growth of spellings for the simpl* and ad*ant* family. (df-3or I think doesn't matter because no one ever uses it.) If you can declare a notation to say that two ands in a particular arrangement can be written without parentheses, you never need to have df-3an in the first place.

Mario Carneiro

unread,
May 26, 2025, 5:05:06 PMMay 26
to meta...@googlegroups.com
By the way, I should mention that I've been planning to rewrite the MM -> MM0 translation to rust to integrate better with mm0-rs for a while now, and it might be worth experimenting with some of the things in this thread while doing the rewrite. The intended interface is that you can use `import "foo.mm";` at the top of a MM1 file and it will convert the mm to mm0 on the fly and give you an environment to build on. (You can already do this for mmu and mmb files.)
Reply all
Reply to author
Forward
0 new messages