Hi everybody (especially Mario, given the subject!),
I am working on another MM0 implementation, and I have a few doubts
about the secondary parser (parsing in general is not my strong point).
It is said that, in order to have uniqueness, an infixy constant should
have a unique precedence. Does this mean that it is unique among the
infixy constants' precedences, or among all the precedences?
set.mm0 suggests the former, because both $es.$ (infixl) and $~$ (not
infixy) have precedence 40. However, it seems to me that this is enough
to lose uniqueness of parsing tree: for example, $~ x es. y$ can be
parsed both as $~ (x es. y)$ and as $(~ x) es. y$, according to the
grammar in mm0.md. The latter has the wrong types, but I believe this is not
something the parser should care about (or should it?).
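To make the ambiguity concrete, here is a small toy (hypothetical, and deliberately precedence-blind, so not the mm0.md algorithm) that enumerates every parse of the token string under just the two rules for prefix "~" and infix "es."; both trees show up:

```python
# Toy enumerator (hypothetical, not MM0's parser): list every parse of
# a token list under two rules, prefix "~" and infix "es.", when
# precedence is ignored entirely.
def parses(toks):
    if len(toks) == 1 and toks[0] not in ("~", "es."):
        return [toks[0]]                      # a bare variable
    out = []
    if toks and toks[0] == "~":               # rule: ~ <expr>
        out += [("~", t) for t in parses(toks[1:])]
    for i, tok in enumerate(toks):            # rule: <expr> es. <expr>
        if tok == "es." and 0 < i < len(toks) - 1:
            for left in parses(toks[:i]):
                for right in parses(toks[i + 1:]):
                    out.append((left, "es.", right))
    return out

print(parses("~ x es. y".split()))
# both ('~', ('x', 'es.', 'y')) and (('~', 'x'), 'es.', 'y') appear
```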
Hi,
On 07/11/19 at 03:18, Mario Carneiro wrote:
> Neither; it means that any time a particular infixy constant appears, it
> must have the same precedence. That is, if you have a notation x $e.$ A
> and another notation ${$ x $e.$ A $|$ ph $}$, then $e.$ must be given
> the same precedence in both appearances.
Ok, so the reason for this is that when you find a constant while
parsing at a certain minimum precedence level, you immediately know if
you have to branch out or not, without having to know if the surrounding
constructor is an infix or a notation? Or is there something deeper?
What do the colons and digits mean in "((A:2) -> (B:1) : 1)"?
To have unambiguity, wouldn't it be easier, as Giovanni thought was the case, to require that any natural number be the precedence of at most one operator (even if this is more than needed for unambiguity)?
How do you parse "a * b + c" if both laws have the same precedence?
> What do the colons and digits mean in "((A:2) -> (B:1) : 1)"?

They are annotating the precedence levels of subterms. It's basically a shorthand for the CFG rule

  expr(1) <- expr(2) "->" expr(1)

where "expr(1)" is the nonterminal corresponding to expressions with precedence >= 1.
> How do you parse "a * b + c" if both laws have the same precedence?

It is also generally agreed that * has higher precedence than +, so a * b + c means (a * b) + c and a + b * c means a + (b * c).
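As a sketch of how those expr(p) nonterminals drive a parser, here is a hypothetical precedence-climbing fragment (atoms and left-associative binary operators only; the precedences for * and + are the conventional ones, not taken from any .mm0 file):

```python
# Hypothetical precedence-climbing sketch: expr(p) parses an expression
# whose top operator has precedence >= p, matching the CFG reading of
# expr(1) <- expr(1) "+" expr(2), etc.
PREC = {"+": 1, "*": 2}  # assumed: * binds tighter than +

def parse(tokens):
    toks = list(tokens)
    def expr(p):
        lhs = toks.pop(0)             # atoms only, for the sketch
        while toks and PREC.get(toks[0], -1) >= p:
            op = toks.pop(0)
            rhs = expr(PREC[op] + 1)  # the +1 makes the operator left-assoc
            lhs = (lhs, op, rhs)
        return lhs
    return expr(0)

print(parse("a * b + c".split()))  # (('a', '*', 'b'), '+', 'c')
print(parse("a + b * c".split()))  # ('a', '+', ('b', '*', 'c'))
```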
In your example, you definitely want "." to be right associative, so that
a . b . s = a . (b . s) as you say.
You also want * to be higher
precedence, because you probably want a * b . s = (a * b) . s.
Thanks Giovanni. I hope Mario can also chime in on the questions in my previous message, since these are not right-or-wrong issues, but more about balancing pros (proximity with mathematical practice) and cons (a more complex parser).
But I still have a question: is this common practice in computer science? It is certainly not in mathematics. It is actually a bit strange from a mathematical-practice point of view. For instance, if I want to define an R-module M where * is the field multiplication and . is the scalar multiplication (so for instance, if a, b \in R and s \in M, I have (a * b) . s = a . ( b . s ) \in M), I would have to type, e.g.:
infix * rassoc precedence=1; (or "lassoc")
infix . rassoc precedence=2;
and then one of the axioms for an algebra would read "a . b . s = a * b . s". By the way: in this example, it is necessary to have different precedences. Is this correct?
But it seems strange to specify that an operator of signature "scalar x vector -> vector" is either left- or right-associative. Since vector is not a subtype of scalar, I should not have to specify that, and "a . b . s" can only be read as "a . (b . s)". If I had typed "infix . lassoc precedence=2;" above, would this have triggered an error? (Another example is inner products, since "scalar" is not a subtype of "vector".) Or would this require too complex a parser?
In the example above, I would much rather have to type, e.g.:
infix * rassoc precedence=1;
infix . precedence=2;
and that if I had added "rassoc" or "lassoc" in the second line, this would have triggered an error.
You write:

> In your example, you definitely want "." to be right associative, so that
> a . b . s = a . (b . s) as you say.
Not only do I want it, but it is the only possible parse, for type reasons. Would it have triggered an error to type "infix . lassoc prec=1"? Or would it trigger an error only when I write "a . b . s" without parentheses? (In any case, this would be unfortunate.)
> You also want * to be higher
> precedence, because you probably want a * b . s = (a * b) . s.
Idem: it is the only possible parse, for type reasons. Would it have triggered an error to give higher precedence to the scalar multiplication? Or would it trigger an error only if I write "a * b . s" without parentheses? Here, I would be fine with always having to type "(a * b) . s" and being forbidden to type "a * b . s". In other words, is it possible to define operator associativities and precedences which require parentheses to avoid ambiguity, without triggering errors at operator definitions? Allowing this might indeed require too complex a parser, and this is probably not MM0's job (which should probably allow minimal syntactic sugar, as long as this does not complicate things too much).
Side remark: I agree with the treatment of prefix operators, which looks like the most sensible one. Is there a reason why suffix (postfix) operators are not allowed in MM0? This sounds unfortunate since they are often used in math (inverse, transpose, conjugate...).
Another remark: if one retains, for simplicity of the parser (and neglecting proximity with mathematical practice), that a given precedence is associated with either left- or right-associative operations, one could even impose from the beginning that an even precedence implies right-associativity and an odd precedence implies left-associativity, hence doing away with the keywords "lassoc" and "rassoc".
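In code, the convention I have in mind would be nothing more than this (purely an illustration of the proposal, not anything MM0 does):

```python
# Illustration of the proposed parity convention (not MM0 behavior):
# derive an operator's associativity from its precedence number.
def assoc(prec: int) -> str:
    return "rassoc" if prec % 2 == 0 else "lassoc"

print(assoc(2), assoc(1))  # rassoc lassoc
```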
It may seem strange that you have to redundantly specify this information, but it's not really redundant. How can MM0 know that scalar is not a subtype of vector? It may not be right now, but perhaps later you decide to add a coercion and then it becomes ambiguous.
The MM0 parser specification is not deterministic
Finally, I want to point out that your example is slightly unrealistic. The MM0 type system is (somewhat deliberately) impoverished, and it would be very unlikely for you to actually have types "scalar" and "vector" unless you are in a (very limited) kind of first order proof environment for modules. You wouldn't be able to do much interesting vector space theory there, because you have no higher order constructs. Much more likely, you are working in a ZFC style foundation where everything has the same type "set", and then type information is next to useless for this kind of thing.
I will add that I fell into this (cognitive) trap myself, and Giovanni caught me on it. I said to myself "It doesn't make any sense for "e." to be right or left associative, because it has type set x set -> wff, so it can't associate with itself anyway, so I'll just make it left assoc". But this is not true, because "~" lives at the same precedence, and if "e." is left assoc then there is a conflict, while if "e." was right assoc then there would be no conflict. Because *types don't matter during parsing* (at least, with the MM0 parser algorithm as it currently exists). I've written a simple C compiler before, and the "A * b;" example of type-disambiguation between a multiplication statement and a pointer declaration is a real thing (and a real pain). I don't want to replicate that.
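To make that C example concrete, here is a hypothetical toy sketch (Python standing in for a real C front end): the statement "A * b;" parses as a declaration or as a multiplication depending on whether A is currently registered as a typedef name, so the parser must consult the symbol table mid-parse (the so-called "lexer hack").

```python
# Toy illustration (not a real C parser): "A * b;" declares a pointer b
# when A names a type, and is a multiplication otherwise, so parsing
# requires feedback from the symbol table.
def parse_stmt(tokens, typedefs):
    a, star, b = tokens
    assert star == "*"
    if a in typedefs:
        return ("declare-pointer", a, b)  # A *b;  -- b has type A*
    return ("multiply", a, b)             # expression statement a * b

print(parse_stmt(["A", "*", "b"], {"A"}))   # ('declare-pointer', 'A', 'b')
print(parse_stmt(["A", "*", "b"], set()))   # ('multiply', 'A', 'b')
```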
I suppose that's true, but it's not obvious. MM0 is designed to be read, and having mystery rules like that seems like a good way to scare people off. The precedence system is supposed to be minimal and expressive, but not surprising, and I can already see the angry github issues asking why changing a precedence from 42 to 43 with nothing else nearby breaks their code.
Regarding the barb about neglecting mathematical practice, I still don't see your argument. Your example with modules fits perfectly well within the MM0 precedence system. Yes you have to write those precedences down, and if you get it wrong then things don't typecheck, but that doesn't mean that "." isn't right associative in the example. It's *obviously* right associative because the other option doesn't typecheck, but you seem to be taking from this fact that it's not associative at all, which isn't true; a non-associative operator (which MM0 doesn't support) would *always* require parentheses.
Thanks for the clear explanations. I didn't know this use of the word "barb", and I see on dictionary.com "unpleasant remark". My remarks were certainly not meant to be unpleasant, and I'm interested in seeing where MM0 goes.
Is it always possible to put parentheses, even where they are not needed? I mean: is there a rule saying that if A is an expression of some type, then ( A ) is also an expression of the same type, so that I can write e.g. "( a * b ) + c" even when "a * b + c" is valid?
> It may seem strange that you have to redundantly specify this information, but it's not really redundant. How can MM0 know that scalar is not a subtype of vector? It may not be right now, but perhaps later you decide to add a coercion and then it becomes ambiguous.

Do you mean: changing one's mind at a later time, or putting a coercion later in the file? The latter could be forbidden, maybe.
> The MM0 parser specification is not deterministic

Would it be possible (and worth it in terms of pros and cons) to make it deterministic?
> Finally, I want to point out that your example is slightly unrealistic. The MM0 type system is (somewhat deliberately) impoverished, and it would be very unlikely for you to actually have types "scalar" and "vector" unless you are in a (very limited) kind of first order proof environment for modules. You wouldn't be able to do much interesting vector space theory there, because you have no higher order constructs. Much more likely, you are working in a ZFC style foundation where everything has the same type "set", and then type information is next to useless for this kind of thing.

Indeed, that's a good point; the types were only for the sake of the example. There might be other applications of MM0 not based on set theory, towards type theories, where there could be real-life examples like this.
> I will add that I fell into this (cognitive) trap myself, and Giovanni caught me on it. I said to myself "It doesn't make any sense for "e." to be right or left associative, because it has type set x set -> wff, so it can't associate with itself anyway, so I'll just make it left assoc". But this is not true, because "~" lives at the same precedence, and if "e." is left assoc then there is a conflict, while if "e." was right assoc then there would be no conflict. Because *types don't matter during parsing* (at least, with the MM0 parser algorithm as it currently exists). I've written a simple C compiler before, and the "A * b;" example of type-disambiguation between a multiplication statement and a pointer declaration is a real thing (and a real pain). I don't want to replicate that.

That's a clear explanation. But in the present case, I would say that a solution which is better for human comprehension is to give lower precedence to "~".
> Regarding the barb about neglecting mathematical practice, I still don't see your argument. Your example with modules fits perfectly well within the MM0 precedence system. Yes you have to write those precedences down, and if you get it wrong then things don't typecheck, but that doesn't mean that "." isn't right associative in the example. It's *obviously* right associative because the other option doesn't typecheck, but you seem to be taking from this fact that it's not associative at all, which isn't true; a non-associative operator (which MM0 doesn't support) would *always* require parentheses.
No: I'm taking from this fact that I should not have to specify its associativity since it is obviously right-associative. But I agree that this puts too much work on the parser (if it's even possible, as you said).
I mean putting a coercion later in the file. It could be forbidden, but then this entails making precise the rules that dictate the acceptability of the added coercion (and an efficient decision procedure for the criterion). There is already a similar kind of restriction on coercions: the coercion graph is required to have no diamonds (more than one path between any two fixed vertices), and this is checked using a dynamic version of the Floyd-Warshall algorithm.
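As a hypothetical sketch of such a check (the actual implementation may differ): maintain the transitive closure of the coercion graph incrementally, and reject any new edge that would create a second path between two sorts. The sort names below are just for illustration:

```python
# Hypothetical sketch: incremental transitive closure of the coercion
# graph, rejecting any edge that creates a diamond (a second path
# between two sorts). Cycles are not handled in this toy.
def add_coercion(reach, src, dst):
    # reach[s] = set of sorts reachable from s by coercions
    sources = {src} | {s for s, r in reach.items() if src in r}
    targets = {dst} | reach.get(dst, set())
    for a in sources:
        for b in targets:
            if b in reach.setdefault(a, set()):
                raise ValueError(f"diamond: two paths from {a} to {b}")
            reach[a].add(b)

reach = {}
add_coercion(reach, "set", "class")
add_coercion(reach, "wff", "class")
try:
    add_coercion(reach, "set", "wff")
except ValueError as e:
    print(e)  # diamond: two paths from set to class
```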
>> The MM0 parser specification is not deterministic
>
> Would it be possible (and worth it in terms of pros and cons) to make it deterministic?

Perhaps this is the wrong way to say it. The parser itself is deterministic, because the CFG is unambiguous. There is only one parse according to the CFG, so any valid parse algorithm will get the same result.
I hope parts of this discussion will make their way into some kind of documentation, in particular the possibility to add extra parentheses, and the fact that MM0 uses the term "sort" instead of "type" because the type system is deliberately impoverished: if some type theory is to be implemented, it has to be within the logic, not the metalogic.
> I mean putting a coercion later in the file. It could be forbidden, but then this entails making precise the rules that dictate the acceptability of the added coercion (and an efficient decision procedure for the criterion). There is already a similar kind of restriction on coercions: the coercion graph is required to have no diamonds (more than one path between any two fixed vertices), and this is checked using a dynamic version of the Floyd-Warshall algorithm.

Maybe this type of "delayed coercion" could be forbidden, but I cannot tell whether it is worth it (nor could I state the precise rules and decision procedure).
>>> The MM0 parser specification is not deterministic
>>
>> Would it be possible (and worth it in terms of pros and cons) to make it deterministic?
>
> Perhaps this is the wrong way to say it. The parser itself is deterministic, because the CFG is unambiguous. There is only one parse according to the CFG, so any valid parse algorithm will get the same result.

This is what I understood. I was referring to making the CFG deterministic, in the sense of deterministic CFGs as a proper subset of unambiguous CFGs, which allow for simpler parsing (https://en.wikipedia.org/wiki/Deterministic_context-free_grammar). But again, I don't know much in this domain.