Hey John,
Sorry for the slow response. I’m afraid you emailed me as I was returning from a visit to my brother’s family in Taiwan, and it has taken me a bit of time to recover enough from the 12-hour time change to still have a little energy for answering emails when I get home :)
You are correct that the nanopass framework (both Scheme and Racket) does not support bare symbols. Bare symbols are already used for terminal and nonterminal references, so supporting them as literals creates the potential for collision. I think it is probably not a big deal to support them, though, the one hazard being that programmers could accidentally pick a name for one of their symbols that is also the name of a terminal or nonterminal. (This danger already exists with the leading keyword of a production, which could be either a keyword or a terminal or nonterminal reference. I have occasionally shot myself in the foot with this, and it usually isn’t too bad.)
That said, it would be some work to support it, and I’d want to also support productions that do not have a keyword prefix, which are also not supported. For instance, you cannot have a “bindings” production that looks like ([var expr] ...), because the nanopass framework looks for a leading symbol (keyword, terminal reference, or nonterminal reference) to figure out the name of the produced record/struct. Anyway, not impossible to support, but a bit of work.
To your question:
There are a couple of downsides to the unary production approach in addition to the extra parentheses. First, an additional record is created for each of these, which increases the number of identifiers in the module. In Chez Scheme, a little bit of this is not a big deal, but a lot of it can lead to slower compilation times. Second, to figure out which one you have, you need to use the pattern matching in either define-pass or nanopass-case.
On the other hand, you can match the unary nonterminal productions as part of a pattern, whereas using a terminal requires the guard syntax of the match to differentiate one symbol from another. Also, if the Racket version uses an integer indicator to decide which production it has, the unary approach may be slightly faster, depending on how the memory reference and comparison compare with symbol equality checking.
I think the error messages should be similar; in fact, the terminal one might be a little better. I did this test on the Scheme version, so I’m not sure how closely the Racket version matches:
> (import (nanopass))
> (define-language L-unary
    (Types (t)
      (a)
      (b)
      (c)
      (d)))
> (define (types? x) (memq x '(a b c d)))
> (define-language L-term
    (terminals
      (types (t)))
    (Types (ty)
      t))
> (define-language L-unary
    (Program (p)
      (begin t ...))
    (Types (t)
      (a)
      (b)
      (c)
      (d)))
> (define-language L-term
    (terminals
      (types (t)))
    (Program (p)
      (begin ty ...))
    (Types (ty)
      t))
> (define-language L-term2
    (terminals
      (types (t)))
    (Program (p)
      (begin t ...)))
> (with-output-language (L-unary Program) `(begin (a) (b) (c) (e)))
Exception in meta-parse-Types: unrecognized pattern or template (e)
> (with-output-language (L-term Program) `(begin a b c e))
Exception in with-output-language: expected Types but received e in field ty of (begin ty ...) from expression ((quote a) (quote b) (quote c) (quote e))
> (with-output-language (L-term2 Program) `(begin a b c e))
Exception in with-output-language: expected types but received e in field t of (begin t ...) from expression ((quote a) (quote b) (quote c) (quote e))
We’ve used a combination of these approaches in Chez Scheme and in the compilers we’ve built. We’ve used the unary approach for things like representing a true or false value as a predicate expression in the internals of Chez Scheme (not to be confused with the values #t and #f in value context). So we might have a production like:
(Predicate (p)
  (true)
  (false)
  (if p0 p1 p2)
—)
Then we can match things like:
(define-pass optimize-if : L (ir) -> L ()
  (Predicate : Predicate (ir) -> Predicate ()
    [(if (true) ,[p1] ,p2) p1]
    [(if (false) ,p1 ,[p2]) p2]
—))
When we have a larger number of types we often use a terminal like the one in the L-term and L-term2 languages above.
I think for me the rule of thumb comes down to (1) whether it will be useful to pattern match it (like in the Predicate case), and (2) how many of these there are, and whether matching them is going to become a burden.
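To make the pattern-versus-guard difference concrete, here is a rough, untested sketch against the Scheme version. It reuses the L-unary and L-term definitions from the transcript above; classify-unary and classify-type are made-up helper names for illustration, not part of the framework:

```scheme
(import (nanopass))

;; Terminal predicate and languages, copied from the transcript above.
(define (types? x) (memq x '(a b c d)))

(define-language L-unary
  (Program (p)
    (begin t ...))
  (Types (t)
    (a)
    (b)
    (c)
    (d)))

(define-language L-term
  (terminals
    (types (t)))
  (Program (p)
    (begin ty ...))
  (Types (ty)
    t))

;; Unary productions: each type is its own production, so the
;; pattern itself tells them apart.
;; (classify-unary is a hypothetical helper.)
(define (classify-unary ty)
  (nanopass-case (L-unary Types) ty
    [(a) 'saw-a]
    [(b) 'saw-b]
    [else 'something-else]))

;; Terminal: every type matches the same pattern ,t, so a guard
;; is needed to tell one symbol from another.
;; (classify-type is a hypothetical helper.)
(define (classify-type ty)
  (nanopass-case (L-term Types) ty
    [,t (guard (eq? t 'a)) 'saw-a]
    [,t (guard (eq? t 'b)) 'saw-b]
    [else 'something-else]))
```

You would call these with something like (classify-unary (with-output-language (L-unary Types) `(a))) on the unary side and (classify-type 'a) on the terminal side. With only a few types the two read about the same; the guard clauses are what start to pile up as the list of symbols grows.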
I think in general we probably come down on the terminal side more often than not, but there are good reasons to do the unary thing at times.
Sorry not to have better guidance on this; hopefully this is at least somewhat helpful.
-andy:)