best way to encode nanopass nonterminals chosen from a list of symbols?

John Clements

May 11, 2019, 4:35:22 PM

to nanopass-framework

I'd like to specify a nanopass nonterminal that is one of a set of symbols. For instance:

(define-language x861a
  (terminals
    (label (lbl))
    ;;... other terminals elided
    )
  (Register (reg) rax rbx rcx rdx)
  ;;... other nonterminals elided
  )

It appears to me that nanopass doesn't allow this; that is, a nonterminal production specified using a naked identifier must be a reference to another nonterminal or terminal.[*]

This leaves me with two alternatives that I can see:

1) encode it as a terminal with a predicate that checks that the symbol is in the specified set, or

2) wrap parens around each choice, to produce something like this:

(define-language x861a
  (terminals
    (label (lbl))
    ;; other terminals elided...
    )
  (Register (reg) (rax) (rbx) (rcx) (rdx))
  ;; other nonterminals elided ...
  )

Neither solution is terrible; it appears to me that the second one will give me slightly better error-checking, since nanopass can "see" the names of the registers and catch more errors, but the penalty is more parens.

Are there other good reasons for choosing one over the other, or is there a better solution that I'm missing entirely?

Thanks!

John Clements

[*] actually, I think the error message here is misleading; it says

"define-language: no nonterminal for meta-variable in: rax", but I believe that terminals are also legal, and this error message should be something like

"define-language: no nonterminal or terminal for meta-variable in: rax". If I'm right about this, I'd be happy to make this a pull request.

Andy Keep

May 16, 2019, 9:44:30 PM

to John Clements, nanopass-framework

Hey John,

Sorry for the slow response, I’m afraid you emailed me as I was returning from a visit to my brother’s family in Taiwan, and it has taken me a bit of time to recover enough from the 12-hour time change that I’ve still got a little energy to answer emails when I get home :)

You are correct that the nanopass framework (both the Scheme and Racket versions) does not support bare symbols. Bare symbols are already used for terminal and nonterminal references, so supporting them as literal symbols creates the potential for collision. I think it is probably not a big deal to support them, though; the one hazard is a programmer accidentally giving a terminal or nonterminal the same name as one of the symbols they want to use. (This danger already exists with the leading keyword of a production, which could be either a keyword or a terminal or nonterminal reference. I have occasionally shot myself in the foot with this, and it usually isn’t too bad.)

That said, it would be some work to support it, and I’d also want to support productions that do not have a keyword prefix, which are likewise unsupported. (For instance, you cannot have a “bindings” production that looks like ([var expr] …), because the nanopass framework looks for a leading symbol (keyword, terminal reference, or nonterminal reference) to figure out the name of the produced record/struct.) Anyway, not impossible to support, but a bit of work.

To your question:

There are a couple of downsides to the unary production approach in addition to the extra parentheses. First, an additional record is created for each of these productions, which increases the number of identifiers in the module. In Chez Scheme, a little bit of this is not a big deal, but a lot of it can lead to slower compilation times. Second, to figure out which one you have, you need to use the pattern matching in either define-pass or nanopass-case.
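As a rough illustration of the second point, here is a minimal sketch, assuming the Chez Scheme version of the library is available as (nanopass). The language name L-reg and the helper register-name are hypothetical, made up for this example; they are not part of nanopass or of the languages discussed in this thread.

```scheme
(import (nanopass))

;; Hypothetical language: each register is its own unary production,
;; so nanopass creates one record type per register.
(define-language L-reg
  (Register (reg)
    (rax)
    (rbx)
    (rcx)
    (rdx)))

;; Hypothetical helper: the only way to learn which register a
;; Register record holds is to pattern match on it.
(define (register-name r)
  (nanopass-case (L-reg Register) r
    [(rax) 'rax]
    [(rbx) 'rbx]
    [(rcx) 'rcx]
    [(rdx) 'rdx]))

;; Build a Register record and ask which one it is.
(register-name (with-output-language (L-reg Register) `(rbx)))
```

With the terminal encoding, the register would already be a plain symbol, so no such helper would be needed.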

On the other hand, you can match the unary nonterminal productions as part of a pattern, whereas using a terminal requires the guard syntax of the match to differentiate one symbol from another. Also, if the Racket version uses an integer tag to decide which production it has, this may be slightly faster, depending on how the memory reference and integer comparison compare with a symbol equality check.
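For comparison, a minimal sketch of what the guard approach might look like with a terminal, again assuming the Chez Scheme version of the library. The names L-reg-term, register?, Operand, and rax? are hypothetical and chosen only for this example.

```scheme
(import (nanopass))

;; Terminal predicate: nanopass looks for a procedure named after the
;; terminal with a trailing "?", here register?.
(define (register? x)
  (and (memq x '(rax rbx rcx rdx)) #t))

;; Hypothetical language: registers are plain symbols checked by register?.
(define-language L-reg-term
  (terminals
    (register (reg)))
  (Operand (op)
    reg))

;; Hypothetical pass: both clauses share the pattern ,reg, so a guard
;; is what tells one symbol apart from another.
(define-pass rax? : (L-reg-term Operand) (ir) -> * (bool)
  (Operand : Operand (ir) -> * (bool)
    [,reg (guard (eq? reg 'rax)) #t]
    [,reg #f])
  (Operand ir))

(rax? 'rax)
(rax? 'rbx)
```

Because the production is a bare terminal reference, the symbol itself serves as the Operand value and no wrapper record is built.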

I think the error messages should be similar; in fact, the terminal version might be a little better. I ran this test on the Scheme version, so I’m not sure how closely the Racket version matches it:

> (import (nanopass))

> (define-language L-unary
    (Types (t)
      (a)
      (b)
      (c)
      (d)))

> (define (types? x) (memq x '(a b c d)))

> (define-language L-term
    (terminals
      (types (t)))
    (Types (ty)
      t))

> (define-language L-unary
    (Program (p)
      (begin t ...))
    (Types (t)
      (a)
      (b)
      (c)
      (d)))

> (define-language L-term
    (terminals
      (types (t)))
    (Program (p)
      (begin ty ...))
    (Types (ty)
      t))

> (define-language L-term2
    (terminals
      (types (t)))
    (Program (p)
      (begin t ...)))

> (with-output-language (L-unary Program) `(begin (a) (b) (c) (e)))
Exception in meta-parse-Types: unrecognized pattern or template (e)

> (with-output-language (L-term Program) `(begin a b c e))
Exception in with-output-language: expected Types but received e in field ty of (begin ty ...) from expression ((quote a) (quote b) (quote c) (quote e))

> (with-output-language (L-term2 Program) `(begin a b c e))
Exception in with-output-language: expected types but received e in field t of (begin t ...) from expression ((quote a) (quote b) (quote c) (quote e))

We’ve used a combination of these approaches in Chez Scheme and in the compilers we’ve built. We’ve used the unary approach for things like representing a true or false value in predicate expressions in the internals of Chez Scheme (not to be confused with the values #t and #f in value context). So we might have a production like:

(Predicate (p)
  (true)
  (false)
  (if p0 p1 p2)
  ...)

Then we can match things like:

(define-pass optimize-if : L (ir) -> L ()
  (Predicate : Predicate (ir) -> Predicate ()
    [(if (true) ,[p1] ,p2) p1]
    [(if (false) ,p1 ,[p2]) p2]
    ...))

When we have a larger number of types, we often use a terminal like the one in the L-term and L-term2 languages above.

For me, the rule of thumb has come down to two questions: (1) will it be useful to pattern match on it (as in the Predicate case), and (2) how many of these are there, and will matching them become a burden?

I think in general we probably come down on the terminal side more often than not, but there are good reasons to do the unary thing at times.

Sorry not to have better guidance on this, hopefully this is at least somewhat helpful.

-andy:)

John Clements

May 16, 2019, 11:32:11 PM

to nanopass-framework

Thanks!