Simple recursive examples

Timothy Gall

Jul 16, 2024, 4:19:27 AM

to nanopass-framework

I am trying to learn some very basic examples in Racket nanopass. I started with the non-recursive swap-first-second below, which works fine, but when I try to turn (+ 1 2 3 4 5) into (+ 1 (+ 2 (+ 3 (+ 4 5)))) I can't get it to work recursively. I can turn (+ 1 2 3) into (+ 1 (+ 2 3)) easily.

In addition to help with this simple example, is there a simple tutorial on how the ellipsis (...) works in nanopass? I don't understand why [(+ ,[e0] ,[e1] ...) `(+ ,[e0] ,[e1] ... ,[e0] ,[e1] ...)] works but [(+ ,[e0] ,[e1] ...) `(+ ,[e1] ... ,[e0] ,[e1] ... ,[e0])] doesn't.

Regards,

Tim

(define-language L0
  (terminals
    (number (c)))
  (Expr (e body)
    c
    (+ e0 e1 ...)))

(define-language L1
  (extends L0)
  (Expr (e body)
    (- (+ e0 e1 ...))
    (+ (+ e0 e1))))

(define-pass make+binary : L0 (ir) -> L1 ()
  (Expr : Expr (ir) -> Expr ()
    [(+ ,[e0] ,[e1] ,[e2] ...) `(+ ,e0 (+ ,e1 ,e2 ...))]))
; +: invalid pattern or template in: (+ (unquote e1) (unquote e2) ...)

(define-pass swap-first-second : L0 (ir) -> L1 ()
  (Expr : Expr (ir) -> Expr ()
    [(+ ,[e0] ,[e1]) `(+ ,e1 ,e0)]))

(define-parser parse-L0 L0)

(make+binary (parse-L0 '(+ 1 2 3)))

(swap-first-second (parse-L0 '(+ 1 (+ (+ 2 3) 4))))

(define-pass make+binary-three : L0 (ir) -> L1 ()
  (Expr : Expr (ir) -> Expr ()
    [(+ ,[e0] ,[e1] ,[e2]) `(+ ,e0 (+ ,e1 ,e2))]))

(make+binary-three (parse-L0 '(+ 1 2 3)))
;=> (+ 1 (+ 2 3))

Andrew Wilcox

Jul 17, 2024, 7:26:37 AM

to Timothy Gall, nanopass-framework

To start with a minor point: by convention, metavariables representing a sequence of values are given a "*" suffix. This makes no difference to Nanopass, but it helps keep track of which metavariables are singular and which are lists. Thus I'd write the languages as

(define-language L0
  (terminals
    (number (c)))
  (Expr (e body)
    (+ e0 e* ...)))

(define-parser parse-L0 L0)

(define-language L1
  (extends L0)
  (Expr (e body)
    (- (+ e0 e* ...))
    (+ (+ e0 e1))))

(define-parser parse-L1 L1)

Note that a sequence such as "e* ..." is allowed to match zero values. Thus in this definition of L0, (+ 123) is a legal expression, where e0 is 123 and e* is the empty list. If you wanted to specify that a + expression needs at least two arguments, you could use (+ e0 e1 e* ...).

Nanopass can recursively transform subexpressions for you, but it doesn't take the output of a pass and recursively apply it through the same transformer. It's not like macro expansion, for example, where the expression is repeatedly expanded until no more macros are left. You wrote `(+ ,e0 (+ ,e1 ,e2 ...)) for the output of your make+binary pass, but that's not going to work because (+ ,e1 ,e2 ...) isn't a legal L1 expression if e2 contains multiple values.
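To see why a single rewrite is not enough, here is a small illustration on plain S-expressions using racket/match rather than Nanopass. The helper name one-step is made up for this sketch, and the analogy is only approximate: it applies the same rewrite once, with no language checking at all.

```racket
#lang racket
;; Illustration only: ordinary lists and racket/match, not Nanopass.
;; one-step is a hypothetical helper that applies the rewrite exactly once.
(define (one-step expr)
  (match expr
    ;; e2 is bound to the list of all remaining arguments (possibly empty)
    [(list '+ e0 e1 e2 ...) `(+ ,e0 (+ ,e1 ,@e2))]
    [_ expr]))

(one-step '(+ 1 2 3))      ; => '(+ 1 (+ 2 3))
(one-step '(+ 1 2 3 4 5))  ; => '(+ 1 (+ 2 3 4 5))
```

With three arguments the result happens to be binary, but with five the inner + still has four arguments, and nothing feeds it back through the rewrite.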

Using the catamorphism syntax such as ",[e0]" in the pattern, like you did, means that the subexpressions will already be transformed to L1 form. This means that while we still need to get the make-binary transformation working, we don't need to do it recursively.

If Nanopass doesn't have a built-in facility to do the transformation we want, we can do it ourselves. We can start with a to-binary function that operates on ordinary Racket lists, without any languages involved:

(define (to-binary lst)
  (if (null? (cdr lst))
      (car lst)
      (let loop ((lst lst))
        `(+ ,(car lst)
            ,(if (null? (cdr (cdr lst)))
                 (cadr lst)
                 (loop (cdr lst)))))))

(to-binary '(1 (+ 2 3) 4))
;=> (+ 1 (+ (+ 2 3) 4))

The subexpression (+ 2 3) illustrates what I meant by not needing to implement this transformation recursively. It's already in L1 form, and we can assume that it will be, because using the catamorphism syntax asks Nanopass to recursively transform the subexpressions for us. Thus the elements of lst can be output unchanged; we just need to add the nested + expressions.

The catamorphism syntax will give us L1 language elements, so let's change to-binary to take a list of such elements and to output an L1 language element. All we actually need to do is add a with-output-language form:

(define (to-L1-binary lst)
  (with-output-language (L1 Expr)
    (if (null? (cdr lst))
        (car lst)
        (let loop ((lst lst))
          `(+ ,(car lst)
              ,(if (null? (cdr (cdr lst)))
                   (cadr lst)
                   (loop (cdr lst))))))))

(unparse-L1
  (to-L1-binary
    (list (parse-L1 1)
          (parse-L1 '(+ 2 3))
          (parse-L1 4))))
;=> (+ 1 (+ (+ 2 3) 4))

Terminals are actually represented by themselves, so (parse-L1 1) is simply 1, but I wrote it out for clarity.

OK!

(define-pass make-binary : L0 (ir) -> L1 ()
  (Expr : Expr (ir) -> Expr ()
    [(+ ,[e1] ,[e*] ...)
     (to-L1-binary (cons e1 e*))]))

Here e1 will be an L1 language element, and e* will be an ordinary Racket list, where each element of the list is an L1 language element.

Now the transformation is recursively applied to subexpressions, which we didn't have before.

(unparse-L1 (make-binary (parse-L0 '(+ 1 (+ 2 3 4) 5))))
;=> (+ 1 (+ (+ 2 (+ 3 4)) 5))

At this point we could inline to-L1-binary, and then we wouldn't need the with-output-language form anymore because the quasiquote will default to the output language of the pass.
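A rough sketch of what that inlining might look like follows. It assumes the nanopass package is installed and required as nanopass/base, and it repeats the L0 and L1 definitions from above so that it stands alone. The name make-binary-inline is made up here only to avoid clashing with make-binary, and the loop is a slightly condensed variant of to-L1-binary rather than a literal copy.

```racket
#lang racket
(require nanopass/base)

;; Same languages as earlier in this reply.
(define-language L0
  (terminals
    (number (c)))
  (Expr (e body)
    c
    (+ e0 e* ...)))

(define-language L1
  (extends L0)
  (Expr (e body)
    (- (+ e0 e* ...))
    (+ (+ e0 e1))))

(define-parser parse-L0 L0)

;; make-binary-inline is a hypothetical name for this sketch.
(define-pass make-binary-inline : L0 (ir) -> L1 ()
  (Expr : Expr (ir) -> Expr ()
    [(+ ,[e0] ,[e*] ...)
     ;; e0 is already an L1 Expr; e* is a Racket list of L1 Exprs.
     ;; The quasiquote below builds L1 Expr forms because we are inside
     ;; a processor whose output language is L1.
     (let loop ([lst (cons e0 e*)])
       (if (null? (cdr lst))
           (car lst)
           `(+ ,(car lst) ,(loop (cdr lst)))))]))

(unparse-L1 (make-binary-inline (parse-L0 '(+ 1 (+ 2 3 4) 5))))
```

If this behaves as intended, the last line should give the same result as the make-binary call above.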

You asked about the sequence syntax, and mentioned that `(+ ,[e0] ,[e1] ... ,[e0] ,[e1] ...) worked as an output form. This surprised me, because in the quasiquote form the expression following the comma is evaluated as a Racket expression. Square brackets and parentheses are interchangeable in Racket, so I'd expect this to be trying to call e0 and e1 as functions.

You don't actually have a reason to use ... to output L1 language expressions because you don't have repeated values in L1. If you did, ",foo ..." in a Nanopass quasiquote evaluates foo as a Racket expression, expecting it to produce an ordinary Racket list of values, where each element of the list is an element of the output language. That then gets pasted into the output form. Because it expects an ordinary Racket list, we can use Racket list functions such as map and cons.

Say we were transforming an input language (+ e1 e* ...), where the expression has to have at least one argument, to an output language (+ e* ...), where it could have no arguments. We don't want to lose the e1 value, we just want it to now be included in the e* list.

`(+ ,(cons e1 e*) ...)
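If it helps to compare with ordinary Racket, the closest everyday analogue of ",foo ..." is unquote-splicing (,@) in a plain quasiquote. The sketch below uses only plain lists, not Nanopass, and merge-args is a made-up name for illustration:

```racket
#lang racket
;; Plain Racket analogy, not Nanopass: ,@ splices a list into the
;; surrounding form, much as ",foo ..." pastes a list of language
;; elements into a Nanopass output form.
;; merge-args is a hypothetical helper.
(define (merge-args e1 e*)
  `(+ ,@(cons e1 e*)))

(merge-args 1 '(2 3))  ; => '(+ 1 2 3)
(merge-args 1 '())     ; => '(+ 1)
```

The difference is that the Nanopass version also checks the result against the output language, whereas the plain quasiquote just builds a list.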

I hope this helps,

Andrew
