Parse a file, whole pass validation, multiple return value and entry

Amirouche Boubekki

Jul 21, 2019, 7:51:13 PM

to nanopass-framework

A) How to parse a file and retrieve the line, column information inside the pass?

B) Is it possible to do a whole pass validation checks and error at the end of the pass?

C) In parse-and-rename, what is the purpose of:

(Expr e initial-env))

D) What is the purpose of returning multiple values? In parse-and-rename

there is an example that takes the expression + and env. But I always see

signature like :

(Expr : * (e) -> Expr ()

What is the purpose of () in the above

E) what is the purpose of entry:

(entry Program)

re: https://github.com/akeep/scheme-to-c

Andy Keep

Jul 28, 2019, 2:41:15 PM

to nanopass-framework

Answers inline below:

On Sunday, July 21, 2019 at 7:51:13 PM UTC-4, Amirouche Boubekki wrote:

A) How to parse a file and retrieve the line, column information inside the pass?

The nanopass framework doesn't provide a parser generator for reading from a file; it only provides parsing from an input S-expression into the matching nanopass framework record representation. Unfortunately, this means you'll need to write your own parser. Alternatively, some Scheme systems, like Chez Scheme, expose a version of read that provides some of this information. See the documentation for get-datum/annotations (http://cisco.github.io/ChezScheme/csug9.5/syntax.html#./syntax:s70) for details.

Once you have source information, you can add the source information as a field in the nanopass grammar. This is what we do in Chez Scheme.
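As a rough illustration of the get-datum/annotations route, here is a minimal sketch assuming Chez Scheme 9.5. The helper names read-annotated and annotation-start are hypothetical and are not part of Chez Scheme or nanopass. The positions it reports are file positions rather than line and column numbers, so a line/column pair would still have to be derived from them.

```scheme
;; Sketch only: read every datum in a file together with its source
;; annotation. `read-annotated` is a hypothetical helper name.
(define (read-annotated path)
  (let* ([bip (open-file-input-port path)]
         ;; #t resets the port after the descriptor checksums the file
         [sfd (make-source-file-descriptor path bip #t)]
         [ip (transcoded-port bip (native-transcoder))])
    (let loop ([bfp 0] [acc '()])
      ;; returns the annotated datum (or eof) and the updated position
      (let-values ([(x bfp) (get-datum/annotations ip sfd bfp)])
        (if (eof-object? x)
            (begin (close-port ip) (reverse acc))
            (loop bfp (cons x acc)))))))

;; Hypothetical helper: beginning file position of an annotated datum.
(define (annotation-start a)
  (source-object-bfp (annotation-source a)))
```

The plain datum is still available through annotation-stripped, so the annotation (or its source object) can be carried alongside each form and stored in a dedicated field of the grammar.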

B) Is it possible to do a whole pass validation checks and error at the end of the pass?

Not sure what you mean by "whole pass validation checks". The only "validation" the nanopass framework performs is on the construction of the output objects, where the checks are a wrapper around the record constructor. This can always be disabled by compiling at optimize level 3, but it does mean that you'd need to be very careful about matches that go deeper than a single level, since you can no longer guarantee that the constructed language forms are well formed.

C) In parse-and-rename, what is the purpose of:

(Expr e initial-env))

Expr processes the incoming s-expression into the equivalent nanopass language forms, where the initial environment (initial-env) contains the initial bindings for handling of language forms like quote and lambda, "macros" like and and or, and primitives. All of these could be re-bound in a local binding form, so we treat them just like any other binding.

D) What is the purpose of returning multiple values? In parse-and-rename
there is an example that takes the expression + and env. But I always see
signature like :

(Expr : * (e) -> Expr ()

What is the purpose of () in the above

The purpose of returning multiple values is that sometimes you want to return multiple values? I mean, sometimes you want the transformed expression and other information, like maybe the set of free variables. The signature:

(Expr : * (e) -> Expr ()

Means that the Expr transform does not expect an input language form (*), takes one argument ((e)) as input, and returns an output expression (Expr) and no additional values (()).
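To make the extra return values concrete, here is a small sketch of a processor that returns the rebuilt expression plus a list of the variables it saw. The language L and the pass check-vars are made up for illustration and do not come from scheme-to-c; the sketch assumes Chez Scheme with the nanopass library on the library path.

```scheme
(import (chezscheme) (nanopass))

;; Toy language, for illustration only.
(define-language L
  (terminals
    (symbol (x))
    (number (n)))
  (Expr (e)
    x
    n
    (add e0 e1)))

(define-pass check-vars : L (e) -> L ()
  ;; One extra return value: the list of variables seen so far.
  (Expr : Expr (e) -> Expr ('())
    [,x (values x (list x))]
    [,n (values n '())]
    ;; ,[e0 vars0] recurs on the subform and binds both return values
    [(add ,[e0 vars0] ,[e1 vars1])
     (values `(add ,e0 ,e1) (append vars0 vars1))])
  ;; The pass body receives both values and returns only the expression.
  (let-values ([(e vars) (Expr e)])
    (printf "variables seen: ~s\n" vars)
    e))
```

Because the pass body unpacks both values itself, it is also a natural place to inspect the collected information once the whole traversal has finished.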

E) what is the purpose of entry:

(entry Program)

This tells the nanopass framework which form is the entry point for the language, so that if the user does not explicitly specify it in define-pass, it can generate the code that will start transforming at Program, as opposed to, say, Expr or some other nonterminal. If this is not specified in the define-language form, then the first nonterminal listed in the define-language form is used as the entry point, unless the language extends another language, in which case it will use the entry nonterminal from the base language.

For example:

(define-language L0
  (terminals
    ---)
  (Expr (e)
    ---)
  (Program (p)
    ---))

Will have Expr as an entry point,

(define-language L1
  (terminals
    ---)
  (entry Program)
  (Expr (e)
    ---)
  (Program (p)
    ---))

Will have Program as an entry and

(define-language L2
  (extends L1)
  (Expr (e)
    ----))

Will have Program as its entry because L1 has Program as its entry.

These last two are documented in the nanopass framework documentation, which you can find in the doc directory of the project. Section 2.2.1 covers the define-language form (including a description of entry) and section 2.3.1 covers the define-pass form.

-andy:)

re: https://github.com/akeep/scheme-to-c

Jens Axel Søgaard

Jul 28, 2019, 3:45:22 PM

to Amirouche Boubekki, nanopass-framework

On Sun, Jul 28, 2019 at 20:41, Andy Keep <andy...@gmail.com> wrote:

Answers inline below:

On Sunday, July 21, 2019 at 7:51:13 PM UTC-4, Amirouche Boubekki wrote:
A) How to parse a file and retrieve the line, column information inside the pass?

The nanopass framework doesn't provide a parser generator for reading from a file; it only provides parsing from an input S-expression into the matching nanopass framework record representation. Unfortunately, this means you'll need to write your own parser. Alternatively, some Scheme systems, like Chez Scheme, expose a version of read that provides some of this information. See the documentation for get-datum/annotations (http://cisco.github.io/ChezScheme/csug9.5/syntax.html#./syntax:s70) for details.

Once you have source information, you can add the source information as a field in the nanopass grammar. This is what we do in Chez Scheme.

Amirouche:

The first "pass" of Urlang parses a syntax object (with source location information) and produces the corresponding
program represented as Nanopass structures. This syntax object could have been the result of read-syntax (the Racket
version of get-datum/annotations), but instead I opted for the interface to be a macro.

Here is a program that invokes the compiler:

#lang racket
(require urlang)

(urlang
  (urmodule demo-fact
    (export fact)
    (define (fact n) (if (= n 0) 1 (* n (fact (- n 1)))))
    (console.log (fact 5))))

The syntax transformer associated with urlang then receives the syntax object
#'(urmodule demo-fact ...)
which has the source location information needed.

And as Andy writes, an extra field is used in the grammar to keep the syntax objects around.

https://github.com/soegaard/urlang/blob/master/urlang/main.rkt#L65

/Jens Axel

Amirouche Boubekki

Jul 29, 2019, 10:56:57 AM

to Jens Axel Søgaard, nanopass-framework

I read the Chez code; now I understand.

Thanks for all the advice!
