Syntax-parse says literal data is not allowed - implementation of a register bank

173 views
Skip to first unread message

Paulo Matos

unread,
Jun 11, 2018, 4:57:29 AM6/11/18
to racket...@googlegroups.com
Hi all,

So, I have decided to give syntax-parse a go and implement a new,
lighter register bank implementation for S10.

As far as I can tell, the syntax-parse was actually quite easy to come
to terms with but the error I am getting pointed nowhere and it's not
really easy to understand.

I start by defining some generic methods:


(provide gen:registerbank)

(define-generics registerbank
(register-bank-length registerbank) ;; Number of registers in the bank
(register-width registerbank) ;; Number of bits per register
(register-set! registerbank idx val) ;; Sets register number idx to val
(register-get registerbank idx) ;; Gets the value of register
number idx
(register-names registerbank idx) ;; Returns list of names for
register number idx
(register-idx registerbank name) ;; Returns the idx
corresponding to name
)


----

and then, I create a macro that generates registerbanks and implement
the generic methods.


(require s10/registerbank
racket/syntax
(for-syntax anaphoric
mischief/for
racket/base
racket/syntax
syntax/parse))

;;
---------------------------------------------------------------------------------------------------
;; Simple implementation of a register bank which implements the
registerbank generics.

(begin-for-syntax
(define-syntax-class field-spec
#:attributes (idx name-list cst)
#:datum-literals ()
(pattern
(idx:nat #:names name-list:expr #:static cst:nat))
(pattern
(idx:nat #:names name-list:expr)
#:with cst #'#false)))


(define-values (prop:name->idx name->idx? name->idx-ref)
(make-struct-type-property 'name->idx))

(define-values (prop:idx->names idx->names? idx->names-ref)
(make-struct-type-property 'idx->names))

(define-values (prop:register-width register-width? register-width-ref)
(make-struct-type-property 'register-width))

(define-values (prop:cst-idxs cst-idxs? cst-idxs-ref)
(make-struct-type-property 'cst-idxs))

(define-syntax (make-regbank stx)
(syntax-parse stx
[(_ name:id width:nat [field+:field-spec ...])
#:with name->idx-hash
(make-immutable-hasheq
(for/append ([field-idx (in-list (syntax->datum #'(field+.idx ...)))]
[field-names (in-list (syntax->datum
#'(field+.name-list ...)))])
(map (lambda (fname) (cons field-idx fname)) field-names)))
#:with idx->names-hash
(make-immutable-hasheq
(for/append ([field-idx (in-list (syntax->datum #'(field+.idx ...)))]
[field-names (in-list (syntax->datum
#'(field+.name-list ...)))])
(for/list ([fname (in-list field-names)])
(cons fname field-idx))))
#:with nreg
(length (syntax->datum #'(field+ ...)))
#:with cst-idxs
(for/vector ([f (in-list (sort (map cons
(syntax->datum #'(field+.idx ...))
(syntax->datum #'(field+.cst ...)))
<
#:key car))])
(aif (cdr f) it #false))
#:with name-regvalues
(format-id #'name "~a-regvalues" #'name)
#:with constructor-name
(format-id #'name "make-~a" #'name)
#:with constructor-args
(build-list (syntax->datum #'nreg) (lambda (n) (format-id #'name
"x~a" n)))

#`(begin
(struct name
(regvalues)
#:property prop:name->idx name->idx-hash
#:property prop:idx->names idx->names-hash
#:property prop:register-width width
#:property prop:cst-idxs cst-idxs
#:methods gen:registerbank
[(define (register-bank-length rb) nreg)
(define (register-width rb) (register-width-ref rb))
(define (register-set! rb idx val)
(unless (cst-idxs-ref rb)
(vector-set! (name-regvalues rb) idx val)))
(define (register-get rb idx)
(aif (cst-idxs-ref rb) it (vector-ref (name-regvalues rb)
idx)))
(define (register-names rb idx)
(hash-ref (idx->names-ref rb) idx #false))
(define (register-idx rb name)
(hash-ref (name->idx-ref rb) name))])
(define constructor-name
(lambda constructor-args
(riscv (apply vector constructor-args)))))]))


I should be able to do something like this

(make-regbank simple-rb 32
[(0 #:names (zero x0) #:static 0)
(1 #:names (x1))])

However, I am getting:
?: literal data is not allowed;
no #%datum syntax transformer is bound
2


This points nowhere in the source code even though debugging is enabled.
The only 2 I can see in the expansion is from (define
(register-bank-length rb) nreg) but I don't understand why a 2 here
would be relevant.

Any ideas?

Also, I am happy to have comments/suggestions/bashings for my
implementation of this macro using syntax-parse. Things I didn't manage
to do yet:
* specify that name-list is a list of ids instead of just an expr.
* ensure that all indexes up to the maximum register index have an
entry? i.e. this is not possible
(make-regbank simple-rb 32
[(0 #:names (zero x0) #:static 0)
(3 #:names (x1))])
* ensure that there are no repeating indexes.
* ensure that there are no repeating names.


You will also notice that I use a vector of X values for a register bank
for X register even if there are constant registers. The issue here is
that I want very fast getting and setting and if I need to redirect this
through a hash-table, it will take a while. There are usually not that
many constant registers in a register bank, therefore this wastes some
space but is faster that adding an indirection.

Kind regards,

--
Paulo Matos

Paulo Matos

unread,
Jun 11, 2018, 9:44:11 AM6/11/18
to racket...@googlegroups.com


On 11/06/18 10:57, 'Paulo Matos' via Racket Users wrote:
> (riscv (apply vector constructor-args)))))]))
^^^^^

Still didn't understand the problem with the 'literal data is not
allowed;', however this is supposed to be 'name', not 'riscv'.

--
Paulo Matos

Matthias Felleisen

unread,
Jun 11, 2018, 10:49:51 AM6/11/18
to Paulo Matos, racket...@googlegroups.com

Your #:with clauses need to generate code to splice it into the generated

#lang racket

(require (for-syntax syntax/parse))

(define-syntax (foo stx)
(syntax-parse stx
[(_ x)
#:with y #’5 ;; <<—— Your code is missing the #’ here
#'(list x y)]))

(foo 10)

BUT yes, the error message is HORRIBLE.
> --
> You received this message because you are subscribed to the Google Groups "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Alexis King

unread,
Jun 11, 2018, 11:18:21 AM6/11/18
to Paulo Matos, racket...@googlegroups.com
You can see a simpler version of this behavior with this tiny macro:

(define-syntax m
(syntax-parser
[(_)
#:with x 42
#'x]))

(m)

This produces the same “literal data is not allowed” error message. Why?

In Racket, literal data inside a syntax object — that is, anything that
isn’t a symbol or a pair — is not legal anywhere in a fully-expanded
program. These things are only legal in a fully-expanded program when
wrapped in quote or quote-syntax, so this is a valid fully-expanded
program:

(define x (quote 42))

...but this is not:

(define x 42)

Of course, when writing code in #lang racket, we enjoy so-called
“self-quoting” literals, so booleans, numbers, characters, strings,
bytes, regexps, vectors, hashes, and prefab structures are all legal
anywhere in expression position. How does this work if I just said
literal data is illegal when not explicitly wrapped in quote? Well, to
support self-quoting literals, the macroexpander implicitly wraps any
literal data it finds with #%datum, as described here:

http://docs.racket-lang.org/reference/quote.html#%28form._%28%28quote._~23~25kernel%29._~23~25datum%29%29

http://docs.racket-lang.org/reference/syntax-model.html#%28part._expand-steps%29

In #lang racket, #%datum just expands directly to quote, but in some
other languages, that might not be the case. My own #lang hackett, for
example, does not support all of the literals #lang racket does, so it
provides a custom version of #%datum that raises a syntax error when the
user tries to write things like prefab structure literals in Hackett
code. But how does the macroexpander figure out which #%datum to use?
Well, it uses the lexical context of the piece of literal data itself.

This is a problem, since in your code (and in my example program above),
you are binding a piece of literal data to a pattern variable using
#:with, which implicitly promotes the data to a syntax object. The
binding clause in my example program is equivalent to doing this:

#:with x (datum->syntax #f 42)

This means that the syntax object bound to x has (1) no lexical context
and (2) no source location information. The former means that the use
of 42 is an error ( sincethe expander doesn’t know which #%datum to use)
and the latter makes reporting an error somewhat difficult (since the
expander has no idea where the 42 came from). There are three
straightforward ways to fix this.

1. Use syntax or quote-syntax to explicitly create a syntax object.

This is the easiest thing to do if your program is really as
trivial as the example program at the beginning of my email. Just
change the binding clause to this:

#:with #'42

...and now the 42 will have the surrounding lexical context, the
correct #%datum will be bound, and everything will work.

Of course, in practice, you are often producing the number via
some arithmetic expression or other function call, so you can’t
just use syntax or quote-syntax. In that case, the following two
approaches apply.

2. Use datum->syntax to provide the correct lexical context.

This binding clause allows using an arbitrary number while still
providing enough lexical context for the expander to pick the
right #%datum:

#:with (datum->syntax #'here 42)

If it makes sense, you can even provide a third argument to
datum->syntax to provide source location information, but that
likely isn’t necessary.

3. Explicitly wrap the use in quote.

It’s possible to solve this problem without fiddling with x’s
lexical context at all, since you can just ensure x is only used
under quote in the expansion:

#:with x 42
#'(quote x)

This works fine, since literal data is allowed under quote,
regardless of its lexical information.

Using either of the second two solutions should help you solve your
problem.

Alexis

P.S. As a final note, this behavior reflects a minor difference between
syntax-parse and its #:with clauses versus syntax-case and the
with-syntax form. If you use with-syntax instead of #:with, you’ll
notice there is no error. This is because syntax-case and with-syntax
both implicitly use the lexical context of the expression that produces
a value to be bound when implicitly converting data to syntax objects,
whereas syntax-parse creates a syntax object with no lexical context at
all.

The syntax-case behavior makes some of these problems go away, but it’s
also a little more magical (in a negative way), and it has the potential
to hide some bugs, so I think the choice syntax-parse makes is the right
one.

P.P.S. As a really final note, uses of unsyntax inside quasisyntax also
has the same implicit behavior as syntax-case and with-syntax, so #`#,42
will actually produce a syntax object with lexical context, not
(datum->syntax #f 42).

Paulo Matos

unread,
Jun 11, 2018, 11:19:47 AM6/11/18
to Matthias Felleisen, racket...@googlegroups.com


On 11/06/18 16:50, Matthias Felleisen wrote:
>
> Your #:with clauses need to generate code to splice it into the generated
>
> #lang racket
>
> (require (for-syntax syntax/parse))
>
> (define-syntax (foo stx)
> (syntax-parse stx
> [(_ x)
> #:with y #’5 ;; <<—— Your code is missing the #’ here
> #'(list x y)]))
>
> (foo 10)
>
> BUT yes, the error message is HORRIBLE.
>
>


Thanks for the comments. I understand your fix seems to do the right
thing but the reason I don't have that if because if instead I do:
#lang racket

(require (for-syntax syntax/parse))

(define-syntax (foo stx)
(syntax-parse stx
[(_ x)
#:with y #'(+ 4 1)
#'(list x y)]))

(foo 10)

This will expand to (list 10 (+ 4 1)). Which means at run time I will be
doing the addition and list creation. I want to do the addition (in this
case) at expansion time, therefore I purposefully didn't add the #'.

The more operations I move to expansion time in my software, the better.

--
Paulo Matos

Matthias Felleisen

unread,
Jun 11, 2018, 11:29:20 AM6/11/18
to Paulo Matos, racket...@googlegroups.com
You want to compute the literal list at compile time and then convert it into a piece of syntax before you splice it in: 


#lang racket

(require (for-syntax syntax/parse))

(define-syntax (foo stx)
  (syntax-parse stx
    [(_ x:integer)
     #:with y (datum->syntax stx (make-some-list (syntax-e #'x) stx))

     #'(list x y)]))

;; Integer Syntax -> [Listof Integer]
(define-for-syntax (make-some-list n stx)
  (if (<= n 0)
      (raise-syntax-error #f "non-negative int expected" stx)
      (cons 'list (build-list n values))))

(foo 10)
(foo -1)


Ryan Culpepper

unread,
Jun 11, 2018, 5:14:27 PM6/11/18
to Alexis King, Paulo Matos, racket...@googlegroups.com
Alexis explained the mechanics well. Let me add some advice about macro
design.

When writing macros, it is important to keep track of not only what
kinds of data you're manipulating (eg, symbol? vs identifier?; syntax?
vs (listof syntax?); ...) but also the interpretation of the data.
Common interpretations include Expression, Definition, Binder, and
Datum. In some cases you might need to track more information, like the
run-time type of an expression; so within a macro you might have "types"
like Syntax[Expression[Integer]], for example.

My advice: Never treat a Datum as if it were an Expression.

It sometimes works... IF the Datum is one of the values that triggers
Racket's implicit #%datum treatment AND the syntax object's lexical
context has a #%datum macro that interprets the Datum as self-quoting.
But it's fragile (witness the error message you got) and it obscures
design knowledge. And if someone later manages to sneak in a Datum like
(list 'exit), odd things happen.

The proper way to convert a Datum into an Expression is `quote` [*].
(Alexis's recommendation #3.)

Along the same lines, the proper way to convert a Datum that is a syntax
object into an Expression that produces the same *syntax object* at run
time is `quote-syntax`. Not `syntax`; `syntax` takes a Syntax-Template,
not a Datum.

By the way, this should remind you of the explanation of every injection
attack ever: SQL injection, HTML/markup injection, etc. You have a
string, and you combine it with another string; what could go wrong? The
problem is the mapping from String[Arbitrary-Text] to
String[SQL-Literal-Scalar-Expression] is not just (format "'~a'" _).
Likewise, Syntax[Datum] to Syntax[Expression] is not the identity function.

Ryan


[*] Actually, even `quote` just broadens the subset of Datum that gets
converted faithfully. It doesn't handle non-prefab structures, data with
embedded syntax objects, etc. I know of a few ad hoc partial solutions
(IIRC, Typed Racket uses `print-convert`, for example). Maybe someone
should write a library that tackles the general problem.

Paulo Matos

unread,
Jun 12, 2018, 3:37:14 AM6/12/18
to Alexis King, racket...@googlegroups.com
I assume you missed here a variable name?
#:with x #'42

> ...and now the 42 will have the surrounding lexical context, the
> correct #%datum will be bound, and everything will work.
>
> Of course, in practice, you are often producing the number via
> some arithmetic expression or other function call, so you can’t
> just use syntax or quote-syntax. In that case, the following two
> approaches apply.
>
> 2. Use datum->syntax to provide the correct lexical context.
>
> This binding clause allows using an arbitrary number while still
> providing enough lexical context for the expander to pick the
> right #%datum:
>
> #:with (datum->syntax #'here 42)
>
> If it makes sense, you can even provide a third argument to
> datum->syntax to provide source location information, but that
> likely isn’t necessary.
>


Thanks for the explanation. I think I got it all down to here. Where is
the #'here coming from? Can't find anything in the manual except for
here strings.

> 3. Explicitly wrap the use in quote.
>
> It’s possible to solve this problem without fiddling with x’s
> lexical context at all, since you can just ensure x is only used
> under quote in the expansion:
>
> #:with x 42
> #'(quote x)
>
> This works fine, since literal data is allowed under quote,
> regardless of its lexical information.
>
> Using either of the second two solutions should help you solve your
> problem.
>
> Alexis
>
> P.S. As a final note, this behavior reflects a minor difference between
> syntax-parse and its #:with clauses versus syntax-case and the
> with-syntax form. If you use with-syntax instead of #:with, you’ll
> notice there is no error. This is because syntax-case and with-syntax
> both implicitly use the lexical context of the expression that produces
> a value to be bound when implicitly converting data to syntax objects,
> whereas syntax-parse creates a syntax object with no lexical context at
> all.
>

That makes sense, I have written smaller macros in syntax-case and
hadn't seen this issue before using with-syntax.

> The syntax-case behavior makes some of these problems go away, but it’s
> also a little more magical (in a negative way), and it has the potential
> to hide some bugs, so I think the choice syntax-parse makes is the right
> one.
>
> P.P.S. As a really final note, uses of unsyntax inside quasisyntax also
> has the same implicit behavior as syntax-case and with-syntax, so #`#,42
> will actually produce a syntax object with lexical context, not
> (datum->syntax #f 42).
>

Thanks once again for writing this thorough explanation.
Reply all
Reply to author
Forward
0 new messages