limitations on grammars with similar rules?

18 views
Skip to first unread message

John Clements

unread,
May 28, 2019, 2:22:00 AM5/28/19
to nanopass-framework
I ran across a funny error today, and traced it back to what seems to be a problem with similar-looking grammar rules. Consider the following code:

#lang racket

;; this file handles the "r" level of the compiler.

(require nanopass/base)

(define-language R2a
  (terminals
    (variable (x))
    (constant (c)))
  (Expr (e body)
    (abc se)
    (abc arith))
  (Arith (arith)
    (read))
  (SimpleExpr (se)
    x
    c))

(define constant? fixnum?)
(define variable? symbol?)

(define-pass zzz : R2a (e) -> R2a ()
  [Expr : Expr (e) -> Expr ()
        [else
         (define a (in-context Arith `(read)))
         `(abc ,a)]])

(define-parser parse-R2a R2a)
;; this fails at parse time:
(parse-R2a '(abc (read)))
;; ... and this fails in the transformation:
(zzz (parse-R2a '(abc 1234)))

It appears to me that when nanopass sees two rules that are "similar" (for some definition of similar that I'm not sure of), it assumes in parsing and transformation that only one (the first?) is legal, but that the errors are only detected at runtime.

I think I see that trying to do otherwise could lead you down the rabbit hole toward LR parsing... perhaps it would be not too hard to detect this statically?

Or perhaps this is a racket-only problem, or perhaps I'm failing to see some easy solution!

Many thanks,

John Clements

Andy Keep

unread,
Jun 4, 2019, 11:17:57 PM6/4/19
to John Clements, nanopass-framework
Yes, this is true.

It is a limitation I've thought a lot about how to fix, but have not actually implemented anything for.

The basic problem is that in both the generated parser and the generated meta-parser, which is used for matching patterns and output templates, it looks for the 1. a matching shape and 2. a matching initial keyword (if there is one).

In your example (abc se) and (abc arith) both fall into the same match rules.

Fortunately, the work around for this is pretty straightforward, if potentially a bit tortured.  The conflicting productions can be turned into a single production, with a nonterminal that specifies the differences.  From your example, you could do this as:

(define-language R2s
  (terminals
    (variable (x))
    (constant (c)))
  (Expr (e body)
    (abc abc-body))
  (AbcBody (abc-body)
    se
    arith)
  (Arith (arith)
    (read))
  (SimpleExpr (se)
    x
    c))

It is still also possible to match them separately, since (abc ,se) will only match abc forms with an se body and (abc ,arith) will only match those with an arith body.  (Of course this is under the assumption that se and arith are always distinct.)

A more tortured example is a language from the very first assignment of the compiler course I TAed at IU.  The natural definition of it would be:

(define-language L1
  (terminals
    (variable (x))
    (binary-operator (binop))
    (integer-64 (int64))
    (integer-32 (int32)))
  (Program (p)
    (begin stmt* ...))
  (Statement (stmt)
    (set! x0 int64)
    (set! x0 x1)
    (set! x0 (binop x1 int32))
    (set! x0 (binop x1 x2))))

However all of the set! productions conflict, since they all have the same top-level shape (list of three items, where set! is the first item).  The somewhat more tortured version of this is:

(define-language L1
  (terminals
    (variable (x))
    (binary-operator (b))
    (int64 (w))
    (int32 (hw)))
  (Program (p)
    (begin s* ...))
  (Statement (s)
    (set! x rhs))
  (Rhs (rhs)
    w
    x
    (b x rand))
  (Operand (rand)
    hw
    x))

(Note that int64 and int32 are not legal metavariables, because the nanopass framework is not very smart about how it identifies the base of the metavariable, and it simply strips the trailing digits, resulting in them both being int---another bug on my list to fix.)

-andy:)
--
You received this message because you are subscribed to the Google Groups "nanopass-framework" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nanopass-framew...@googlegroups.com.
To post to this group, send email to nanopass-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nanopass-framework/3ae4524e-735e-44e4-afe1-a46c471ca164%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages