Greedy rule is consuming a keyword

72 views
Skip to first unread message

jan...@spotwish.com.br

unread,
Aug 11, 2017, 10:25:21 PM8/11/17
to PEG.js: Parser Generator for JavaScript
Hello,

I'm trying to parse a line like this:

ONMOGOSUB10

(you can read it as: ON MO GOSUB 10)

MO, in this case, is a variable name, while ON and GOSUB are keywords.

However, my grammar is reading MOGOSUB as if it was a variable name.

My grammar is like this (simplified):

start
= "ON"i ws expression ws "GOSUB"i ws number

number
= [0-9]+

expression
= [a-zA-Z]+

keyword
= "GOSUB"i

ws
= " "*

How can I make the "expression" rule not greedy in relation to keyword?

Thanks in advance,
Rafael.

Michaeljohn Clement

unread,
Aug 12, 2017, 2:20:02 AM8/12/17
to jan...@spotwish.com.br, PEG.js: Parser Generator for JavaScript
It looks like you have an ambiguous grammar.

In addition to being hard to parse, it’s going to be hard for humans to read, for the same reason that "garden path" sentences are hard to read. We don’t like backtracking.

As for how to solve the problem, PEG does not backtrack through repetition operators. The only way to structure alternative parses is through alternative rules. The best you could do would be to make it explicit with something like:

S = “ON” ( TOK1 G / TOK2 G / TOK3 G / …)

TOK1 = letter
TOK2 = letter letter
TOK3 = TOK2 letter


G = “GOSUB” digit+

Or you could escape from the PEG formalism using JS.

The best way would be to redesign the grammar to not be ambiguous.

Regards,
Michaeljohn

jan...@spotwish.com.br

unread,
Aug 12, 2017, 8:19:24 PM8/12/17
to PEG.js: Parser Generator for JavaScript
On Saturday, August 12, 2017 at 3:20:02 AM UTC-3, inimino wrote:
> It looks like you have an ambiguous grammar.

Yes. Unfortunately, it is a hard requirement. This grammar comes from a BASIC language from the 80s. My goal is to write a transpiler for that language.

> As for how to solve the problem, PEG does not backtrack through repetition operators. The only way to structure alternative parses is through alternative rules. The best you could do would be to make it explicit with something like:
>
> S = “ON” ( TOK1 G / TOK2 G / TOK3 G / …)
>
> TOK1 = letter
> TOK2 = letter letter
> TOK3 = TOK2 letter
> …
>
> G = “GOSUB” digit+

There is a detail I left out. The language also supports expressions after the ON. Example:

ON 1 GOSUB 10
ON X+1 GOSUB 10
ON (X*2+Y) GOSUB 10,20,30,40

Of course the user can type everything without spaces (ugh)

In this case, I'm not sure the proposed solution would work.

> Or you could escape from the PEG formalism using JS.

That seems like the only way.
I'm not sure how to do that, based on the documentation...
Maybe I should build a lexer manually and then feed the token stream to PEG.js.
Or maybe there is another approach that I'm not aware of.
Could you give me some pointers please?

Thanks,
Rafael

Samuel Crow

unread,
Aug 12, 2017, 9:00:44 PM8/12/17
to jan...@spotwish.com.br, PEG.js: Parser Generator for JavaScript
A lexer is a hard requirement for those 8 bit BASICs. The C64 used to store tokenized text in memory and to disk to save space. I think the Apple 2 did likewise.

jas...@snmpstack.com

unread,
Aug 14, 2017, 1:59:14 AM8/14/17
to PEG.js: Parser Generator for JavaScript, jan...@spotwish.com.br
Try this:

start
= "ON"i ws expression ws "GOSUB"i ws number

number

= $[0-9]+

expression
= $(!keyword [a-zA-Z])+

keyword
= "GOSUB"i

ws
= " "*

Reply all
Reply to author
Forward
0 new messages