Problem with empty production. Syntax error on supposedly correct input.

410 views
Skip to first unread message

Grzegorz Milka

unread,
Jul 31, 2012, 11:54:03 AM7/31/12
to ply-hack
Hello,

I decided to use ply to implement assembler from "Elements of computer
systems" and in the process I noticed an unexpected behaviour (error
on someone's part) on something that in my opinion looks ok.

Here's the program showing the error:

import ply.lex as lex
import ply.yacc as yacc

#####################
# LEXER
#####################

tokens = [
'SEMICOLON',
'EQUAL',
'JUMP',
'REGISTER',
]

t_ignore = ' \r'

t_SEMICOLON = r';'
t_EQUAL = r'='

def t_ID(t):
r'(?:D|JMP)'
if t.value == 'D':
t.type = 'REGISTER'
t.value = ('REGISTER', t.value)
elif t.value == 'JMP':
t.type = 'JUMP'
return t

def t_error(t):
print "Illegal character '%s'" % t.value[0]
t.lexer.skip(1)

#####################
# PARSER
#####################

def p_c_order(p):
'c_order : dest op jmp'
pass

def p_dest(p):
'dest : REGISTER EQUAL'
pass

def p_dest_empty(p):
'dest : empty'
pass

def p_op(p):
'op : REGISTER'
pass

def p_jmp(p):
'jmp : SEMICOLON JUMP'
pass

def p_empty(p):
'empty :'
pass

def p_error(p):
print "Syntax error in input! " + str(p)

lexer = lex.lex()
parser = yacc.yacc()

if __name__ == '__main__':
parser.parse('D;JMP')

It is a minimal example I could made of a lexer and parser of simple
assembly language.
It is supposed to parse orders like: D=A+D;JMP (register D is equal to
sum of A and D, JMP to the address pointed by A) or D;JEQ (Jump to the
address pointed by D if D == 0).

I stripped out everything I could. I only left the ability to
recognize register D, equal sign, semicolon and JMP instruction. So it
can parse ("D=D;JMP"), but on ("D;JMP") it returns an error

Here's the debug output:

PLY: PARSE DEBUG START

State : 0
Stack : . LexToken(REGISTER,('REGISTER', 'D'),1,0)
Action : Shift and goto state 3

State : 3
Stack : REGISTER . LexToken(SEMICOLON,';',1,1)
ERROR: Error : REGISTER . LexToken(SEMICOLON,';',1,1)
Syntax error in input! LexToken(SEMICOLON,';',1,1)

State : 3
Stack : REGISTER . error
ERROR: Error : REGISTER . error

State : 0
Stack : . error
ERROR: Error : . error

State : 0
Stack : . LexToken(JUMP,'JMP',1,2)
ERROR: Error : . LexToken(JUMP,'JMP',1,2)

State : 0
Stack : . $end
ERROR: Error : . $end

What's interesting if I add following rule:

def p_c_order_empty(p):
'c_order : op jmp'
pass

It works! However if I add this rule:

def p_c_order_empty(p):
'c_order : empty op jmp'
pass

Or change p_dest_empty(p) to:

def p_dest_empty(p):
'dest : '
pass


The error remains the same. Even though it seems (to me) to be the
same.

Can someone explain what's happening here? Is it an error on my part
or perhaps it seems to be a bug?

A.T.Hofkamp

unread,
Aug 1, 2012, 4:48:08 AM8/1/12
to ply-...@googlegroups.com
On 07/31/2012 05:54 PM, Grzegorz Milka wrote:
> Hello,
>
> I decided to use ply to implement assembler from "Elements of computer
> systems" and in the process I noticed an unexpected behaviour (error
> on someone's part) on something that in my opinion looks ok.

The example code has a shift/reduce error which means your grammr is too complicted for the LALR(1)
algroithm to handle. It does the best it can, but some paths in the grammar are dropped. When you
try to use those paths, the input is not recognized.

This is not a bug. It is a trade-off between simplicity and speed of the implementation versus
acceptable expression power in the grammar (the algorithm is around 20-25 years old).

(It may also be the case that the dropped paths never happens in actual input. In such cases the
generated parser is still usable. this is why you do get a parser even with a grammar that falls
outside the limits of the LALR(1) algorithm. You should however verify that the shift/reduce
conflict does not drop paths that can occur in actual input.)


You have to eliminate the shift/recude conflict first (ie make the grammar fit within the limits of
the LALR(1) algorithm) before you can be sure every path in the grammar is considered.


Albert
Reply all
Reply to author
Forward
0 new messages