[LEPL] "No hash" error in offside-rule left recursive grammar

Daniel Ermak

Apr 18, 2010, 11:47:48 AM
to lepl
Hello,

I'm using lepl 4.0 on Python 3.1.2 on Linux. The following grammar
works as expected:

from lepl import *

CLine = ContinuedBLineFactory(Token(r'\\'))

expr0 = Token("[A-Za-z_][A-Za-z0-9_]*")

expr1 = Delayed()

call = expr1 & expr0 > List # Deliberately not expr0 & expr1
expr1 += call | expr0

program = expr1 & Eos()
parsed = program.parse("a b c")
print(parsed[0])

This prints:

List
+- List
| +- 'a'
| `- 'b'
`- 'c'

But when I replace the program= line with:

program = (CLine(expr1) & Eos())
program.config.default_line_aware(block_policy=rightmost)

I get this error:

Traceback (most recent call last):
  File "/home/danarmak/workspace/ScalySynth/src/synth/test.py", line 27, in <module>
    parsed = program.parse("a b c")
  File "/usr/lib/python3.1/site-packages/LEPL-4.0-py3.1.egg/lepl/core/config.py", line 914, in parse
    return self.get_parse()(stream, **kargs)
  File "/usr/lib/python3.1/site-packages/LEPL-4.0-py3.1.egg/lepl/core/parser.py", line 246, in single
    return next(raw(arg, **kargs))[0]
  File "/usr/lib/python3.1/site-packages/LEPL-4.0-py3.1.egg/lepl/core/parser.py", line 136, in trampoline
    value = next(value.generator)
  File "/usr/lib/python3.1/site-packages/LEPL-4.0-py3.1.egg/lepl/stream/filters.py", line 390, in _match
    generator = self.matcher._match(transform.stream)
  File "/usr/lib/python3.1/site-packages/LEPL-4.0-py3.1.egg/lepl/matchers/memo.py", line 221, in _match
    if key not in self.__caches:
  File "/usr/lib/python3.1/site-packages/LEPL-4.0-py3.1.egg/lepl/stream/stream.py", line 380, in __hash__
    return hash(self.__line) ^ self.__offset
  File "/usr/lib/python3.1/site-packages/LEPL-4.0-py3.1.egg/lepl/stream/stream.py", line 561, in __hash__
    return self.source.hash_line(self)
  File "/usr/lib/python3.1/site-packages/LEPL-4.0-py3.1.egg/lepl/stream/stream.py", line 710, in hash_line
    self.__class__.__name__))
Exception: No hash for [(['Tk9'], 'a')], <bound method Line.location of [(['Tk9'], 'a')]> (CachingTransformedSource)

The grammar is deliberately left-recursive: I want it to describe
left-associative function calls. If I make it right-recursive (call =
expr0 & expr1), then it works with the offside rule, but that's not what
I want. Can I achieve what I want, and what does this error mean? (What
do memoization and eliminating left-recursive loops have to do with
making a grammar whitespace-aware?)

Thanks!

Daniel Ermak

--
You received this message because you are subscribed to the Google Groups "lepl" group.
To post to this group, send email to le...@googlegroups.com.
To unsubscribe from this group, send email to lepl+uns...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/lepl?hl=en.

andrew cooke

Apr 18, 2010, 12:03:18 PM
to lepl

Ah :o(

OK, to answer your last question first: getting left recursion to work
correctly requires hashing (so that we can detect when we process the
same data again and again), while whitespace parsing is implemented
by modifying the token stream to include flags indicating the start
and end of a line. Unfortunately, it appears at first glance that
the modification process has a bug that breaks hashing.
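To illustrate why hashing enters the picture, here is a minimal plain-Python sketch (not LEPL code) of memoized left recursion using the "seed growing" technique in the style of Warth et al.; LEPL's internal method may differ. The grammar mirrors the one above, expr1 := expr1 NAME | NAME, and the function names are hypothetical. Each rule result is cached in a dictionary keyed by stream position, so the position must be hashable. Here positions are plain integers, which is trivially true; in the bug above, the position was a line of a transformed stream that could not be hashed.

```python
# Sketch: memoized left recursion via "seed growing" (Warth et al. style).
# Grammar: expr1 := expr1 NAME | NAME  (left-recursive, left-associative).

def parse_expr1(tokens):
    memo = {}  # position -> (tree, next_position) or None; keys must be hashable

    def name(pos):
        # Match a single token at pos, if there is one.
        if pos < len(tokens):
            return tokens[pos], pos + 1
        return None

    def expr1(pos):
        if pos in memo:          # the cache lookup hashes the position
            return memo[pos]
        memo[pos] = None         # seed: the left-recursive call fails at first
        while True:
            result = expr1_body(pos)
            best = memo[pos]
            # Stop once re-parsing no longer consumes more input.
            if result is None or (best is not None and result[1] <= best[1]):
                return best
            memo[pos] = result   # grow the seed and try again

    def expr1_body(pos):
        # Alternative 1: expr1 NAME (reuses the memoized, growing result)
        left = expr1(pos)
        if left is not None:
            right = name(left[1])
            if right is not None:
                return ("call", left[0], right[0]), right[1]
        # Alternative 2: NAME
        return name(pos)

    return expr1(0)


print(parse_expr1(["a", "b", "c"]))
```

Each pass through the loop reuses the previous (shorter) result as the left operand, so the tree grows to the left, which is the associativity the original grammar asks for.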

So, this is a bug I need to fix.

Separately, do you need a left-recursive grammar to describe
left-associative function calls? I am pretty sure that the answer is
"no", and you really should think about that, because there's a big
penalty (in both efficiency and underlying code complexity, hence this
bug) for using left-recursive grammars.
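A minimal plain-Python sketch of the non-recursive alternative: match the calls with a simple repetition, then build the left-associative tree with a left fold. This is not LEPL code; parse_names and fold_calls are hypothetical names, and a whitespace split stands in for the real tokenizer.

```python
from functools import reduce


def parse_names(text):
    # Non-recursive rule: one or more names (a stand-in for a repetition
    # such as expr0[1:] in LEPL).
    return text.split()


def fold_calls(names):
    # Left fold: [a, b, c, d] -> ('call', ('call', ('call', a, b), c), d).
    # A single name is returned unchanged.
    return reduce(lambda func, arg: ("call", func, arg), names)


print(fold_calls(parse_names("a b c d")))
```

The associativity now lives in the fold rather than in the grammar, so the parser never has to detect left-recursive loops.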

Anyway, I'll look at fixing this ASAP - sorry for the bug and thanks
for the report,
Andrew

Daniel Armak

Apr 18, 2010, 12:27:30 PM
to le...@googlegroups.com
You're right - after thinking about it some more, left recursion isn't necessary. I can define a rule for several calls in a row and then replace it with left-associative Call tokens in the AST. IOW, the grammar wouldn't specify the associativity at all. 

Here's the updated code, and it doesn't trigger the bug, either:

from lepl import *

class Call(List):
    # Hypothetical stand-in: the definition of Call is not shown in this post.
    pass

CLine = ContinuedBLineFactory(Token(r'\\'))

expr0 = Token("[A-Za-z_][A-Za-z0-9_]*")

expr1 = Delayed()

def calls_to_call(tokens):
    # Fold left-to-right so that "a b c d" becomes
    # Call(Call(Call(a, b), c), d).
    result = tokens[0]
    for token in tokens[1:]:
        result = Call((result, token))
    return result

calls = expr0[2:] > calls_to_call
expr1 += calls | expr0

program = (CLine(expr1) & Eos())
program.config.default_line_aware(block_policy=rightmost)

parsed = program.parse("a b c d")
print(parsed[0])

Thanks again - I've tried using several python parsing libraries and lepl is by far the most powerful!

Daniel Ermak

andrew cooke

Apr 18, 2010, 12:33:21 PM
to le...@googlegroups.com

Great! I'm glad you've got a work-around. I'm trying to fix the bug at the
moment, but it's taking me a while to remember what it all does....

Thanks,
Andrew

andrew cooke

Apr 18, 2010, 1:21:40 PM
to lepl


OK, this is the necessary diff to fix this particular bug. I'll release a
4.0.1 later today, probably.

diff -r 7bcfdf055100 src/lepl/stream/filters.py
--- a/src/lepl/stream/filters.py Fri Apr 16 20:37:22 2010 -0400
+++ b/src/lepl/stream/filters.py Sun Apr 18 13:18:24 2010 -0400
@@ -215,6 +215,12 @@
         else:
             return self.__lookup[stream.character_offset]
 
+    def hash_line(self, line):
+        '''
+        Extract line number from original data.
+        '''
+        return self.location(0, line, line.location_state)[0]
+
 
 class CachingFilteredSource(CachingTransformedSource):
     '''
[diff ends here]
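The idea behind the patch, as far as the diff shows, is that a line in a transformed stream gets its hash from its location in the original data. Below is a minimal plain-Python sketch of that idea; the classes OriginalSource, TransformedLine and UnhashableLine are hypothetical and are not LEPL's actual stream classes. Memo keys are (line, offset) tuples, and hashing a tuple hashes each element, so the line object itself must be hashable. In Python 3, a class that defines __eq__ without __hash__ gets __hash__ set to None, so its instances cannot be used as cache keys.

```python
class OriginalSource:
    """Hypothetical stand-in for the untransformed input text."""

    def __init__(self, text):
        self.lines = text.splitlines()


class TransformedLine:
    """Hypothetical line of a transformed stream (e.g. tokens plus line flags)."""

    def __init__(self, items, source, line_number):
        self.items = items              # transformed content; not used for hashing
        self.source = source            # the original data this line came from
        self.line_number = line_number  # location in the original data

    def __eq__(self, other):
        return (isinstance(other, TransformedLine)
                and self.source is other.source
                and self.line_number == other.line_number)

    def __hash__(self):
        # Delegate to the location in the original data, consistent with __eq__.
        return hash((id(self.source), self.line_number))


class UnhashableLine:
    """Defines __eq__ but not __hash__, so Python 3 makes it unhashable."""

    def __init__(self, line_number):
        self.line_number = line_number

    def __eq__(self, other):
        return isinstance(other, UnhashableLine) and self.line_number == other.line_number


source = OriginalSource("a b c")
memo = {}
memo[(TransformedLine([(['Tk9'], 'a')], source, 0), 0)] = "cached result"

# An equal line built separately finds the same cache entry.
print(memo[(TransformedLine([(['Tk9'], 'a')], source, 0), 0)])

try:
    memo[(UnhashableLine(0), 0)] = "never stored"
except TypeError as error:
    print("cache lookup failed:", error)
```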

You can also get the correct code from
http://code.google.com/p/lepl/source/list

Unfortunately, I still have no idea what was causing the other issue reported,
with literal strings not being coerced to matchers using Python 2.6 on a Mac.

Andrew