Question on Matchers and Separators

LogicProgrammer

Nov 17, 2010, 1:20:58 PM

to lepl

Hi All,

Consider the following grammar:

from lepl import *

LPAREN = Token(r"\(")
RPAREN = Token(r"\)")
COMMA = Token(r",")
PERIOD = Token(r"\.")

OBJECT_CONSTANT = Or(
    Token('[a-z][a-zA-Z0-9]*'),
    Token('[1-9][0-9]*'),
    Token('0'),
)

VARIABLE_CONSTANT = Token('[A-Z][a-zA-Z0-9]*')
FUNCTION_SYMBOL = Token('[a-z][a-zA-Z0-9]*')
PREDICATE_SYMBOL = Token('[a-z][a-zA-Z0-9]*')

TERM = Delayed()
TERMS = Delayed()

TERM += Or(
    FUNCTION_SYMBOL & LPAREN & TERMS & RPAREN > ''.join,
    OBJECT_CONSTANT > ''.join,
    VARIABLE_CONSTANT > ''.join,
)

TERMS += TERM & ZeroOrMore(COMMA & TERM)

Using this grammar I obtain the following:

>>> TERM.parse('p(x)')
[u'p(x)']
>>> TERM.parse('p ( x ) ')
[u'p(x)']

This violates my intuition of what should happen. On reading further, I
understand that using & ignores separating whitespace. That is not my
intent, however: 'p(x)' should be a valid term, but 'p ( x ) ' should
not. Is there any way to specify this with LEPL?

Thank you kindly in advance,
Gregory Gelfond

andrew cooke

Nov 17, 2010, 1:35:39 PM

to le...@googlegroups.com

Hi,

There are two ways to do what you want.

First, you could avoid Tokens altogether. It is possible (normal?) for Lepl
to match against a stream of characters with no intermediate lexer; this is
what much of the documentation describes. So you could replace "Token"
in your grammar with "Regexp" and it would, I think, work as you expect
(although I haven't actually tried it).
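
To illustrate what matching directly on characters means, here is a rough sketch using only Python's standard "re" module. It is not Lepl code, and the helper names (match_term, match_terms, is_term) are made up for this example. It only assumes the three term shapes from the grammar in the question, and it never skips whitespace, so any stray space makes the match fail.

```python
import re

# Hypothetical stand-ins for the grammar's terminals, matched on raw characters.
NAME = re.compile(r'[a-z][a-zA-Z0-9]*')      # object constant or function symbol
NUMBER = re.compile(r'[1-9][0-9]*|0')        # numeric object constant
VARIABLE = re.compile(r'[A-Z][a-zA-Z0-9]*')  # variable


def match_term(text, pos=0):
    """Return the end index of a term starting at pos, or None if none matches."""
    # Try the compound form first: name '(' terms ')'
    m = NAME.match(text, pos)
    if m and text.startswith('(', m.end()):
        end = match_terms(text, m.end() + 1)
        if end is not None and text.startswith(')', end):
            return end + 1
    # Otherwise fall back to a bare constant or variable.
    for pattern in (NAME, NUMBER, VARIABLE):
        m = pattern.match(text, pos)
        if m:
            return m.end()
    return None


def match_terms(text, pos):
    """Match TERM (',' TERM)* with no whitespace allowed anywhere."""
    end = match_term(text, pos)
    while end is not None and text.startswith(',', end):
        end = match_term(text, end + 1)
    return end


def is_term(text):
    """True only if the whole input is a single term."""
    return match_term(text) == len(text)
```

With this sketch, is_term('p(x)') is true while is_term('p ( x ) ') is false, because nothing in the matcher ever consumes a space.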

Alternatively, you can continue to use Tokens (which implies that the
input stream is passed through a lexer before being matched) but you need
to disable the automatic matching (and discarding) of whitespace in the
lexer. You can do that by specifying "discard" to config.lexer (this is
the regexp that matches and discards data if other tokens fail). So you
could add at the end of your code:

    TERM.config.lexer(discard=r'')

(again, I haven't tried this, and you may need to check the details in the
docs).
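
The effect of a "discard" pattern can be shown with a toy lexer written against Python's standard "re" module. This is only a sketch of the idea, not Lepl's implementation: the tokenize function, its token pattern, and its default whitespace pattern are all assumptions made for this example.

```python
import re

# Hypothetical token pattern covering the names, numbers and punctuation
# used by the grammar in the question.
TOKEN = re.compile(r'[A-Za-z][A-Za-z0-9]*|[1-9][0-9]*|0|[(),.]')


def tokenize(text, discard=r'[ \t\r\n]'):
    """Split text into tokens, skipping input that matches `discard`.

    The discard pattern is tried only when no token matches. An empty
    (or None) discard pattern means nothing is skipped, so stray
    whitespace becomes a lexer error.
    """
    skip = re.compile(discard) if discard else None
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m:
            tokens.append(m.group())
        else:
            m = skip.match(text, pos) if skip else None
            if not m or m.end() == pos:
                raise ValueError('lexer error at offset %d' % pos)
        pos = m.end()
    return tokens
```

With the default pattern, tokenize('p ( x ) ') and tokenize('p(x)') produce the same token list, which is why the parser cannot tell the two inputs apart. With discard=r'' the first call raises ValueError at the first space, while 'p(x)' still tokenizes.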

The main difference between these two is the error you will get - in the
first case the matcher will fail; in the second the lexer will fail.

Hope that helps (apart from the above you look to be doing things right!)

Cheers,
Andrew
