help with liftoff please


Aaron Watters

Nov 21, 2011, 3:24:16 PM
to lepl
Lepl looks really cool, but I just can't seem to get off the ground
here.


aaron-mac:lepl aaron$ more ptest.py

from lepl import *

namere = '[a-z]+'
symbol = Token('[^0-9a-zA-Z \t\r\n]')
name = Token(namere)

with DroppedSpace(spaces):
    #namelist = Delayed()
    namelist = name & symbol('+') & name

print name.parse("hello")
print namelist.parse("hello")
print namelist.parse("hello+goodbye")


aaron-mac:lepl aaron$ python ptest.py
Traceback (most recent call last):
File "ptest.py", line 8, in <module>
with DroppedSpace(spaces):
NameError: name 'spaces' is not defined
aaron-mac:lepl aaron$

What am I doing wrong? Sorry, but it wasn't obvious to me from the
documentation.

btw: Eventually I would like to encode a significant fragment of
something like this: http://savage.net.au/SQL/sql-92.bnf
-- is that a reasonable goal?

andrew cooke

Nov 21, 2011, 3:43:24 PM
to le...@googlegroups.com
Hi,

The problem is more a Python programming one than connected to Lepl - you have used the name "spaces" as an argument to DroppedSpace(), but that is not defined.

I am not sure what you were trying to do there - DroppedSpace() doesn't require an argument.  I had a quick look at the docs, but couldn't see what would have suggested that you place something there.

So anyway, try your code with just "with DroppedSpace():" and that error should go away (I haven't tested it; there may be other issues as well, but that is the obviously wrong thing here).

As for parsing SQL in Lepl... it is certainly possible, but there are some issues.  First, Lepl is pure-Python so it is not very fast.  If you are planning to use this to parse a lot of input data then a C or Java based parser would be faster.  Second, just being realistic (please don't take this as personal criticism), writing a SQL parser would be quite an ambitious project, and it sounds like you're still learning, so it will be a lot of work (on the other hand, just trying it, even if you don't finish, will teach you a lot...).

Cheers,
Andrew

Aaron Watters

Nov 21, 2011, 3:55:16 PM
to lepl

Whoops, that was because of a last-minute edit. How embarrassing.

Here is the actual error after deleting the NameError as you suggested.

And I've written an SQL parser before ;c). http://gadfly.sourceforge.net/
thanks, -- Aaron Watters

['hello']


Traceback (most recent call last):

File "ptest.py", line 13, in <module>
  print namelist.parse("hello")
File "build/bdist.macosx-10.6-universal/egg/lepl/core/config.py", line 858, in parse
File "build/bdist.macosx-10.6-universal/egg/lepl/core/config.py", line 815, in get_parse
File "build/bdist.macosx-10.6-universal/egg/lepl/core/config.py", line 723, in get_match
File "build/bdist.macosx-10.6-universal/egg/lepl/core/config.py", line 675, in _raw_parser
File "build/bdist.macosx-10.6-universal/egg/lepl/core/parser.py", line 220, in make_raw_parser
File "build/bdist.macosx-10.6-universal/egg/lepl/lexer/rewriters.py", line 122, in __call__
File "build/bdist.macosx-10.6-universal/egg/lepl/lexer/rewriters.py", line 76, in find_tokens
lepl.lexer.support.LexerError: The grammar contains a mix of Tokens and non-Token matchers at the top level. If Tokens are used then non-Token matchers that consume input must only appear "inside" Tokens. The non-Token matchers include: Any(' \t').
aaron-mac:lepl aaron$

andrew cooke

Nov 21, 2011, 5:05:27 PM
to le...@googlegroups.com

Ah, OK.

So, the error is not very good in this case - sorry - as it's not clear where
the conflict is coming from. What is happening is that, "behind the scenes",
DroppedSpace is adding additional matchers (to handle spaces) that are
causing the error you are seeing.

The underlying issue is that you're mixing two different ways of handling
spaces. If you use tokens, then you should drop spaces in the tokenizer. If
you don't use tokens, then you use DroppedSpace. Using both at the same time
doesn't make much sense.
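Lepl's API aside, the difference between the two approaches can be sketched in plain Python with `re` (the function names below are illustrative, not Lepl's):

```python
import re

# Approach 1: token-based. The lexer consumes the input up front and
# simply skips whitespace, so the grammar never sees any spaces.
def tokenize(text):
    tokens = []
    for match in re.finditer(r'[a-z]+|[^0-9a-zA-Z \t\r\n]|\s+', text):
        if not match.group().isspace():   # drop spaces in the lexer
            tokens.append(match.group())
    return tokens

# Approach 2: grammar-level. Whitespace is matched (and dropped) between
# grammar elements, which is roughly what DroppedSpace arranges behind
# the scenes by inserting extra matchers into the grammar.
def parse_namelist(text):
    match = re.match(r'\s*([a-z]+)\s*(\+)\s*([a-z]+)\s*$', text)
    return list(match.groups()) if match else None

print(tokenize("hello + goodbye"))        # ['hello', '+', 'goodbye']
print(parse_namelist("hello + goodbye"))  # ['hello', '+', 'goodbye']
```

With tokens, the grammar only ever sees `['hello', '+', 'goodbye']`; with grammar-level handling, every rule has to account for the possible spaces itself.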

Tokens (ie the lexer) are explained at
http://www.acooke.org/lepl/intro-2.html#tokens-first-attempt and
http://www.acooke.org/lepl/lexer.html#lexer

So, which should you use?

In general, if you have a target that can be handled by Lepl's simple
regular-expression based lexer, then using it simplifies the grammar. But
note that it is a completely separate layer, so any lexing is restricted to
regular expressions. If you have anything "fancy" (eg nested comments or
strings with anything-but-very-simple escapes for quoting) that requires
context information during lexing then you cannot use the lexer (well, you
can, but you will end up frustrated later) (SQL might be simple enough - I
don't know enough about exactly what a literal SQL string can look like to
say).
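To make the "context information" point concrete, here is a sketch (plain Python, nothing Lepl-specific) of why nested comments defeat a purely regular lexer: matching them needs a depth counter, and a regular expression cannot keep one.

```python
# Hypothetical scanner for nested /* ... */ comments. A regular
# expression cannot count nesting depth, so this needs real state.
def skip_nested_comment(text, pos):
    """Given text with a comment starting at pos, return the index just
    past the matching close, or raise if the comment is unterminated."""
    assert text[pos:pos + 2] == '/*'
    depth = 0
    i = pos
    while i < len(text):
        if text[i:i + 2] == '/*':
            depth += 1
            i += 2
        elif text[i:i + 2] == '*/':
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated comment")

print(skip_nested_comment("/* a /* b */ c */ rest", 0))  # 17
```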

Finally, after writing the other reply, I was thinking more about SQL-related
issues. Lepl is generally used for small "pull data out of this mess"
problems. It is not used - to my knowledge - for parsing large languages.
Theoretically, there are no limits, but practically you may hit some issues.
One is the weak lexer (see above!). Another is error handling.

Part of the problem with errors is efficiency. Lepl's recursive descent
nature means that when input contains an error it tends to spend a lot of time
backtracking to find a "solution" that doesn't exist. It is, in a sense, too
flexible. The First() matcher can help avoid this.

Another part is the informality of the parser ("failure" is often a normal
condition that simply means a different option should be tried via
backtracking - how do you separate this from a "real" error? (*)) and a lack
of experience with big projects. You may find, as you write a parser for SQL,
that there is some useful abstraction that Lepl doesn't have, that would make
handling (specifying, in a sense) errors easier. That would be interesting,
and because Lepl is written in Python you could extend it to add that
abstraction, but it could also mean more work...

Hope that helps.

Andrew

(*) One thing Lepl does have, which can help, is that it tracks the position of
the deepest match within the text. Typically this is very close to where the
error is, which makes the information very useful. What it doesn't do (and I
am not sure how you would do this, but it might be interesting to explore) is
associate that with any kind of metadata about "where in the grammar" it was
when it reached that point.
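That deepest-match idea can be sketched in a few lines (plain Python, not Lepl's actual machinery):

```python
import re

def parse_with_deepest(text):
    """Toy parser for 'name + name + ...'. Returns (names, None) on
    success, or (None, deepest) on failure, where deepest is the
    furthest position reached -- usually close to the real error."""
    deepest = 0
    names = []

    def skip_ws(p):
        while p < len(text) and text[p] in ' \t':
            p += 1
        return p

    def name_at(p):
        nonlocal deepest
        p = skip_ws(p)
        deepest = max(deepest, p)
        m = re.match(r'[a-z]+', text[p:])
        return (m.group(), p + m.end()) if m else (None, p)

    n, pos = name_at(0)
    if n is None:
        return None, deepest
    names.append(n)
    while True:
        p = skip_ws(pos)
        if p < len(text) and text[p] == '+':
            deepest = max(deepest, p + 1)
            n, pos = name_at(p + 1)
            if n is None:
                return None, deepest
            names.append(n)
        else:
            break
    p = skip_ws(pos)
    if p != len(text):
        deepest = max(deepest, p)
        return None, deepest
    return names, None

print(parse_with_deepest("hello + goodbye"))
print(parse_with_deepest("hello + 123"))  # fails; deepest points at '123'
```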

