Python?

Will Uther

May 7, 2011, 1:02:28 PM

to gazelle-users

Hi,
As noted in my last message, I'm interested in Gazelle in the
context of syntax highlighting in an editor. I have previous
experience with the TextMate editor, which has trouble parsing Python.
I was wondering if an editor based on Gazelle would have similar
issues...?

I did a quick search for Python grammars, and they can be found
(e.g. http://www.antlr.org/grammar/1200715779785/Python.g), but they
usually require some form of pre-processing to handle the whitespace
sensitivity. (That grammar requires the 'lexer' to emit INDENT/DEDENT
symbols.)
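
To make the pre-processing concrete, here is a minimal sketch in Python of the kind of pass such a grammar expects to run before parsing. It is not taken from the ANTLR grammar or from Gazelle; the function name indent_tokens and the (kind, text) token shape are made up for illustration, and it assumes indentation uses spaces only.

```python
def indent_tokens(source):
    """Yield (kind, text) pairs, turning leading spaces into INDENT/DEDENT.

    Hypothetical helper for illustration only: a real lexer would also
    split each line into finer tokens and deal with tabs and line joins.
    """
    stack = [0]  # indentation widths of the blocks that are currently open
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines do not affect block structure
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the enclosing block: open a new block.
            stack.append(width)
            yield ("INDENT", "")
        while width < stack[-1]:
            # Shallower: close blocks until we reach a matching level.
            stack.pop()
            yield ("DEDENT", "")
        if width != stack[-1]:
            raise ValueError("dedent does not match any outer indentation level")
        yield ("LINE", line.strip())
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", "")
```

The parser behind this pass then sees INDENT and DEDENT as ordinary bracket-like terminals, so the grammar itself can stay context-free; the indentation stack lives entirely in the pre-processing step.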

Given that Gazelle includes the lexer in the parser, is this sort of
special lexer possible? Can Gazelle handle Python? (heh - this is
what comes up on a web search: http://www.youtube.com/watch?v=LDZwggWN_WY )

Be well,

Will :-}

Joshua Haberman

May 9, 2011, 2:08:16 PM

to gazelle-users

Hi Will,

This is a good question, and I don't have a good answer for it yet.
I'm pretty sure I thought hard about this at one point, but I don't
remember what conclusion I would have come to, and right now my mind
is totally full of upb-related issues, so most of my Gazelle-related
thoughts have been purged from my mind's cache. :)

I can definitely say that I want Gazelle to support parsing Python;
it's just a question of how. I'm driven by a mix of idealism and
pragmatism: it would be nice if Python's grammar could be described
directly, but if it needs to be preprocessed then I'm definitely
willing to go there to get Python supported.

On May 7, 10:02 am, Will Uther <will.ut...@gmail.com> wrote:
> heh - this is
> what comes up on a web search: http://www.youtube.com/watch?v=LDZwggWN_WY

Nice -- if I could avoid eating for 2 months, think how much more
programming I could get done. :)

Josh

Will Uther

May 24, 2011, 12:39:35 AM

to gazelle-users

Hi again,

I was chatting with a few other people here about this problem
(finding a good, generic, parsing mechanism so that an AST can be
formed for many different languages). It was pointed out that not
only is the off-side rule (syntactically important indenting) quite
common and non-context-free, but C++ can be a nightmare, particularly
with the pre-processor.

For the off-side rule, I had a few thoughts. One path was
essentially reinventing the indexed grammar
<http://en.wikipedia.org/wiki/Indexed_grammar>. In full generality an
indexed grammar isn't efficient to parse, but it seems the right
restricted form may be. I was viewing an indexed grammar as a
context-free grammar with back-references, but where the
back-references can also be passed as arguments to the nonterminals,
e.g.:

Block(\indent) -> (\indent \w+) Statement(\1) (CR \1 Statement(\1))*
Statement(\indent) -> .... |
                      if expression : CR Block(\indent)

(Note that here Block(\indent) means a block indented by MORE THAN
\indent, where the Block nonterminal also consumes the space. In
contrast, Statement(\indent) means a statement that follows an indent
of \indent; the indenting is not part of the statement, but rather
is there to be passed to any nested blocks.)

I don't know if this sort of back-referencing is easy to put into an
LL(*) parser. Full indexed grammars are hard, but this isn't a full
indexed grammar - it only allows back-references to appear as
arguments.
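
As a rough illustration of the two rules above, here is a hand-written recursive-descent sketch in Python in which the indent string is passed down as an argument. It says nothing about whether an LL(*) generator such as Gazelle could do the same; the function names, the tuple-shaped AST, and the treatment of every non-"if" line as an opaque statement are all assumptions made for the example.

```python
def leading(line):
    """Return the leading run of spaces and tabs of a line."""
    return line[: len(line) - len(line.lstrip(" \t"))]


def parse_block(lines, pos, indent):
    """Block(indent): statements sharing one indent strictly longer than `indent`."""
    if pos >= len(lines):
        raise SyntaxError("expected an indented block")
    new_indent = leading(lines[pos])
    if not (new_indent.startswith(indent) and len(new_indent) > len(indent)):
        raise SyntaxError("expected an indented block")
    body = []
    # The "\1" back-reference: each statement must repeat new_indent exactly.
    while pos < len(lines) and leading(lines[pos]) == new_indent:
        node, pos = parse_statement(lines, pos, new_indent)
        body.append(node)
    return body, pos


def parse_statement(lines, pos, indent):
    """Statement(indent): the indent is not consumed here, only passed on."""
    text = lines[pos][len(indent):]
    pos += 1
    if text.startswith("if ") and text.endswith(":"):
        body, pos = parse_block(lines, pos, indent)
        return ("if", text[3:-1].strip(), body), pos
    return ("stmt", text), pos


def parse_program(source):
    """Parse top-level statements, which must start at column zero."""
    lines = [l.rstrip() for l in source.splitlines() if l.strip()]
    body, pos = [], 0
    while pos < len(lines):
        if leading(lines[pos]) != "":
            raise SyntaxError("unexpected indent")
        node, pos = parse_statement(lines, pos, "")
        body.append(node)
    return body
```

For example, parse_program("if x:\n    a\nb\n") builds an "if" node whose body holds the statement a, followed by a top-level statement b. The only state beyond the usual call stack is the indent string carried in each call, which is the restricted, arguments-only form of back-reference described above.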

See also
<http://danielmattosroberts.com/earley/context-sensitive-earley.pdf>.

Cheers,

Will :-}
