Fairly recently I've been interested in using decompilation as a means of getting more exact location information about where a program is when it is stopped or hits an error. See
http://rocky.github.io/NYC-Hackntell/#/ for a 5-minute talk I gave to a very general audience. (I bet you could get through it in 3 minutes, though.)
Somewhat mirroring your experience with Perl (vs. Python), I figured this out with the help of Perl Monks, and did this
first in Perl, via the AST that the Perl interpreter creates.
Having mastered this aspect in Perl, I went on to Python, and that's where I discovered the Aycock/Horspool Earley-algorithm parser, SPARK, from John Aycock. As that was a bit old and unmaintained, I even
packaged it for Python.
To get broader interest in this, I am in the process of submitting papers on this aspect to conferences.
Some questions:
First, there is the issue of whether I could use Marpa instead of Aycock/Horspool SPARK. Well, one handicap for me here is that I know the Python parser and have mucked with it to give me the debug and trace information that I so sorely need in debugging grammars.
What I'll probably do here is use Marpa in my Perl debugger to parse the various command arguments of the "list", "disassemble", and various "break" commands. I currently have this done as custom code, but I had promised users that I would follow
gdb conventions unless there was good reason not to. (Complicated code is no longer a good reason, since this can be expressed easily in a grammar.)
So back to the broader question of Marpa for Python. When I installed Marpa, it looked like it uses Bison/YACC and does a lot of work in C. That kind of thing seems like it would not be easy going from Python.
Then there is the broader question of how the Python decompiler currently works. At runtime, custom rules are created for each method, covering instructions that take a variable number of arguments.
That is, instead of:

call_expr ::= call | expr CALL | expr expr CALL | expr expr expr CALL | expr expr expr expr CALL

the operand of the CALL opcode gives how many parameters the call takes. So rules like:

call_expr ::= call_4
call_4 ::= expr expr expr expr CALL

are created when we see there is a call with 4 arguments. This cute idea goes back to the earliest versions of the decompiler by John Aycock. I can't find it mentioned in the literature, though.
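To make this concrete, here is a minimal sketch of that customization step for a SPARK-style parser. The addRule() call is modeled on SPARK's interface; the token attributes (kind, operand) and the helper names are my own for illustration:

    def nop(self, args):
        # Do-nothing semantic action for rules added at runtime.
        pass

    def customize_call_rules(parser, tokens):
        # Scan the method's instruction stream once and add a
        # per-arity rule for each distinct CALL arity we see.
        seen = set()
        for t in tokens:
            if t.kind == "CALL" and t.operand not in seen:
                n = t.operand
                seen.add(n)
                rule = ("call_expr ::= call_%d\n" % n +
                        "call_%d ::= %sCALL" % (n, "expr " * n))
                parser.addRule(rule, nop)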
And I think somewhere I read that Marpa prefers not to have rules added at runtime, per method. I suspect that if the precompile phase of Marpa is fast enough, this is moot.
In SPARK, I also added the ability to remove grammar rules, since there is a lot of sharing of grammars going on between different Python versions.
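The shape of that sharing is subclassing plus deletion. A sketch, where the class names and the remove_rules() method are hypothetical stand-ins for the real interface:

    class Python35Parser(Python34Parser):
        def __init__(self):
            super().__init__()
            # Drop inherited rules whose opcodes are not valid
            # for this bytecode version. (Rule shown is illustrative.)
            self.remove_rules("""
                call_expr ::= expr CALL_FUNCTION_VAR
            """)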
Somewhere I read that the Earley parser is cubic. However, ambiguous grammars can have an exponential number of derivations.
In particular consider this:
E ::= E E
E ::= a
Doesn't this amount to the number of ways you can parenthesize an expression of length n? That is the Catalan number C(n-1), which grows roughly like 4^n, i.e. exponentially. Perhaps what is meant by cubic is that recognizing the input and building a shared representation of the parses is cubic in the length of the input, even though the number of distinct derivations can be exponential?
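As a quick sanity check on that count (my own arithmetic, nothing Marpa- or SPARK-specific), the number of parse trees this grammar gives a string of n a's follows the Catalan recurrence:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def derivations(n):
        # Number of parse trees for 'a' * n under E ::= E E | a.
        if n == 1:
            return 1          # the E ::= a case
        # Split the string at every point; each side parses independently.
        return sum(derivations(k) * derivations(n - k)
                   for k in range(1, n))

    print([derivations(n) for n in range(1, 8)])
    # [1, 1, 2, 5, 14, 42, 132] -- the Catalan numbers, ~4**n growth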
One other technique I've been using of late in SPARK is doing additional checks at grammar-reduction time. For example, two expressions might form some special kind of tuple if they are both expressions built from constants. That kind of check might be too cumbersome to express in the grammar by changing opcodes.
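In SPARK terms, the check I have in mind looks roughly like this; the hook name reduce_is_valid and the token attributes are my invention for illustration, not an existing API:

    def reduce_is_valid(rule, tokens, first, last):
        # Veto a const_tuple reduction unless everything the rule
        # spans is a constant-load instruction.
        lhs, rhs = rule
        if lhs != "const_tuple":
            return True
        return all(t.kind == "LOAD_CONST"
                   for t in tokens[first:last])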
Does Marpa allow for a programmable check to be made before performing a reduction?
Going the other way may also be beneficial: adding some Marpa goodness into the SPARK parser. Thoughts about that?