On 06.03.2014 08:55, Aditya Shah wrote:
> @Jo Parser generators certainly exist. They take in grammar specs and
> generate parsers for that grammar. But the idea here is that we create
> our own custom generator framework which takes in a predefined type of
> rules (a grammar) and then takes advantage of the similarities between
> different languages such as Mathematica or MathML to create a parser
> that parses the expression into SymPy code.
That's not going to mix well with the ability to quickly pick up new
grammar rules as Mathematica or MathML define them.
I'd reuse grammar rules, and I'd make sure that all parsers emit the
same set of tree nodes so the same code generation can walk the tree,
but I wouldn't try to reuse handcrafted grammar parts - *particularly*
if you wish to improve parsing fidelity.
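To make the "same set of tree nodes" idea concrete, here is a minimal sketch of what such a shared node set might look like. The class names (`Num`, `Sym`, `Call`) and the canonical head names are illustrative assumptions, not anything SymPy currently defines:

```python
# Hypothetical sketch: a minimal shared node set that every parser
# (Mathematica, MathML, ...) could emit, so one code generator can
# walk trees from any of them. All names here are illustrative.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Node:
    pass

@dataclass(frozen=True)
class Num(Node):
    value: str          # keep the literal as text; the generator picks the type

@dataclass(frozen=True)
class Sym(Node):
    name: str

@dataclass(frozen=True)
class Call(Node):
    head: str           # canonical operation name, e.g. "Add", "Sin"
    args: Tuple[Node, ...]

# A Mathematica parser seeing Sin[x] + 2 and a MathML parser seeing the
# equivalent <apply> markup would both produce the same neutral tree:
tree = Call("Add", (Call("Sin", (Sym("x"),)), Num("2")))
```

The point is that the per-language work stops at producing these nodes; everything downstream is shared.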
> Please take a look at the
> mathematica.py module in the sympy/sympy/parsing folder. That is a
> parser for the Mathematica language. But it has had to be coded by hand.
Exactly.
> What I intend to
> implement is a program that takes in a few details about the differences
> between the language and sympy
The devil is exactly in the details.
The usual outcome of undertakings like this is that you get into
diminishing returns long before you're content with the results (or your
users are content with them).
That's the *usual* outcome. You may get lucky and find that the details
aren't that bad.
Also note that as soon as you start this specific kind of refactoring,
your code becomes more rigid. Adapting to changes then means not just
changing the specific input dialect; it may also require changes to the
factored-out code.
Fred Brooks says that the overhead of writing factored-out code is triple
that of writing the code directly, i.e. refactoring starts to become
useful only if the same (kind of) code is used in more than three parsers,
*and you know that the factored-out code won't ever have to change again*.
There are precious few abstractions that are general and well-understood
enough to pay off even in smaller projects.
Parse trees are one of them.
Interleaving parsing and generation... well, it sort of works. It's a
bunch of well-known techniques, but they don't really lend themselves to
being wrapped up in a nice little library; these callback-based parsing
frameworks tend to get written over and over again because the code is
hard to reuse. SAX parsers do that kind of thing, but notice how they
are restricted to XML alone; they aren't generalized across languages.
(You should still take a look at a typical SAX API for ideas on how to
structure such an interface.)
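For a feel of what that callback style looks like in practice, here is a small example using Python's standard `xml.sax`. The handler class and the toy MathML-ish input are mine; only the `ContentHandler`/`parseString` API is real:

```python
# The callback style a SAX API uses: element events arrive as method
# calls, and the handler keeps whatever state it needs. No tree is
# ever built. Worth studying for interface ideas, not for reuse as-is.
import xml.sax

class ElementCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per opening tag, in document order.
        self.counts[name] = self.counts.get(name, 0) + 1

handler = ElementCounter()
xml.sax.parseString(
    b"<math><apply><plus/><ci>x</ci><cn>2</cn></apply></math>", handler
)
print(handler.counts)
# {'math': 1, 'apply': 1, 'plus': 1, 'ci': 1, 'cn': 1}
```

Note how tightly the handler interface is tied to XML's event vocabulary (start element, end element, characters); generalizing those events across Mathematica's syntax and MathML's is exactly the hard part.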
Trying to factor common code out of hand-written parsers is not an
abstraction that will pay off, unless you are extremely lucky.
> and automatically generates the code that
> converts the expression.
Code generation is straightforward once you have a working parser, so
that's an aspect that probably doesn't need discussion.
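To illustrate why this side is the easy one: once a parse tree exists, emitting SymPy source is a plain recursive walk. The tree shape and the operator table below are made up for the example, not SymPy's actual representation:

```python
# Illustrative only: generating code from an already-parsed tree is a
# simple recursive walk. Nodes here are ad-hoc (head, args) tuples;
# leaves are plain strings (symbol names or numeric literals).
OPS = {"Add": " + ", "Mul": "*"}   # assumed mapping, not exhaustive

def emit(node):
    if isinstance(node, str):      # leaf: symbol or number literal
        return node
    head, args = node
    if head in OPS:
        return "(" + OPS[head].join(emit(a) for a in args) + ")"
    # fall back to a function call, e.g. Sin -> sympy.sin(...)
    return "sympy." + head.lower() + "(" + ", ".join(emit(a) for a in args) + ")"

# Sin[x] + 2, already parsed into the neutral tree:
print(emit(("Add", [("Sin", ["x"]), "2"])))   # (sympy.sin(x) + 2)
```

All the real difficulty sits upstream, in producing that tree from each language's surface syntax.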
> Please do note that here the term "parser" that I
> am referring to is not the exact "parser" that we have for other languages.
> It is more of an interpreter sort of thing and I want to make the program
> that creates those interpreters.
Um. Okay. Write "interpreter" then...
... though, we have been discussing parser aspects.
You plan to use a hand-written one; my advice is to stay away from that
route because maintaining a hand-written parser with an
occasionally-changing syntax means that all the clever shortcuts you
took will some day stop working.
Factoring out is one such clever shortcut, applied systematically.
I'm not saying that you will fail.
I'm just saying that you're running a considerable risk of failure here.
I'm also saying that the more languages SymPy supports using
hand-written parsers, the higher the maintenance overhead will become.