GSOC 2014 idea: Adding parsing functionality to Sympy Live


Aditya Shah

Feb 16, 2014, 11:43:05 AM
to sy...@googlegroups.com
Hi,
I am Aditya Shah, a third-year Computer Science student at BITS Pilani. I would like to work with SymPy for GSoC. I had previously posted on this mailing list about my interest in implementing the group theory module for SymPy. While scrolling through the ideas list, I came upon the idea of improving the parser for SymPy Live. I have a small background in parsing and natural language processing, since I have done projects on those topics for my college coursework. Can anyone tell me how much work has been done on the parsers, and what remains to be implemented?

@ProspectiveMentor: Please reply to this post so that I can discuss further regarding the topic.

IRC: adityashah30

Thanks,
Aditya Shah

Aaron Meurer

Feb 16, 2014, 1:41:23 PM
to sy...@googlegroups.com
All the current code is in SymPy, in sympy/parsing. You should think about how to structure the parsing module so that it is extensible enough to handle many different kinds of input (LaTeX, Mathematica, natural language, etc.).

Aaron Meurer
> --
> You received this message because you are subscribed to the Google Groups
> "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sympy+un...@googlegroups.com.
> To post to this group, send email to sy...@googlegroups.com.
> Visit this group at http://groups.google.com/group/sympy.
> For more options, visit https://groups.google.com/groups/opt_out.

Aditya Shah

Feb 17, 2014, 11:22:32 PM
to sy...@googlegroups.com
Thanks for the reply, Aaron. I am quite new to application and software development, so could you tell me more about SymPy's requirement of a successful PR for participating in GSoC?

Thanks,
Aditya Shah

Sachin Joglekar

Feb 18, 2014, 5:45:05 AM
to sy...@googlegroups.com
It's not necessary to have a 'successful' (by which I assume you mean merged) PR to become part of GSoC, though it's good to have one. Basically, we need to be sure that you are good with the theory behind the project (which may be reflected in your former work and proposal), and that you know at least the fundamentals needed, like git and the basic SymPy workflow (pushing commits, adding tests, PEP 8 conventions, etc.). You can obviously learn these during the GSoC community bonding period, but knowing them beforehand is a sure plus. However, SymPy does require you to at least have a PR in the pipeline to be considered for selection as a GSoC student.

Aditya Shah

Feb 18, 2014, 5:51:48 AM
to sy...@googlegroups.com
Thanks for the reply, Sachin. However, I am quite unsure about the nature of the PR. Is it supposed to be a patch, the introduction of some new functionality, or something else altogether? Can you please clarify?

Thanks,
Aditya Shah

Sachin Joglekar

Feb 18, 2014, 5:59:33 AM
to sy...@googlegroups.com
There's no restriction on that as such. It's basically just for the organization's people to see that you are okay with the basics and are (at least somewhat) comfortable with the SymPy codebase. Whether it's a bug fix or a functionality addition is up to you, as long as you add/modify the relevant docstrings (if needed), add tests, and stick to the coding style. Reviewers can help you with this once you send a PR.

Aditya Shah

Feb 18, 2014, 6:29:56 AM
to sy...@googlegroups.com
Thanks Sachin.

Aaron Meurer

Feb 18, 2014, 6:20:26 PM
to sy...@googlegroups.com
No, this is incorrect. You must have at least one PR *pushed into the codebase*. See https://github.com/sympy/sympy/wiki/gsoc-2014-application-template.

Perhaps you are confused by the deadline mismatch. The deadline for opening the PR is the same as the student application deadline (March 21), and the deadline for that pull request to be merged is the same as the date that Google announces accepted students (April 7). This is done because actual merging depends on reviewing manpower, which is sometimes lacking.

But if you start now, you should find it easy to do both before either of those dates.

Aaron Meurer

Aditya Shah

Feb 18, 2014, 9:43:51 PM
to sy...@googlegroups.com
Thanks for the clarification, Aaron. I'll keep the deadlines in mind.

Aditya Shah

Feb 21, 2014, 12:39:40 PM
to sy...@googlegroups.com
Hey Aaron, I have solved issue 1160 mentioned at


Can you please tell me the exact procedure for pushing the patch?

Thanks,
Aditya Shah

Aditya Shah

Feb 21, 2014, 1:18:33 PM
to sy...@googlegroups.com
Btw, I just submitted a PR at


Is that it, or is anything missing?

Thanks,
Aditya Shah

Aditya Shah

Feb 22, 2014, 10:19:21 PM
to sy...@googlegroups.com
I would like to discuss my plan of action for developing a general parsing framework for SymPy. Right now the code is quite messy: the modules for the different language extensions, such as Mathematica or MathML, are implemented quite separately, and there is little in common between them. Also, the parsing is done via a simple set of heuristics that cover the most common expressions.

As a solution, I propose the following. Create a language recognizer module that takes the input string and, based on its probability of belonging to different language groups, decides which language it belongs to. After a successful identification, we can assign the string to the appropriate module. Since every language will also have subdivisions (e.g. in SymPy we have assumptions, calculus, etc.), we then decide which parts of the string belong to which part of the language spec. After that, we can parse the string according to the rules specified in the module and make it SymPy-compliant. As part of this project, I would also like to implement a parser for LaTeX and rudimentary natural language.

Please provide feedback regarding this idea.

Thanks,
Aditya Shah

Christophe Bal

Feb 23, 2014, 8:19:13 AM
to sympy-list

Hello.

How will you evaluate the probability of a language?

Christophe, a simple user.

Sachin Joglekar

Feb 23, 2014, 8:26:09 AM
to sy...@googlegroups.com
I have concerns similar to Christophe's. I haven't done any work on the parsing module, so I cannot comment on that part. However, a natural-language processing toolkit (English, for now) for SymPy may need extensive NLP, which I don't think belongs in SymPy. You _can_, however, think of building something like that as an add-on to SymPy as a whole, similar to WolframAlpha's capabilities at a basic level.

Aditya Shah

Feb 23, 2014, 8:43:52 AM
to sy...@googlegroups.com
Hi Christophe,
I would like to illustrate the process with an example. If the input string is "Sqrt[x]", since such a format for the square-root function belongs only to the Mathematica spec, we can proceed to convert the string to its SymPy equivalent. If the string is "\sqrt[x]", the leading "\" marks the string as belonging to the LaTeX spec.

My point is that we can pre-analyze the commonly occurring keywords and functions, such as sqrt or sin, and make a list of them for each language we want to write a parser for. When an input string comes in, we match it against these lists and decide on the language, after which it can be passed to the relevant parser.

I propose this because the framework can be extended: all a developer has to do is write a list of commonly occurring words (keywords and common functions) and the function call style ([] vs ()), and then proceed to write the parser for the language. This way the parsing module in SymPy can be made modular and extensible, as Aaron suggested previously.

So the structure will look like:

Language Recognizer ====> Relevant Language Parser ====> Final SymPy parser to evaluate the expressions
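A minimal sketch of this keyword-and-call-style heuristic might look like the following. The trait tables, function names, and scoring scheme here are invented for illustration; nothing like this exists in SymPy:

```python
# Hypothetical recognizer sketch: score each language by how many of its
# known keywords appear in the input, with a bonus for a matching call style.
LANGUAGE_TRAITS = {
    "mathematica": {"keywords": {"Sqrt", "Sin", "ArcSin"}, "call_style": "["},
    "latex":       {"keywords": {r"\sqrt", r"\sin", r"\frac"}, "call_style": "{"},
    "sympy":       {"keywords": {"sqrt", "sin", "asin"}, "call_style": "("},
}

def guess_language(s):
    """Return the language whose known keywords best match the string s."""
    scores = {}
    for lang, traits in LANGUAGE_TRAITS.items():
        score = sum(1 for kw in traits["keywords"] if kw in s)
        # A keyword followed by that language's call bracket adds weight.
        if any(kw + traits["call_style"] in s for kw in traits["keywords"]):
            score += 1
        scores[lang] = score
    return max(scores, key=scores.get)
```

For instance, `guess_language("Sqrt[x]")` picks Mathematica because both the capitalized keyword and the `[` call style match, while `"\sqrt{x}"` scores highest for LaTeX.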

Thanks,
Aditya Shah

Aditya Shah

Feb 23, 2014, 8:46:25 AM
to sy...@googlegroups.com
Hi Sachin,
While I do agree that an NLP parser would be a big project in itself, even if implemented as an add-on it could be used to augment the capabilities of SymPy Live.

Btw, just a quick question: can such a project be considered for GSoC by SymPy?

Thanks,
Aditya Shah

Christophe Bal

Feb 23, 2014, 9:02:16 AM
to sympy-list
Thanks for the explanations.


Aditya Shah

Feb 23, 2014, 1:44:34 PM
to sy...@googlegroups.com
@asmeurer, @srjoglekar246, @skirpichev: please review the idea and comment on it so that a discussion can ensue.

Aaron Meurer

Feb 23, 2014, 2:44:40 PM
to sy...@googlegroups.com
Language heuristics seem like a waste of time to me. The user can just input what language the expression is in. If we need to guess (like in SymPy Gamma), we can just try parsing with all the parsers and see which ones worked.

The real issue is how to unify the disjoint code, as you pointed out, so that there is no wasted effort writing a completely new parser for each language.
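Aaron's try-every-parser fallback could be sketched as follows. The two stub parsers below stand in for SymPy's real ones (e.g. sympy.parsing.sympy_parser.parse_expr and the Mathematica parser), which raise on input they cannot handle; the stubs and their rejection rules are illustrative only:

```python
# Stub parsers: each accepts a string or raises ValueError, mimicking
# how a real parser rejects input it cannot handle.
def parse_python(s):
    if "[" in s:
        raise ValueError("not Python-style syntax")
    return ("python", s)

def parse_mathematica(s):
    if "[" not in s:
        raise ValueError("not Mathematica-style syntax")
    return ("mathematica", s)

def try_all_parsers(s, parsers=(parse_python, parse_mathematica)):
    """Run every parser on s; return the results of those that accepted it."""
    results = []
    for parser in parsers:
        try:
            results.append(parser(s))
        except ValueError:
            pass  # this parser rejected the input; try the next one
    return results
```

The caller (e.g. SymPy Gamma in this scenario) can then pick the unique successful parse, or disambiguate when several parsers accept the same string.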

Aaron Meurer

On Sun, Feb 23, 2014 at 12:44 PM, Aditya Shah <aditya...@gmail.com> wrote:
> @asmeurer, @srjoglekar246, @skirpichev. Please review the idea and comment
> upon it so that a discussion can ensue.
>

Aditya Shah

Feb 23, 2014, 10:38:22 PM
to sy...@googlegroups.com
Then I have another idea in mind: a language-specifier configuration file can be given as input by the developer, and SymPy takes care of the rest. It's quite like yacc. This would substantially reduce the developer's workload in writing a new parser.

Aditya Shah

Sergey Kirpichev

Feb 24, 2014, 4:58:40 AM
to sy...@googlegroups.com


On Monday, February 24, 2014 7:38:22 AM UTC+4, Aditya Shah wrote:
> Then I have another idea in mind. A language specifier configuration file can be given as input by the developer and sympy takes care of the rest. It's quite like yacc. This would essentially reduce the workload of the developer substantially regarding the development of a new parser.

What if there is no language specification, like for Mathematica?

Aditya Shah

Feb 24, 2014, 5:07:42 AM
to sy...@googlegroups.com
Not an exact specification, but there is a proper syntax definition. Please take a look here:


What differs is that all the built-in functions start with a capital letter; also, the inverse trigonometric functions use 'Arc' instead of SymPy's 'a'. The functions use [] instead of () to take their args. And as Aaron mentioned, the user can specify which language the string should be parsed from, so there is no need for language detection. I can think about the most commonly used traits while determining the parsed string. So the language parser would simply convert the string to a Python/SymPy equivalent, and its evaluation would then be done by another module similar to sympify.

Aditya Shah

Feb 24, 2014, 9:22:16 PM
to sy...@googlegroups.com
Can anyone please comment on the feasibility of my idea stated above?

Thanks,
Aditya Shah

Aaron Meurer

Feb 25, 2014, 9:06:24 PM
to sy...@googlegroups.com
Can you elaborate on the idea a little more explicitly? How would it parse ArcSin[Sqrt[x]], for example?

Aaron Meurer

Aditya Shah

Feb 25, 2014, 9:51:44 PM
to sy...@googlegroups.com
As you stated previously, the framework need not detect the language it is parsing from; the user provides that input. Now, since this is a Mathematica string, we directly specify it to be so. In the specification for Mathematica, we define that functions take arguments via [ ] instead of the normal ( ). We also define that the built-in functions start with a capital letter, so converting them to their SymPy equivalents includes lower-casing them. And lastly, the information about the inverse trigonometric functions is given, telling the framework to drop the 'Arc' and add 'a' as a prefix to the function name. As for the framework, once the specification is given, we can construct a form of Earley parser to parse the strings.

So in this case, the flow will be something of the form:

ArcSin[Sqrt[x]] ==> asin(Sqrt[x]) ==> asin(sqrt(x)) (which is the final string)

So there would actually be a module that generates a parser given a specification. The actual parsing takes place using the generated parser.

I think SymPy needs a proper parsing framework so that it can be extended very easily to other languages. I will work out the exact details of the specification file (what input should be taken from the user regarding the specification of the new language).

Aditya Shah

Feb 28, 2014, 5:41:30 AM
to sy...@googlegroups.com
@asmeurer @skirpichev So how do you feel about the above idea: a generic parsing framework for SymPy, to facilitate extending SymPy to similar math-spec languages?

Ondřej Čertík

Mar 4, 2014, 11:30:18 PM
to sympy
On Tue, Feb 25, 2014 at 7:51 PM, Aditya Shah <aditya...@gmail.com> wrote:
> As you have stated previously, the framework need not detect the language
> that it is parsing from and the user provides input regarding that. Now,
> since this is a Mathematica string, we directly specify it to be so. In the
> specification for Mathematica, we define that functions take arguments via [
> ] instead of the normal ( ). Also, we define that the built in functions
> start with a capital letter and so their Sympy equivalent includes
> converting them to lower case. And lastly the information about the inverse
> trignometric function is given telling the framework to drop the 'Arc' and
> add 'a' as a suffix to the function name. As for the framework, once the
> specification is given we can construct a form of Earley parser to parse the
> strings.
>
> So in this case, the flow will be something of the form:
>
> ArcSin[Sqrt[x]] ==> asin(Sqrt[x]) ==> asin(sqrt(x)) (which is the final
> string)

Is this different from what we already have here:

https://github.com/sympy/sympy/blob/master/sympy/parsing/mathematica.py

>
> So actually there would be a module that will generate a parser given a
> specification. The actual parsing takes place using the generated parser.
>
> I think that Sympy needs a proper parsing framework so that it can be
> extended very easily to other languages. I will work on the exact details of
> the specification file (what input should be taken from the user regarding
> the specification of the new language).

I agree that it would be nice to be able to parse general things, e.g. "plot x^2 from x=1 to 10", and have it work like here:

http://www.wolframalpha.com/input/?i=plot+x%5E2+from+x%3D1+to+10

But I don't know how difficult this is (I don't have much experience in this field); I suspect it is a huge undertaking.

Ondrej

Aditya Shah

Mar 5, 2014, 12:48:48 AM
to sy...@googlegroups.com
@Certik Thanks a lot for replying. As to your first question: I intend to develop a framework that can generate a parser. What we have in mathematica.py is a parser. I want to take that one step further and devise a standard mechanism by which any developer can extend the enormous power of SymPy by specifying a few things (the details of which I am still working on). As to your second question: it involves the use of NLP (natural language processing). In that case, we would have to use an NLP framework such as NLTK or Gensim to process the inbound query. Although, I can think of a very rudimentary structure for NLP queries, in which every query is composed of 3 parts:

1. An action. (The action essentially maps to a function in SymPy. The action in your example is "plot".)
2. A function. (The specified action has to be applied to something, right? That is the function part. The function in your example is x^2.)
3. Limits (optional). (Since SymPy is largely symbolic in nature, limits may or may not have any significance for the computation. The limits in your example are x=1 to 10.)

And as to your last question: in one sense it is quite a big undertaking. This semester I have a course called "Principles of Compiler Construction". After learning about the various parsing techniques used by standard compilers, I think we can apply the very same techniques to other math-spec languages and develop a parsing framework for SymPy.
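The three-part query structure described above could be prototyped with a single regular expression. This pattern is purely illustrative and only handles queries shaped like the "plot x^2 from x=1 to 10" example:

```python
import re

# action / function / optional limits, per the three-part structure above.
QUERY = re.compile(
    r"(?P<action>\w+)\s+"             # 1. the action, e.g. "plot"
    r"(?P<function>\S+)"              # 2. the expression it applies to
    r"(?:\s+from\s+(?P<limits>.+))?"  # 3. optional limits clause
)

def decompose(query):
    """Split a query into its action, function, and optional limits."""
    m = QUERY.match(query)
    if m is None:
        raise ValueError("query does not fit the action/function/limits shape")
    return m.groupdict()
```

For the example query this yields `{"action": "plot", "function": "x^2", "limits": "x=1 to 10"}`; a real NLP front end would of course need far more than one regex.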

Thanks,
Aditya

Aditya Shah

Mar 5, 2014, 2:09:09 AM
to sy...@googlegroups.com
I have developed a rough block diagram and an API to explain my concept. It goes as follows.

Suppose we want to write a parser for the MathML language. We need to create two things:

1. A spec file (this contains the mappings between MathML features and the corresponding SymPy features, although not all of them).
2. A parser generator file (this file contains references to the spec file and to the Parser Generator Framework (PGF), which I intend to create).

Spec File ==> Parser Generator Program <== Parser Generator Framework (PGF)
                                ||
                                V
               <name of language>.py + entry updated in the parsers.py file

On executing this parser generator file, for our example we get a parser program called mathml.py.

This file is placed in the sympy/sympy/parsing/parsers folder.

A file called parsers.py exists in sympy/sympy/parsing/, containing the mappings from language names to the corresponding parsers found in the ./parsers/ folder.

So we also update the code in parsers.py to include the MathML language parser.

Now a user who wants to convert a MathML string to its SymPy equivalent does the following:

a = parsers.convert_sympy(s, lang='mathml')

This statement invokes the function convert_sympy in parsers.py, which matches the parser to be invoked as mathml.py. It passes the string s to the parser mathml.py. After the necessary transformations, the result is passed on to the sympify module in the sympy/sympy/core/ folder. The final output is returned and stored in a, which is now a pure SymPy expression.
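The proposed parsers.py dispatch might be sketched as follows. Note that convert_sympy and the per-language stubs are Aditya's proposed API, not existing SymPy code; the stub bodies merely tag their input so the dispatch is observable:

```python
# Stub parsers standing in for the generated parsers/mathml.py and
# parsers/mathematica.py modules in the proposal.
def parse_mathml(s):
    return "parsed-mathml:" + s

def parse_mathematica(s):
    return "parsed-mma:" + s

# The mapping kept in parsers.py: language name -> parser callable.
PARSERS = {
    "mathml": parse_mathml,
    "mathematica": parse_mathematica,
}

def convert_sympy(s, lang):
    """Look up the parser registered for `lang` and apply it to `s`."""
    try:
        parser = PARSERS[lang]
    except KeyError:
        raise ValueError("no parser registered for language %r" % lang)
    # In the full design, this result would then be handed to sympify.
    return parser(s)
```

Registering a newly generated parser then amounts to adding one entry to the PARSERS mapping, which matches the "update parsers.py" step in the diagram.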

Please provide feedback on this idea and point out any mistakes or inefficiencies in case you discover any.

Thanks,
Aditya

Ondřej Čertík

Mar 5, 2014, 11:30:18 AM
to sympy
For parsing, will this work to create robust parsers for all the inputs out there, like Mathematica, MathML, etc.? Because each is a completely different language, it's not just a matter of creating a simple table of corresponding symbols like ArcSin <-> asin; you have to understand loops and other language-specific details of Mathematica, and those are unique to each language.

As to natural language processing: if you have experience in it, feel free to draft a proposal so that we can discuss particular details.

Ondrej

Aditya Shah

Mar 5, 2014, 12:10:40 PM
to sy...@googlegroups.com
@Certik Thanks for replying. I think we might be able to make quite a robust parser after all. I do agree that it may miss some very peculiar use cases, but in my view such a parser (and a fairly functional one at that) is better than having no parser.

I am currently pursuing an NLP project for my coursework and have decent experience in the area. So would that be considered a new proposal, or a continuation of this one?

Thanks,
Aditya

Ondřej Čertík

Mar 5, 2014, 12:46:52 PM
to sympy
I would suggest you write up your ideas into one proposal on our wiki and then invite people here to comment on it; you can always split it into two proposals if you want.

Ondrej

Aditya Shah

Mar 5, 2014, 12:50:31 PM
to sy...@googlegroups.com
@Certik. Sure thing, I'll draft a proposal on the wiki.

Thanks,
Aditya

Joachim Durchholz

Mar 5, 2014, 1:52:01 PM
to sy...@googlegroups.com
Am 05.03.2014 18:10, schrieb Aditya Shah:
> @Certik Thanks for replying. I think that we might be able to make quite a
> robust parser after all. Well I do agree that it may escape some very
> peculiar use cases but still according to me such a parser (and quite fully
> functional at that) is better than having no parser.

Sorry for chiming in very late (and I have to admit I haven't read the
whole thread).

Just be warned: in the programming-language design community, it is a well-known fact that people tend to underestimate the relevance of "some very peculiar use cases".

It works like this: the larger the input, the more likely it is that one of these use cases does occur. Even if a not-covered use case occurs with a probability of only 1% per line of input, that still means a failure every 100 lines of input on average (and the odds of surviving n lines cleanly, 0.99^n, drop below 50% before line 70).

On parser generators: These can give you a running start, which is a
good thing, but the real work tends to begin after that.
The real challenge is to make that additional work flexible enough that
you don't have to redo it 100% whenever the language you parse gets changed.

This all doesn't mean it's a bad idea to do; in fact better parsing
support would be an overall improvement. But be aware that it might not
work well enough in the end, or require too much work to work well enough.

Just so you know what you're trying :-)

Regards,
Jo

Aditya Shah

Mar 5, 2014, 2:06:24 PM
to sy...@googlegroups.com
@Joachim Durchholz, thanks a lot for your warnings. I understand your concerns, but I think I will be able to create the desired thing. I admit the final product may contain a few bugs, but I will try to keep it as bug-free as possible.

And BTW, for the parser structure I intend to use "Compilers" by Aho, Lam, Sethi, and Ullman as a reference book (popularly known as "the Dragon Book"). If anyone can suggest better reference material, please comment below.

Thanks,
Aditya Shah

Joachim Durchholz

Mar 5, 2014, 3:48:16 PM
to sy...@googlegroups.com
Am 05.03.2014 20:06, schrieb Aditya Shah:
> And BTW, for the parser structure I intend to use "Compilers" by "Aho, Lam,
> Sethi, Ullman" as a reference book(popularly known as "The Dragonbook").

Oh, that's *ancient*.

> If
> anyone can suggest a better reference material please comment below.

Well, "Parsing Techniques - A Practical Guide", by Dick Grune and Ceriel
Jacobs, was mostly cited ten years ago (on comp.compilers anyway).
If you can read German, try Sönke Kannapinn's thesis "Eine
Rekonstruktion der LR-Theorie zur Elimination von Redundanz - mit
Anwendung auf den Bau von ELR-Parsern". If I were to implement a parser
generator, I'd use that - it's the most powerful LR technique where
error messages can be easily linked to grammar rules.

There's a big practical problem with most parser generators: you need to adapt the grammar so the generator can work with it. Sometimes this means that the published grammar must be rewritten from scratch. If the language changes (because the language designer decides it would be nice to have a new feature), you may have to rewrite the grammar again. This can become very, very unmanageable.

The one exception is GLR (or "Earley") parsers. The parser generators
accept arbitrary context-free grammars.
The downside is that these parsers will not give you an easy way to
generate meaningful error messages. They'll simply give you zero, one,
or more parses for an input, and that's it. (Well, that was the status
ten, twenty years ago. Things might have improved.)

Finally, please consider what you actually want: Do you want to write a
parser generator, or do you want to write a parser?
The former is quite outside the scope of SymPy.
The latter means you want to *use* a parser generator, not *write* one.
So I'd google for
Python "parser generator"
and pick the one that looks like it's easiest to use.

Just my 2c, hope it helps.

Regards,
Jo

Joachim Durchholz

Mar 5, 2014, 4:07:44 PM
to sy...@googlegroups.com
Am 05.03.2014 21:48, schrieb Joachim Durchholz:
> Am 05.03.2014 20:06, schrieb Aditya Shah:
>
> > If
>> anyone can suggest a better reference material please comment below.

I forgot: GLR-style parsers exist in two varieties, early "Earley" style
and newer, "Tomita" style. Tomita is better than Earley in most respects
but a bit more complicated.
The reference work a few years ago was "Tomita-Style Generalised LR
Parsers", by Scott/Johnstone/Hussain.

Aditya Shah

Mar 5, 2014, 9:26:09 PM
to sy...@googlegroups.com
@Jo Thank you, that was quite enlightening. Now, as to the parsers: they are not exactly parsers. We do have rudimentary parsers for Mathematica and Maxima in SymPy right now. If you take a look at their code, you can see that they are not CFG-based but simple regular-expression rules. They perform very well under almost all circumstances (although I encountered a bug in the Mathematica module and fixed it). The point is that this functionality allows us to embed small snippets already written in other languages. So the aim right now is to generate a parser (or a converter, as you may call it) that converts such a snippet to equivalent Python/SymPy code. After that is successfully done, we can move on to a generic parser framework that can convert entire programs to SymPy-equivalent code. And I was going for a parser generator framework for SymPy that generates a parser, not just a parser itself.

Thanks,
Aditya

Joachim Durchholz

Mar 6, 2014, 1:12:39 AM
to sy...@googlegroups.com
Am 06.03.2014 03:26, schrieb Aditya Shah:
> So
> the aim here right now is to generate a parser(or a converter as you may
> call it) that converts that snippet to equivalent python/sympy code. After
> that is successfully done, we can move onto a generic parser framework that
> can convert entire programs to sympy equivalent code.

Code generation is independent of the parsing technology chosen.

> And I was going for a
> parser generator framework for sympy that generates a parser, not just a
> parser itself.

Why?
Parser generators already exist.

Aditya Shah

Mar 6, 2014, 2:55:39 AM
to sy...@googlegroups.com
@Jo Parser generators certainly exist; they take in grammar specs and generate parsers for those grammars. But the idea here is to create our own custom generator framework, which takes in a predefined type of rules (a grammar) and then takes advantage of the similarities between the different languages, such as Mathematica or MathML, to create a parser that parses expressions into SymPy code. Please take a look at the mathematica.py module in the sympy/sympy/parsing folder. That is a parser for the Mathematica language, but it had to be coded by hand. What I intend to implement is a program that takes in a few details about the differences between the language and SymPy and automatically generates the code that converts the expression. Please do note that the term "parser" I am using here is not exactly the "parser" we have for other languages; it is more of an interpreter sort of thing, and I want to make the program that creates those interpreters.

Regards,
Aditya

Joachim Durchholz

Mar 6, 2014, 4:01:13 AM
to sy...@googlegroups.com
Am 06.03.2014 08:55, schrieb Aditya Shah:
> @Jo Parser generators sure exist. They take in grammar specs and generate
> parsers for that grammar. But the idea here is that we create our own
> custom generator framework which takes in a predefined type of
> rules(grammar) and then takes advantage of the similarities between the
> different Languages such as Mathematica or MathML to create a parser that
> parses the expression to sympy code.

That's not going to mix well with the ability to quickly pick up new grammar rules as Mathematica or MathML define them.

I'd reuse grammar rules, and I'd make sure that all parsers emit the
same set of tree nodes so the same code generation can walk the tree,
but I wouldn't try to reuse handcrafted grammar parts - *particularly*
if you wish to improve parsing fidelity.

> Please take a look at the
> mathematica.py module in sympy/sympy/parsing folder. That is a parser for
> mathematica language. But it has had to be coded by hand.

Exactly.

> What I intend to
> implement is a program that takes in a few details about the differences
> between the language and sympy

The devil is exactly in the details.
The usual outcome of undertakings like this is that you get into
diminishing returns long before you're content with the results (or your
users are content with them).
That's the *usual* outcome. You may get lucky and find that the details
aren't that bad.

Also note that as soon as you start this specific kind of refactoring,
your code becomes more rigid. Adapting to changes now means not just
changing the specific input dialect, it may also require changes in the
refactoring.
Fred Brooks says that the overhead for writing refactored code is triple
that of writing the code directly, i.e. refactoring starts to become
useful if the same (kind of) code is used in more than three parsers,
*and you know that the factored-out code won't have to change ever again*.

There are a precious few abstractions that are general and
well-understood enough that they pay off even at smaller projects.
Parse trees are one of them.
Interleaving parse and generation... well, sort-of works, it's a bunch
of well-known techniques but they don't really lend themselves to
wrapping them up in a nice little library, these callback-based parsing
frameworks tend to get written over and over again because it's hard to
reuse the code. SAX parsers do that kind of thing, but notice how they
are restricted just to XML, they aren't generalized across languages.
(You should still take a look at a typical SAX API for ideas how to
structure such an interface.)

Trying to factor out from hand-written parsers is not an abstraction
that will pay off, unless you are extremely lucky.

> and automatically generates the code that
> converts the expression.

Code generation is straightforward once you have a working parser, so
that's an aspect that probably doesn't need discussion.

> Please do note that here the term "parser" that I
> am referring to is not the exact "parser" that we have for other languages.
> It is more of an interpreter sort of thing and I want to make the program
> that creates those interpreters.

Um. Okay. Write "interpreter" then...

... though, we have been discussing parser aspects.
You plan to use a hand-written one; my advice is to stay away from that
route because maintaining a hand-written parser with an
occasionally-changing syntax means that all the clever shortcuts you
took will some day stop working.
Factoring out is one such clever shortcut, applied systematically.

I'm not saying that you will fail.
I'm just saying that you're running a considerable risk of failure here.
I'm also saying that the more languages SymPy supports using
hand-written parsers, the higher the maintenance overhead will become.

Aditya Shah

Mar 6, 2014, 2:47:53 PM
to sy...@googlegroups.com
@Jo Well, I am still unconvinced by your view that the strategy I intend to adopt will fail in the long run. I'll give you my reasons. Firstly, we are just talking about math-spec languages, not general programming languages with complex rules. I have noticed that the rules of different math-spec languages tend to be quite similar, differing only in syntactic sugar. After all, the functions are quite similar.


Secondly, I am not planning to use a hand-written one. Right now SymPy uses hand-written parsers, which has made the development of new parsers a mess because of the lack of structure and standardization. All I want to achieve is a standard by which we will always be able to define new parsers should the need arise.

Also, I intend to pursue development of an NLP parser for SymPy (quite rudimentary) so as to achieve a basic capability matching WolframAlpha's. I suggest you read the thread (the part where certik points to the same topic) and give your two cents on my approach.

Thanks,
Aditya

Joachim Durchholz

unread,
Mar 6, 2014, 3:31:47 PM3/6/14
to sy...@googlegroups.com
On 06.03.2014 20:47, Aditya Shah wrote:
> @Jo Well I am still unconvinced of your opinion that such the strategy that
> I intend to adopt will fail in the long run. I'll give you my reasons for
> it. Firstly, we are just talking about Math Spec Languages not generalized
> programming languages with complex rules. I have noticed that the rules of
> different Math Spec Languages tend to be quite similar differing only in
> the syntactic sugar. After all, the functions are quite similar.

Sure... it might work.

Essentially, it comes down to the question how hairy the differences
really are (or become). It's not something that can be determined
beforehand because the devil is in the details; you need to try it out
and see what details are unearthed.

I know that such differences are often massively underestimated.
I do not know whether this is the case here or not.

> Secondly, I am not planning to use a hand written one.

Ah ok, I wasn't sure about that.

> Right now sympy uses
> hand written parsers which has made the development process of new parsers
> a mess because of lack of structure and lack of standardization. All I want
> to achieve is a standard by which we will always be able to define new
> parsers should the need arise.

Okay, that's a good plan.

> Also, I intend to pursue development of NLP parser for sympy(quite
> rudimentary) so as to achieve a basic capability to match that of
> WolphramAlpha's. I suggest you read the thread (the part where certik
> points to the same topic) and give your 2 cents about my approach.

Sigh.
You should not have to develop an NLP parser (Earley/Tomita/GLR/NLP
parsing is essentially all the same).
Just use one of the existing libraries, don't reinvent the wheel (with a
high risk of doing it badly).

Try http://pythonhosted.org/modgrammar/ .
Or http://pages.cpsc.ucalgary.ca/~aycock/spark/ .

Systematic rules (folding letter case, renaming etc.) can be handled in
the lexer or during output generation. I don't know what's the better
approach; I'd try the output side first because lexers tend to know too
little about the context to always make the right decision.
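
As a concrete illustration of folding letter case in the lexer (token names here are invented), a case-insensitive dialect's "SIN", "Sin" and "sin" all become one canonical token before the grammar ever sees them:

```python
import re

# Minimal regex lexer; the fold_case flag decides whether name tokens
# are normalized at lex time.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<num>\d+)|(?P<name>[A-Za-z]+)|(?P<op>[-+*/()]))")

def tokenize(text, fold_case=True):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at position %d" % pos)
        pos = m.end()
        if m.group("num"):
            tokens.append(("NUM", int(m.group("num"))))
        elif m.group("name"):
            name = m.group("name")
            tokens.append(("NAME", name.lower() if fold_case else name))
        else:
            tokens.append(("OP", m.group("op")))
    return tokens

print(tokenize("SIN(x) + 2"))
# [('NAME', 'sin'), ('OP', '('), ('NAME', 'x'), ('OP', ')'),
#  ('OP', '+'), ('NUM', 2)]
```

Doing the same normalization on the output side would instead leave tokens untouched and map names when generating the SymPy expression.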

Aditya Shah

unread,
Mar 7, 2014, 6:47:00 AM3/7/14
to sy...@googlegroups.com
@Jo Thanks a lot! The last post cleared things up. So basically, I can enforce a standardized grammar and implement it using the likes of the modgrammar library, which I think is quite convenient and suitable for the task. I need to define the rules of the grammar in such a way that any language can then be added very simply, via a small interface that converts the rules of that language to the standard set. Is that feasible?

Thanks,
Aditya

Joachim Durchholz

unread,
Mar 7, 2014, 7:55:25 AM3/7/14
to sy...@googlegroups.com
Assuming that modgrammar allows what you want to do, I guess yes.

Given that modgrammar is built to accept grammar rules via Python code,
I guess you'd build the grammars for the various dialects (Mathematica,
MathML etc.) from function calls, and factor out what can be factored
out across grammars.

The main point would be that you consider cases like
a) What if one dialect adds a new operator with a priority between, say,
+ and *?
b) What if one dialect adds a new feature, e.g. an ellipsis token
('...') to indicate a range?
c) What if one dialect adds something whacky, such as (a...b( to
indicate a range from A (included) to B (excluded)? Such a grammar would
doubtlessly be highly ambiguous, but anyway.

I wrote "consider" - i.e. I do not think these should be implemented,
but your work will stand the test of time better if you imagine somebody
asking you about how to implement such a change and you don't want to
say "sorry, you need to redo it all for that".

A GLR parser should allow arbitrary grammar rules, even ambiguous ones,
so I do believe that modgrammar (being a GLR parser) should fit the
bill. It might still fail due to implementation limitations or some
other devil in the detail, so we won't be sure about that before
somebody tries.
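
To make case (a) concrete, here is a generic sketch (not modgrammar code; the '&' operator and its max() semantics are invented): if operator priorities live in one shared table that a precedence-climbing parser consults, a dialect's new operator is a data change, not a parser rewrite.

```python
import operator
import re

# Shared priority table: higher number binds tighter.
BASE_PRECEDENCE = {"+": (10, operator.add), "*": (20, operator.mul)}

def tokenize(s):
    return [int(t) if t.isdigit() else t for t in re.findall(r"\d+|\S", s)]

def parse(tokens, prec_table, min_prec=0):
    """Minimal precedence-climbing evaluator over a token list."""
    lhs = tokens.pop(0)
    while (tokens and tokens[0] in prec_table
           and prec_table[tokens[0]][0] >= min_prec):
        prec, fn = prec_table[tokens.pop(0)]
        lhs = fn(lhs, parse(tokens, prec_table, prec + 1))
    return lhs

print(parse(tokenize("1 + 2 * 3"), BASE_PRECEDENCE))  # 7

# A hypothetical dialect slots '&' in between '+' (10) and '*' (20):
dialect = dict(BASE_PRECEDENCE, **{"&": (15, max)})
print(parse(tokenize("1 + 4 & 2 * 3"), dialect))      # 7: 1 + max(4, 2*3)
```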

HTH
Jo

Aditya Shah

unread,
Mar 7, 2014, 8:23:33 AM3/7/14
to sy...@googlegroups.com
@Jo Thanks again for the clarifications. I did some research and observed that conversion from Math Spec Languages to their SymPy equivalents can be done via REs themselves. This allows for a very efficient grammar (REs are manifestations of FSAs, and their conversion to CFGs is a quite simple algorithm). So in the end what matters is the underlying architecture by which the developer enters the details for the new parser. That input is processed by the tool I propose to build (the interface between a spec file and its conversion to Python code for processing by modgrammar), and the resulting output is fed to modgrammar, which then generates the final parser.
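
A sketch of the RE-substitution idea for surface-level dialect differences (the rule set is invented for illustration). It handles local rewrites such as a Mathematica-style Sin[...] becoming sin(...); note, though, that REs alone cannot track arbitrarily nested brackets, which is where a real grammar becomes necessary.

```python
import re

# Ordered rewrite rules; each is applied over the whole input.
RULES = [
    (re.compile(r"\bSin\["), "sin("),
    (re.compile(r"\bCos\["), "cos("),
    (re.compile(r"\]"), ")"),   # naive: assumes ']' only closes calls
    (re.compile(r"\^"), "**"),
]

def mathematica_to_sympy_ish(text):
    for pattern, repl in RULES:
        text = pattern.sub(repl, text)
    return text

print(mathematica_to_sympy_ish("Sin[x]^2 + Cos[x]^2"))
# sin(x)**2 + cos(x)**2
```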

Regards,
Aditya

Joachim Durchholz

unread,
Mar 7, 2014, 8:36:08 AM3/7/14
to sy...@googlegroups.com
On 07.03.2014 14:23, Aditya Shah wrote:
> @Jo Thanks again for the clarifications. I did some research and I observed
> that conversion from Math Spec Languages to Sympy equivalent can be done
> via the use of RE themselves.

I doubt that that's possible for all cases.
This is one of the typical situation where the task looks simple at the
surface, then you start ironing out the kinks, and then you run into a
severe case of diminishing returns.

The nasty thing is: You won't notice that you're into this kind of
trouble until you hit that nasty corner case where an RE doesn't
suffice, but at that point, you have sunk so much time and energy into
the project that you don't want to go back and redo it all. (Or you
can't for lack of time.)
Heck, a syntax change in any of the input languages could trigger that
problem, so no analysis of existing languages will help you to determine
whether REs are enough or not.

> So this will allow for a very efficient
> grammar(RE are manifestations of FSAs and the their conversions to CFGs is
> a quite simple algorithm).

I know - though what I learned from trying my own hand in that area is
that the algorithm's concept is simple but the implementation is hard
to debug.
It's all too easy to mistake a symptom for the cause of a bug, so the
fix goes to the symptom instead of the bug.

> So in the end what matters is the underlying
> architecture by which the developer enter the details for the new parser to
> be made, which is then processed by the tool that I propose to build(the
> interface between a spec file and its conversion to python code so as to be
> processed by modgrammar)

Ideally, no spec file is needed.
Python code can serve as a spec file easily enough.

After all, it's not that you'll want users to provide their own spec
files - and even if you do, the level of expertise they'll need is at
roughly the same level as knowing Python, so you don't really make it
easier by creating a second tier here.

> and then the resultant output is fed to modgrammar
> which then generates the final parser.

Sounds needlessly complicated to me.

But maybe you have a reason why Python code won't do as a spec file?

Aditya Shah

unread,
Mar 7, 2014, 12:35:00 PM3/7/14
to sy...@googlegroups.com
@Jo, My philosophy is as much automation as possible. Just for the sake of argument, consider this: we have two Math Spec Languages, A and B. Since both are means to the same end, they differ only in subtle places. While that might not always be the case, let us assume it is. By using Python code as the spec file, we force the user to unnecessarily repeat the rules that are the same for both languages. So this is what I propose: we keep a spec sheet in between which captures only the differing rules. The rest are assumed to be the same, specified once and open to inspection. That way we can build on existing rules and not waste programming effort and time reinventing the wheel, as you put it. This resembles the concept of function overriding from OOP. I hope this addresses your concerns.
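
The "override only the differing rules" idea can be sketched in a few lines (all rule names and dialects are invented): a base rule set is shared, and a dialect spec records only its deviations, much like method overriding in OOP.

```python
# Shared rules every dialect inherits.
BASE_RULES = {
    "power":    "^",
    "call":     "name ( args )",
    "list_sep": ",",
}

DIALECT_A = {}                        # uses every base rule as-is
DIALECT_B = {"power": "**",           # only the deviations are listed
             "call":  "name [ args ]"}

def effective_rules(dialect):
    """Merge a dialect's overrides onto the shared base."""
    rules = dict(BASE_RULES)
    rules.update(dialect)
    return rules

print(effective_rules(DIALECT_B)["power"])     # **
print(effective_rules(DIALECT_B)["list_sep"])  # ,
```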

Regards,
Aditya

Aditya Shah

unread,
Mar 9, 2014, 12:08:02 AM3/9/14
to sy...@googlegroups.com
@asmeurer @skirpichev @certik @jo I have drafted a proposal for my project. You can find it at https://github.com/sympy/sympy/wiki/GSoC-2014-Application-Aditya-Shah-SymPy-Parsing-Framework. Please review it and leave your suggestions below.

Thanks,
Aditya 

PS: I have yet to add the timeline to my proposal.

Sachin Joglekar

unread,
Mar 9, 2014, 3:28:10 AM3/9/14
to sy...@googlegroups.com
Nice proposal. You may want to add a section showing a rough mock-prototype (API) of your work. This will make it easier for others to understand what you are aiming at. Also, you seem to have made two identical wiki pages for your proposal. Delete one of them.

On Sunday, February 16, 2014 10:13:05 PM UTC+5:30, Aditya Shah wrote:
Hi,
I am Aditya Shah and I am a third year Computer Science student at BITS-Pilani university. I would like to work with Sympy for GSOC. I had previously posted on this mailing list regarding my willingness to implement the group theory module for Sympy. While scrolling through the ideas list, I came upon the idea to improve the parser for Sympy Live. I have a small background in parsing and natural language processing, since I have done projects on those topics for my college course work. Can anyone please tell me how much work is done on parsers, and what needs to be implemented further?

@ProspectiveMentor: Please reply to this post so that I can discuss further regarding the topic.

IRC: adityashah30

Thanks,
Aditya Shah

Aditya Shah

unread,
Mar 9, 2014, 4:18:09 AM3/9/14
to sy...@googlegroups.com
@Sachin Thanks. I have deleted the identical page. I will soon update the page to include the sections of Mock API and Timeline.

Regards,
Aditya

Aditya Shah

unread,
Mar 9, 2014, 1:04:27 PM3/9/14
to sy...@googlegroups.com
@asmeurer @skirpichev @certik @jo @srjoglekar246 I have completed my GSoC proposal. You can find it at https://github.com/sympy/sympy/wiki/GSoC-2014-Application-Aditya-Shah-SymPy-Parsing-Framework. Please review it and leave your suggestions below.

Thanks,
Aditya Shah

Aaron Meurer

unread,
Mar 9, 2014, 6:34:18 PM3/9/14
to sy...@googlegroups.com
I agree about using a Python file. I think it's just question of using
Python syntax to represent the grammar.

Aaron Meurer
> --
> You received this message because you are subscribed to the Google Groups
> "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sympy+un...@googlegroups.com.
> To post to this group, send email to sy...@googlegroups.com.
> Visit this group at http://groups.google.com/group/sympy.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/sympy/579f72f6-1a87-48dc-a996-91b3842db589%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.

Aditya Shah

unread,
Mar 9, 2014, 9:28:38 PM3/9/14
to sy...@googlegroups.com
@asmeurer, I respectfully disagree. Using a Python file to represent the grammar forces the user to understand and know the format of the file. Also, should the current parser generator framework need to be replaced in the future (discontinued, or any similar reason), it would cause much inconvenience, because the users would then have to learn the new format by which the grammar is specified. Such an arrangement (the one that you are suggesting) breaks the API in such an event. Plus, if we specify the grammar as I have mentioned in the proposal, the user can directly use EBNF (Extended Backus-Naur Form), which is a standard way to represent CFGs (context-free grammars). I hope this justifies such an arrangement.

Regards,
Aditya

Aaron Meurer

unread,
Mar 9, 2014, 10:14:57 PM3/9/14
to sy...@googlegroups.com
With what you are suggesting they also have to know the format of the file.

Probably we should allow EBNF, but should allow users to create their
own grammars directly using the Python objects that the EBNF would be
internally converted to.
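
A minimal sketch of that idea (class names are invented): grammar rules as plain Python objects that an EBNF front end could also construct, so both entry points meet at one internal representation.

```python
# Tiny rule-object hierarchy; an EBNF parser would build the same
# objects that a user can build directly in Python.

class Rule:
    pass

class Lit(Rule):
    def __init__(self, text): self.text = text
    def __repr__(self): return repr(self.text)

class Seq(Rule):
    def __init__(self, *parts): self.parts = parts
    def __repr__(self): return " ".join(map(repr, self.parts))

class Alt(Rule):
    def __init__(self, *options): self.options = options
    def __repr__(self): return " | ".join(map(repr, self.options))

# EBNF:  call ::= name "(" args ")" | name "[" args "]"
call = Alt(Seq(Lit("name"), Lit("("), Lit("args"), Lit(")")),
           Seq(Lit("name"), Lit("["), Lit("args"), Lit("]")))
print(repr(call))
# 'name' '(' 'args' ')' | 'name' '[' 'args' ']'
```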

Aaron Meurer

Aditya Shah

unread,
Mar 9, 2014, 10:28:16 PM3/9/14
to sy...@googlegroups.com
Well, I think that is not a problem at all. If the user wants to skip directly to generating the grammar as Python code, he/she can do so: write the grammar file and jump to step 3, i.e. use the PGF to generate the parser. Everything else remains the same. So I think no change is required at all to accommodate your perspective.

Regards,
Aditya

Joachim Durchholz

unread,
Mar 10, 2014, 2:55:45 AM3/10/14
to sy...@googlegroups.com
On 10.03.2014 02:28, Aditya Shah wrote:
> @asmeurer, I respectfully disagree. Using a python file to represent the
> grammar forces the user to understand and know the format of the file.

Yes, but with a Python file he knows the format already.

> Also, should the current Parser generator Framework needs be replaced in
> future (discontinued or any similar reason),

Using EBNF as input doesn't eliminate modgrammar, so we're hosed anyway
if modgrammar goes away.

You're right in that using Python means we're tied to modgrammar and
can't easily replace it if we grow discontent with it.
However, I think it's worth a shot. Software isn't just engineering,
it's sometimes experimenting.

> Plus, if we specify
> the grammar as I have mentioned in the proposal, it allows the user to
> directly use EBNF (Extended Backus-Naur Form) which is a standard way to
> represent CFGs (Context Free Grammars).

I have yet to see an EBNF that didn't use extensions of one or the other
form, so these would have to be learned.
Also, the other important part is the transformation rules, i.e. how to
translate the input to a common data structure. The syntax for these
transformation rules is not standardized.

IOW the burden on the user is marginal.
Also, learning Python APIs isn't that hard if they are designed well. I
suspect that you're pushing for an EBNF because you're more used to that
from your background, and we're pushing towards Python because we're
used to that; I bet SymPy users will prefer Python, too.

> I hope this justifies my using such
> an arrangement.

Using an EBNF is adding two(!) additional layers of software for a
marginal advantage.
In my eyes, that's most likely a net loss.

Let me suggest you start without the EBNF part. If modgrammar turns out
to be too nasty, the EBNF parser + modgrammar generator layers can still
be added later; at that time, you'll also have collected experience with
what does and what does not work with modgrammar, so the generators will
be easier to write and more reliable in operation. Also, there's the
option of replacing modgrammar with spark, which is the other contender
in the GLR arena.

Aditya Shah

unread,
Mar 10, 2014, 4:33:32 AM3/10/14
to sy...@googlegroups.com
@Jo thanks for chiming in. I understand your sentiment, but from your reply I gather that you have not properly understood my proposal. Firstly, adding EBNF adds one extra layer, which can in any case be bypassed by an experienced programmer such as yourself. Secondly, I propose to keep the extra layer to preserve the API in case we grow discontent with modgrammar. In my scheme of things, you write the grammar in EBNF form; then the spec-to-grammar converter generates a Python file (which, by the way, anyone can write themselves if they know the format modgrammar takes its input in), which serves as input to modgrammar. Should we decide to replace modgrammar with, say, Spark, we can change the code of the spec-to-grammar converter to generate the corresponding input to Spark.

Here, adding one extra layer offers a big advantage: even if the underlying framework changes, that detail has been abstracted away, so the user needn't worry about it. And yes, if the user knows the format of the Python file which serves as input to either framework, he/she can bypass the first stage and write the input file directly. Adding the extra layer doesn't hamper productivity in any case; rather, it enhances it when the user doesn't know, or doesn't want to bother with, the details of how the input file to the PGF is generated.

I hope these arguments satiate your questions.

Regards,
Aditya Shah

Joachim Durchholz

unread,
Mar 10, 2014, 6:53:11 AM3/10/14
to sy...@googlegroups.com
On 10.03.2014 09:33, Aditya Shah wrote:
> @Jo thanks for chiming in. I understand your sentiment but from your reply,
> I gather that you have properly understood my proposal. Firstly, adding
> EBNF adds 1 extra layer which can anyways be overridden by an experienced
> programmer such has yourself. Secondly, I propose to keep the extra layer
> to maintain the API in case we grow discontent with modgrammar. In my
> scheme of things, you write the grammar in EBNF form, then the Spec to
> Grammar converter generates a Python file (which BTW anyone can write
> themselves if they know the format that nodgrammar takes the input in)
> which serves as an input to modgrammar. Should we decide to replace
> modgrammar with say Spark, we can change to code of Spec to Grammar
> converter file to generate the corresponding input to Spark. Here, adding
> one extra layer offers a big advantage.

Not sure that that will really be the case. Usually, using one tool
(one's own or somebody else's) makes you dependent on that tool,
simply because you set things up the way that tool needs them - after
all, you don't know (yet) what the other tool will need.

So... I do not think you'll reap the benefits you assume.
(Premature factoring-out, the twin of premature optimization, if you will.)

> Even if the underlying framework is
> changed, that detail has been abstracted and so user needn't worry about
> that.

I do not think that a standard SymPy user will ever try to build another
XYZ-to-SymPy converter.

IOW you have a solution, but is it solving a problem?

> And yes, if the user knows the format of the Python file which serves
> as an input to either of the framework, he/she can bypass the first stage
> and directly write the input file (generally in Python). Adding an extra
> layer doesn't hamper the productivity in any case,

Oh yes it does.
Not user productivity, but maintainer productivity. Both the maintainers
of the grammar files (who now need to know your specific EBNF dialect in
addition to modgrammar), and the maintainers of the import subsystems.

That effort would be justified if it added valuable functionality, but I
don't see that.

> rather it enhances the
> same in the case that user doesn't know or doesn't want to bother with the
> details of how input file to the PGF is generated.

Those users will simply feed their Mathematica code to SymPy and
complain if it doesn't work.
They're primarily interested in getting results, not in dealing with
input languages.

Your approach does have merits for those who, like you, come from an NLP
background and want to work at the EBNF level.
Everybody else either has no advantage (end users), or doesn't care
(maintainers who want to update the syntax files - they'd have to learn
either the EBNF dialect or the modgrammar API, that's essentially the
same), or have a disadvantage (import subsystem maintainers).

Aditya Shah

unread,
Mar 10, 2014, 7:19:29 AM3/10/14
to sy...@googlegroups.com
@Jo while I think the last of your concerns is valid, I would say that EBNF is very popular in the fields of theory of computation and programming language principles. In my view, you would find more people who are well versed in EBNF than people who know, or want to learn, the modgrammar API. Plus, since I am attempting to bring in standardization, we need to set some structure, which means a dependence on one tool or another (also, if you remember, it was you who suggested using modgrammar or some existing PGF when I was foolish enough to set out to write my own; thanks for that). But coming to the point, I don't see any other way here. Also, as you say, a developer needs to learn either EBNF or the modgrammar API, and the existing architecture already provides that liberty: the developer can opt for either EBNF or a direct Python file as input. I am sorry, but I don't understand the rest of your concerns.

Regards,
Aditya

Joachim Durchholz

unread,
Mar 10, 2014, 7:29:22 AM3/10/14
to sy...@googlegroups.com
On 10.03.2014 12:19, Aditya Shah wrote:
> @Jo while I think the last of your concerns is valid, I would say that EBNF
> is very popular in the field of theory of computation and programming
> language principles. According to me, you would find more people who are
> well versed in EBNF rather than knowing or wanting to learn modgrammar API.

Heh. That's a self-selected group of users, and that's not a useful sample.

Besides, the relevant group isn't the people that hang out around you,
it's the users and maintainers of SymPy. These all know Python, and they
know how to read and apply API docs; most of them don't know how to read
and apply EBNF, plus the inevitable additions.
Heck, I myself have roughly ten times the experience with EBNF than with
Python, and still I'd stick with Python. Simply because it's going to be
a smoother learning curve for the maintainers.

> Plus, since here I attempt to bring in standardization, we need to set some
> structure, which means a dependence on some tool or the other (also, if you
> remember, it was you who suggested to use modgrammar or some exsting PGF to
> me when I was foolish enough to set out to write my own. Thanks for that)

Yes. We don't really like external dependencies, but modgrammar or
equivalent seems to be part of any solution.
Writing our own GLR parser is definitely beyond the scope of SymPy's
mission statement, or experience.

> But yes, coming to the point, I don't see any other way here. Also, as you
> say a developer needs to learn either EBNF or modgrammar API, the liberty
> for which I have already provided in the existing architecture. The
> developer can either opt for EBNF or direct Python file as an input. I am
> sorry but I don't understand the rest of your concerns.

Ask and I'll answer.

Aditya Shah

unread,
Mar 10, 2014, 7:41:35 AM3/10/14
to sy...@googlegroups.com
@Jo thanks for the swift reply. I'll try to explain my position. Since I come from a somewhat theoretical background, I do prefer EBNF for manipulating grammars. As you said, that is a select group of people, and I accept that. Also, per @asmeurer's suggestion (see the sixth-last post), people who want to work with Python directly can do so without bothering with the explicit notation.

Here I ask you one thing. I have seen the documentation of modgrammar and even Spark, and while both of them use Python to directly represent the grammar, you'd find that you cannot write the Python code without prior knowledge of EBNF. They encapsulate the EBNF form in Python code, so I believe EBNF must be known a priori anyway. Plus, as you said, no end user is ever going to bother with XYZ-to-SymPy converters; that rests on the developers who set out to define those parsers, and I think people who intend to create parsers will have prior knowledge of EBNF. Even if they don't, as you claim, they can always bypass that step entirely.

As for the maintenance part: if modgrammar or any other PGF is discontinued or changes its API, we would only have to change the spec-to-language converter file or, as you suggest, learn the new API. What I don't understand is this: there are two approaches to doing the same thing, and even if one is not widely popular (as you claim), adding support for it only serves to empower someone who fits that particular case. So where's the issue?

Regards,
Aditya

Joachim Durchholz

unread,
Mar 10, 2014, 1:30:36 PM3/10/14
to sy...@googlegroups.com
On 10.03.2014 12:41, Aditya Shah wrote:
> @Jo thanks for the swift reply. I'll try to explain my position to you.
> Since, I come from a bit theoretical background, I do prefer to use EBNF to
> manipulate grammars. As you said, that is a selected group of people and I
> believe that. Also, on the suggestion of @asmeurer, if you read 6th last
> post, you'd find that people wanting to work with Python can do so directly
> without bothering with explicit notation.

Sure, but adding another layer on top of that just because you can (and
want to) doesn't necessarily improve things. Actually the price is a
higher rate of bugs (more code means more bugs), and a more complicated
system.

> Here I ask you one thing. I have
> seen the documentation of modgrammar and even Spark and while both of them
> use Python for directly represent the grammar, you'd find that without
> prior knowledge of EBNF, you cannot write the python code. It is because,
> they encapsulate the EBNF form by Python code and so I believe that EBNF
> should be known a priori.

Well... structurally it's always the same, but at that level, preferred
syntax doesn't matter.

> Plus, as you said no user is ever going to bother
> about XYZ-Sympy converters. That would rest on developers who set out to
> define those parsers.

Exactly.

> I do think that people who intend to create parsers,
> would have prior knowledge of EBNF. Even if they don't as you claim, they
> can always bypass that step entirely.

Those who maintain an existing parser can't, they'll have to work with
what's there.
Maintenance takes far more time overall than initial creation, so that's
a real concern.

> As for the maintenance part, if
> modgrammar or any other PGF decides to discontinue or change their API, we
> would have to only change the Spec to Language converter file or as you
> claim learn the new API.

You depend on modgrammar anyway.
Besides, incompatible API changes are the exception, not the rule.
Plus, we're not necessarily forced to upgrade to an incompatible version
of modgrammar.

> What I don't seem to understand is that, there are
> 2 approaches to do the given thing and even if one is not widely popular (
> as you claim), if we do add support for that, it would only serve to
> empower someone who fits in that particular case to use it. So where's the
> issue?

As I said: Maintenance overhead.
More code means more maintenance overhead.
And that's a really big issue, because (as I said) over time,
maintenance is more work than initial implementation.

Aditya Shah

unread,
Mar 10, 2014, 2:16:36 PM3/10/14
to sy...@googlegroups.com
@Jo I think your concern regarding maintenance overhead is valid, but then it is valid for all software systems. I still maintain that adding the "extra" layer brings some serious benefit. As for maintenance, I think serious problems would arise only in the case of an incompatible API change (which you said is the exception, not the rule). Plus, as you pointed out, we can always stay on a preferred version of modgrammar. So there you have it: if we do so, I don't think there is any maintenance overhead at all (since we don't change anything). As for the bugs, I take full responsibility for making the code as bug-free as possible.

Regards,
Aditya

Aditya Shah

unread,
Mar 10, 2014, 2:26:45 PM3/10/14
to sy...@googlegroups.com
@Jo I would also add that my claim that the "extra" layer brings some benefit rests on the fact (which I explained several posts back) that most of these languages tend to share large portions of their grammar. So we can simply merge the generic rules with the language-specific rules to enforce modularity. I suggest you please read my GSoC proposal in its entirety. You can find it at https://github.com/sympy/sympy/wiki/GSoC-2014-Application-Aditya-Shah-SymPy-Parsing-Framework.

Regards,
Aditya

Joachim Durchholz

unread,
Mar 10, 2014, 3:03:46 PM3/10/14
to sy...@googlegroups.com
On 10.03.2014 19:26, Aditya Shah wrote:
> @Jo I would also add that my claim that the "Extra" layer adds some benefit
> is in the fact (which I already explained to you several posts back) that
> most of the languages tend to share large portions of their grammar.

If modgrammar isn't entirely braindamaged, it should allow grammar
snippet sharing. Probably better than EBNF snippet sharing.

> So we
> can simply merge the generic rules with the language specific rules to
> enforce modularity. I suggest you please read my GSoC proposal in its
> entirety.

I did.

Aditya Shah

unread,
Mar 10, 2014, 3:10:47 PM3/10/14
to sy...@googlegroups.com
@Jo Okay, let us assume that modgrammar allows us to share grammar. In that case I have no problem removing the EBNF dependency, although I would say it is a good thing to have around. Plus, I haven't had the time to read the entire modgrammar documentation (too much coursework!). I intend to do that during the community bonding period. So until then, let us keep the architecture as it is; otherwise I'll modify it. Seems fair?

Regards,
Aditya

Joachim Durchholz

unread,
Mar 10, 2014, 3:32:25 PM3/10/14
to sy...@googlegroups.com
On 10.03.2014 20:10, Aditya Shah wrote:
> @Jo Okay let us assume that modgrammar allows us to share grammar.

Okay.
Yeah. That's the implicit assumption I've been making; sorry for not
making that clear.

If it doesn't, the game changes.

> In that
> case I have no problem to remove the EBNF dependency, although I would say
> that it is a good thing to have around.

Well, let's agree to disagree on that one - I definitely see a price here.
Maybe it's because I have done more maintenance work in my life. Given
that SymPy, like many Open-Source projects, has limited manpower, I tend
to weigh maintenance overhead very highly; we don't want to get bogged
down in work on mechanism, even if the mechanism is small - all those small
mechanisms add up and in the end the whole system is still clearly
structured but too large to make any significant progress given the
available manpower.

Actually, I see all those foreign-syntax import things as a distraction
from SymPy's core mission already: it's not making the expression
transformation code faster or more powerful, it's not improving error
messages, it's not making the transformations more predictable or
consistent.
Not that I'm saying this shouldn't be done - it allows people to easily
migrate towards SymPy, so it's important for building the user base.
Still, it's a distraction; PR, if you will. That kind of stuff is
important, but it should require as little overhead as possible.

> Plus i haven't had the time to read
> the entire documentation of modgrammar (too much coursework!).

Heh. I haven't either, I have to admit.
It's probably a good idea to postpone further discussion until at least
one of us has a better grasp of what modgrammar can and cannot do.

> I intend to
> do that in community bonding period. So till then let us keep the
> architecture this way or else i'll modify. Seems fair?

/shrug I'm just offering advice on a take-it-or-leave-it basis (I'm not
mentoring due to lack of time), so you're free to do as you wish anyway.

My suggestion would be to start with getting a proof-of-concept done
with modgrammar (no EBNF), and see how well that works.
If modgrammar turns out to be useless, I'd suggest trying Spark.
If that doesn't work either, then we probably really need to generate
modgrammar or Spark code.
However, code generation, while attractive in theory, comes with many
strings attached in practice. It complicates the build process, and
generated code tends to trigger code paths that manually written code
does not, which means you are more likely to hit that one absurd
showstopper bug than those who simply code in Python. Code generation is
also harder to debug, because you need to debug at two levels at the
same time: the code generator and the generated code. You're more likely
to introduce bugs that way.

Aditya Shah

Mar 10, 2014, 3:38:04 PM
to sy...@googlegroups.com
@Jo Thank you for your valuable insight. I will keep those points in mind, and I will definitely try out modgrammar first.

Regards,
Aditya Shah

Aditya Shah

Mar 19, 2014, 1:12:37 AM
to sy...@googlegroups.com
@asmeurer @certik I have posted my final proposal on Melange. Could you please take a look at it and provide suggestions?

Thanks,
Aditya

Aditya Shah

Apr 21, 2014, 3:29:18 PM
to sy...@googlegroups.com
@asmeurer @certik @skirpichev Thank you for introducing me to the world of open source. I just have one question: could you please provide feedback on my GSoC proposal, so that I can make sure all the problems are addressed next time?

Thanks,
Aditya Shah