It would be great to have improvements to the LaTeX parser. Let us
know if you have any issues opening a pull request.
The test_latex.py file is correct. test_sympy_parser.py has tests for
the Python parser, which isn't related to the LaTeX parser as far as I
know.
Aaron Meurer
On Sun, May 24, 2020 at 9:33 AM Ben <ben.is...@gmail.com> wrote:
>
> To answer my own question, I think I found the tests:
> https://github.com/sympy/sympy/blob/master/sympy/parsing/tests/test_latex.py
> https://github.com/sympy/sympy/blob/master/sympy/parsing/tests/test_sympy_parser.py
>
>
> On Sunday, May 24, 2020 at 3:01:09 AM UTC-4, Ben wrote:
>>
>> Hello,
>>
>> I'm using SymPy to parse mathematical expressions written in LaTeX. I have observed that parsing LaTeX does not always work, so I've been collaborating with a friend to modify the ANTLR grammar file to address some of the issues we have encountered. The repo with the changed files (as well as a Dockerfile to configure the environment and build SymPy with the modified grammar) is https://github.com/allofphysicsgraph/sympy-grammar-modifications
>>
>> I'm interested in contributing the modified grammar file to SymPy, but I have not contributed to SymPy before. I've read the SymPy workflow documentation.
>> My background: I've been using Python for about 15 years and am comfortable with git and branching.
>> Prior to making the pull request, I have a question.
>>
>> I don't see where the current grammar file for parsing LaTeX is tested. Looking at the script https://github.com/sympy/sympy/blob/master/bin/test did not help. Also, I don't see any tests defined in the directory https://github.com/sympy/sympy/tree/master/sympy/parsing/latex
>>
>> I want to eventually make a pull request regarding the LaTeX parsing grammar, but I don't know where to create tests that would validate the changes. I'd like to be able to demonstrate that the changes do not break SymPy.
>>
>> Kindly,
>>
>> Ben
>>
>>
While working on handling spaces in LaTeX, I had two realizations. First, a space in LaTeX math could mean "multiply two variables", or it could just be a way of managing the layout of the expression. (I posted examples in https://github.com/sympy/sympy/issues/19075).
My second realization was that it might be easier to remove the parts of the LaTeX string that only affect presentation. Specifically, in this case, replace "\ " with " " and replace "\," with " " in the LaTeX string before passing it to SymPy.
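Here is a minimal sketch of that preprocessing step. The function name strip_presentation_spacing is hypothetical and not part of SymPy. It also leaves the line break "\\" alone, which is an extra assumption on top of the plain replacement described above, since a naive replace of "\ " would otherwise eat the second backslash of "\\ ".

```python
import re

# Hypothetical helper, not part of SymPy: remove presentation-only spacing
# commands from a LaTeX string before handing it to a parser.
# Matches "\ " (control space) or "\," (thin space), but only when the
# backslash is not itself preceded by a backslash, so the line break "\\"
# followed by a space or comma is left untouched (an added assumption).
_SPACING = re.compile(r"(?<!\\)\\[ ,]")


def strip_presentation_spacing(latex: str) -> str:
    """Replace "\\ " and "\\," with a single plain space."""
    return _SPACING.sub(" ", latex)


if __name__ == "__main__":
    print(strip_presentation_spacing(r"a\ b"))
    print(strip_presentation_spacing(r"\int f(x)\,dx"))
```

The cleaned string could then be passed to sympy.parsing.latex.parse_latex, which needs the ANTLR Python runtime installed. Whether the remaining plain space is read as multiplication is still up to the grammar.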
Hi Ben,
I don't want to discourage you in any way, and I may be naive, but I'd have thought LaTeX would always be ambiguous one way or another, particularly if it is handwritten. I'd have thought the best solution in the long term would be for people to write their equations in SymPy and then generate LaTeX with the latex() function.
David
You're totally correct: LaTeX is ambiguous. I don't find your observation discouraging, since it is perfectly reasonable.
The issue I'm interested in tackling is the conversion of math presented in physics papers (e.g., .tex files on arxiv.org) to a semantically meaningful and unambiguous representation (e.g., SymPy).
This issue would be moot if physics papers were written in SymPy. I don't have insight into how to construct incentives that would lead to the use of SymPy in physics papers, so I'm working on the LaTeX-to-SymPy approach.
Right. In that case, a system of hints that the user could pass to your parser might be really useful. For example, if a user could tell your parser that superscripts are usually tensor indices rather than exponents (or, alternatively, that certain symbols used as superscripts never mean exponents), you could produce a better translation. Another useful hint might be a list of the multi-letter symbols in use (sin, cos, exp, ln, etc.), so that you could resolve the ambiguity of what ab means. Sometimes sin(x) might mean s*i*n(x), and that could be handled by the user specifying that only certain multi-letter symbols are in use.
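To make the multi-letter hint concrete, here is a rough sketch in plain Python. The function split_letter_run and its known_names parameter are made up for illustration and are not part of SymPy or its grammar. It splits a run of letters into single-letter symbols unless a user-supplied name matches, trying longer names first.

```python
# Hypothetical illustration of a "known multi-letter symbols" hint.
# Not part of SymPy; names and behavior are assumptions for this sketch.
def split_letter_run(run, known_names=()):
    """Split a run of letters into tokens.

    Any name in known_names is kept whole (longest names are tried first);
    every other letter becomes its own single-letter symbol.
    """
    names = sorted(known_names, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(run):
        for name in names:
            if run.startswith(name, i):
                tokens.append(name)
                i += len(name)
                break
        else:
            # No hinted name starts here: treat the letter as one symbol.
            tokens.append(run[i])
            i += 1
    return tokens


if __name__ == "__main__":
    print(split_letter_run("absinx", {"sin", "cos"}))
    print(split_letter_run("sinx"))
```

With the hint {"sin", "cos"}, the run "absinx" is read as a, b, sin, x. With no hints, "sinx" falls back to s, i, n, x, which is the s*i*n(x) reading.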
David