New Python-compliant parser... and why it won't be used at the moment

174 views
Skip to first unread message

Pierre Quentel

unread,
May 22, 2022, 4:18:05 PM5/22/22
to brython
After a Python-compliant tokenizer, the production of a Python-compliant AST and the generation of Javascript code from this AST, the latest development version includes the last piece of the puzzle : a Python-compliant parser which generates the AST from source code.

- the Python script make_full_grammar.py generates /src/full_grammar.js, a Javascript version of the standard Python grammar (Python.gram)
- source code of Python scripts is processed by class Parser in /src/python_parser.js. This script implements a PEG parser as specified in PEP 617. It is an adaptation in Javascript of the algorithm detailed in the academic paper mentioned in this PEP, Packrat Parsers Can Support Left Recursion
- the "grammar actions" (explained in PEP 617, adapted to Javascript) generate the AST ; they use code in scripts action_helpers.js, string_parser.js and number_parser.js

All these scripts are grouped in a new bundle, /src/brython_with_parser.js. If it is inserted in the page along with brython.js, this new parser replaces the custom parser in py2js.js that has been used since the first versions of Brython.

With the new parser, the AST is guaranteed to be the same as with CPython, and the error messages are the same. This is a clear improvement to the current situation, where syntax errors can be ignored, or not reported exactly as in CPython.

Unfortunately, there is a major drawback: speed... Applying the dozens of rules in the grammar, with backtracking and unlimited lookahead, takes a very long time, around 3 to 5 times more than the current parser. This is terribly disappointing, but unless there is a way to improve performance (and I tried all sorts of ways, unsuccessfully) I will have to stick to the current parser. I publish the Python-compliant version for those who might be interested by the algorithm.

Another issue with this approach would have been that the scripts adapted from CPython code (eg action_helpers.js adapted from action_helpers.c) are a "hand-made" translation where C syntax is replaced by more or less equivalent Javascript syntax. Future changes to the C code would have required to track the changes and manually reproduce them in the JS code, which is boring and error-prone. It would have been worthwile if speed was not such an issue, though.

Of course if some of you have the time and interest to dive into the parsing algorithm and find ways to improve performance it would be great !

Kiko

unread,
May 22, 2022, 4:34:18 PM5/22/22
to bry...@googlegroups.com
2022-05-22 22:18 GMT+02:00, Pierre Quentel <pierre....@gmail.com>:
> After a Python-compliant tokenizer, the production of a Python-compliant
> AST and the generation of Javascript code from this AST, the latest
> development version includes the last piece of the puzzle : a
> Python-compliant parser which generates the AST from source code.
>
> - the Python script *make_full_grammar.py* generates */src/full_grammar.js*,
>
> a Javascript version of the standard Python grammar (Python.gram
> <https://github.com/python/cpython/blob/main/Grammar/python.gram>)
> - source code of Python scripts is processed by class Parser in
> */src/python_parser.js*. This script implements a PEG parser as specified
> in PEP 617 <https://peps.python.org/pep-0617/>. It is an adaptation in
> Javascript of the algorithm detailed in the academic paper mentioned in
> this PEP, Packrat Parsers Can Support Left Recursion
> <http://web.cs.ucla.edu/~todd/research/pepm08.pdf>
> - the "grammar actions" (explained in PEP 617, adapted to Javascript)
> generate the AST ; they use code in scripts *action_helpers.js*,
> *string_parser.js* and *number_parser.js*
>
> All these scripts are grouped in a new bundle,
> */src/brython_with_parser.js*.
> If it is inserted in the page along with *brython.js*, this new parser
> replaces the custom parser in *py2js.js* that has been used since the first
>
> versions of Brython.
>
> With the new parser, the AST is guaranteed to be the same as with CPython,
> and the error messages are the same. This is a clear improvement to the
> current situation, where syntax errors can be ignored, or not reported
> exactly as in CPython.

Congrats, Pierre!!!!
This is a nice piece of work. Thanks for all the time and effort put into this.

>
> Unfortunately, there is a major drawback: speed... Applying the dozens of
> rules in the grammar, with backtracking and unlimited lookahead, takes a
> very long time, around 3 to 5 times more than the current parser. This is
> terribly disappointing, but unless there is a way to improve performance
> (and I tried all sorts of ways, unsuccessfully) I will have to stick to the
>
> current parser. I publish the Python-compliant version for those who might
> be interested by the algorithm.
>
> Another issue with this approach would have been that the scripts adapted
> from CPython code (eg *action_helpers.js* adapted from *action_helpers.c*)
> are a "hand-made" translation where C syntax is replaced by more or less
> equivalent Javascript syntax. Future changes to the C code would have
> required to track the changes and manually reproduce them in the JS code,
> which is boring and error-prone. It would have been worthwile if speed was
> not such an issue, though.
>
> Of course if some of you have the time and interest to dive into the
> parsing algorithm and find ways to improve performance it would be great !
>

I have a question. Are the new parser creating 'lighter' js? Can the
new parser be used to generate an AOT 'production' js version of the
python code?

> --
> You received this message because you are subscribed to the Google Groups
> "brython" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to brython+u...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/brython/ad98e0d1-d177-4f19-b992-0f7363a7d983n%40googlegroups.com.
>

Pierre Quentel

unread,
May 22, 2022, 4:50:31 PM5/22/22
to brython
Le dimanche 22 mai 2022 à 22:34:18 UTC+2, kiko (on pybonacci) a écrit :


I have a question. Are the new parser creating 'lighter' js? Can the
new parser be used to generate an AOT 'production' js version of the
python code?

No, there is no change in the JS code, which is generated from the Abstract Syntax Tree by the script ast_to_js.js. The parser only deals with the transformation of Python source code into the AST.

Edward Elliott

unread,
May 22, 2022, 5:34:50 PM5/22/22
to bry...@googlegroups.com
Thanks Pierre, that's great!  Very nice work.

It would definitely be nice to have better syntax error messages.  Perhaps a temporary approach could be this:

  1. Run the old parser py2js.js to parse a script
  2. If everything parses ok, proceed with execution
  3. If an error occurs during parsing, rerun the new parser python_parser.js on script to create a more accurate error message.  Then you only incur the new parsing penalty when your script won't run anyway.

Is parsing that costly an operation?  I haven't noticed an issue running 8k lines of brython code.  
From when I invoke brython () to when first line of my script executes (after imports) is only about 500 ms.  Not sure how much of that is parsing time, but guessing most brython scripts are significantly shorter than mine.


--
You received this message because you are subscribed to the Google Groups "brython" group.
To unsubscribe from this group and stop receiving emails from it, send an email to brython+u...@googlegroups.com.

Pierre Quentel

unread,
May 23, 2022, 3:41:25 PM5/23/22
to brython
Le dimanche 22 mai 2022 à 23:34:50 UTC+2, Edward Elliott a écrit :
Thanks Pierre, that's great!  Very nice work.

It would definitely be nice to have better syntax error messages.  Perhaps a temporary approach could be this:

  1. Run the old parser py2js.js to parse a script
  2. If everything parses ok, proceed with execution
  3. If an error occurs during parsing, rerun the new parser python_parser.js on script to create a more accurate error message.  Then you only incur the new parsing penalty when your script won't run anyway.

Is parsing that costly an operation?  I haven't noticed an issue running 8k lines of brython code.  
From when I invoke brython () to when first line of my script executes (after imports) is only about 500 ms.  Not sure how much of that is parsing time, but guessing most brython scripts are significantly shorter than mine.


Thanks for the feedback Edward.

It would be possible in theory, but:
- syntax errors not detected by the current parser would remain unnoticed
- there would be 2 parsers to maintain and change at each new Python version - maintaining one is already a lot of work
- the weight of the new parser (brython_standard_parser.js) is currently 927 kB. There is room for improvement, but I don't think it could be much less than 700 kB, that is, around the same size as brython.js which includes the parser *and* the implementation of all builtins

My conclusion is that only one parser should be used.

The parsing time is obviously an order of magnitude less than execution time, but the built-in test suite is almost twice slower with the new parser. This confirms the individual tests I made on big modules such as _pydecimal.py (230 kB). Compared to other Python-in-the-browser solutions, execution speed is probably one of Brython strengths, and I have the feeling that weakening it for an almost 100% compliance with Python is a price too high to pay.
 

g40

unread,
Nov 13, 2022, 11:36:09 AM11/13/22
to brython
Hello Pierre,

This is a shockingly late reply but I have some time to look at this approach. The main problem is I cannot find the files you reference. Are they on Github and if so in which branch? 

Thanks

Jerry

Pierre Quentel

unread,
Nov 17, 2022, 3:52:10 PM11/17/22
to brython
Le dimanche 13 novembre 2022 à 17:36:09 UTC+1, g40 a écrit :
Hello Pierre,

This is a shockingly late reply but I have some time to look at this approach. The main problem is I cannot find the files you reference. Are they on Github and if so in which branch? 

Thanks

Jerry

Your message gave me the opportunity to test the PEG parser after a few months, and as I expected, it was broken... I have fixed it and you can now use it by replacing brython.js by brython_standard_parser.js, available in the Github repo.

brython_standard_parser.js is made of brython.js, plus the following scripts, also part of the repo:
- string_parser.js and number_parser.js (parse string and number literals)
- action_helpers.js (functions used to generate the Abstract Syntax Tree)
- python_parser.js (implementation of the PEG parser)
- full_grammar.js (adaptation of CPython's python.gram as a JS structure with rules and grammar actions)

The performance is still much worse than with the current "hand-made" parser, at least more than 2 times slower, so it's not going to be used as a replacement until someone finds a way to make it faster.

J M E

unread,
Nov 19, 2022, 12:50:30 PM11/19/22
to bry...@googlegroups.com
Thanks Pierre. PEGs are expensive to operate. 
--
You received this message because you are subscribed to the Google Groups "brython" group.
To unsubscribe from this group and stop receiving emails from it, send an email to brython+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/brython/4efcf63d-5f92-4c45-af70-f8d736e22876n%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages