Improving the javascript importer


Edward K. Ream

Feb 24, 2018, 10:01:08 AM
to leo-editor
Improving how the javascript importer assigns lines to nodes has a high priority.  Javascript is arguably the most important computer language in the world.

We both want the js importer to do as well as possible, especially with all common idioms, whatever they are ;-)

Suppose Vitalije devises a truly excellent js parser that knows (somehow!) how to split a js file into lines. Imo, we can prove the following theorem:

    js_i.gen_lines and its helpers can implement any stand-alone parse of javascript.

At present, the guts of the js importer are just js_i.starts_block, a helper of i.gen_lines in the base Importer class in importers/linescanner.py. The theorem implies that the js importer might have to override i.gen_lines, creating js_i.gen_lines and who knows what helpers.

The theorem is important because the base Importer class provides a lot of services. In particular, it handles tokenizing javascript. Tokenizing is fraught because recognizing regex expressions depends on context. A prototype js parser could ignore such things.
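To see why tokenizing is context-dependent, consider the "/" character: whether it begins a regex literal or a division operator depends on the token that precedes it. Here is a minimal sketch (not Leo code; the function name and the keyword list are hypothetical and incomplete):

```python
# Sketch: why "/" in javascript is ambiguous. Whether "/" begins a
# regex literal or a division operator depends on the previous token.

def slash_starts_regex(prev_token):
    """Return True if a '/' following prev_token begins a regex literal."""
    # At the start of an expression, '/' begins a regex literal.
    if prev_token is None:
        return True
    # After keywords that expect an expression, '/' begins a regex literal.
    if prev_token in ('return', 'typeof', 'case', 'in', 'of', 'delete'):
        return True
    # After an operand (identifier, number, ')' or ']'), '/' is division.
    if prev_token[-1].isalnum() or prev_token in (')', ']'):
        return False
    # After any other operator or punctuation, '/' begins a regex literal.
    return True

# a = b / c;       -> '/' follows the operand 'b': division.
# a = /x/.test(s)  -> '/' follows '=': regex literal.
assert slash_starts_regex('b') is False
assert slash_starts_regex('=') is True
assert slash_starts_regex('return') is True
```

A line-oriented importer that skips real tokenizing sidesteps this ambiguity entirely, which is why a prototype can ignore it.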

Summary

Here is my proposed strategy:

1. Verify that a new parser does substantially better than the present code.  I trust that Vitalije can do this.

2. Do whatever is necessary to recast the algorithm in terms of the existing Importer organization. I'll know what this entails only when I see the new parser algorithm and its code.

Edward

P. S. A strict parser never splits incoming lines into two (or more) lines. A lenient parser may split incoming lines.  A strict parser will "just work" with Leo's existing Importer base class.  A lenient parser would probably require tweaks to the perfect-import tests, but this is not a deal breaker.
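A hypothetical illustration of the distinction: a strict parser only assigns existing lines to nodes, while a lenient parser may rewrite a line such as "function a() {} function b() {}" so that each definition starts on its own line. A sketch (the function name and splitting rule are invented for illustration):

```python
# Sketch of a "lenient" line split: break a source line before every
# 'function' keyword that follows a ';' or '}'. A strict parser would
# return the incoming line unchanged.
import re

def lenient_split(line):
    """Split a line so each top-level 'function' starts its own line."""
    parts = re.split(r'(?<=[;}])\s*(?=function\b)', line.rstrip('\n'))
    return [p + '\n' for p in parts]

# Two definitions on one line become two lines:
assert lenient_split('function a() {} function b() {}\n') == [
    'function a() {}\n',
    'function b() {}\n',
]
# Ordinary lines pass through unchanged:
assert lenient_split('var x = 1;\n') == ['var x = 1;\n']
```

Because the output lines differ from the input lines, Leo's perfect-import check (which compares the round-tripped file against the original) would need to tolerate such rewrites.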

Vitalije, are you thinking of a strict or lenient parser?

EKR

Terry Brown

Feb 24, 2018, 10:44:17 AM
to leo-e...@googlegroups.com
On Sat, 24 Feb 2018 07:01:08 -0800 (PST)
"Edward K. Ream" <edre...@gmail.com> wrote:

> Suppose Vitalije devise a truly excellent js parser that knows
> (somehow!) how to split a js file into lines.

I find that particularly with js code with small functions using @clean
I often want multiple functions in one node, because they're related to
each other. So I wonder if a combination of reasonable top level
import with helpful tools to manually manage splitting into @clean is
the best of both worlds.

Even for Python, I prefer to see files of less than 100 lines in a
single node. The @auto / @edit toggle command makes this easy now...
sort of an example of the manual helper command idea.

Cheers -Terry

vitalije

Feb 24, 2018, 10:49:27 AM
to leo-editor

> Vitalije, are you thinking of a strict or lenient parser?
>
> EKR
I must say that I thought writing a full lexer/parser for the jungle of current javascript dialects was out of the question. It seemed like way too much work for a single language. That is why I abandoned this approach and proposed a combination of human intelligence and dumb-but-fast computer text operations, so that the user can very easily shape the code the way he/she likes.

When the rust language appeared on this forum, I got interested in it, especially when I learned that it is quite easy to make python extensions in rust. Rust also allows pattern matching, which is a wonderful feature for writing parsers. There is a rust library that parses modern javascript. When I saw all this, I had an impulse to try it for Leo import when I get some time to experiment.

There is also the python module ply for writing lexer/parser functions in pure python. AFAIR this module required the latest Python 3.6, but as I write this and check the previous link, it seems that they made ply compatible with both python2 and python3 (which is good news for Leo if this module is to be used).
I can try to:
  • write a lexer/parser using ply in pure python
  • write a python extension using a rust library for creating lexers/parsers
Either of these will build an AST of the imported script. With an AST at one's disposal, one can easily check all function definitions, object definitions, and property definitions for their start and end lines. If one of them contains more lines than some user preference, that piece of code can be extracted into a separate node, inserting @others if it is not already there. The types of the AST nodes can also give very good suggestions for what the headline might be.
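As an analogy in pure python: the stdlib ast module already provides exactly this kind of information for python source, so a short sketch shows the idea without any javascript parser. (The function name and the line-count threshold below are invented for illustration.)

```python
# Analogy: use python's stdlib ast module to find function definitions
# whose line span exceeds a user preference -- the same start/end-line
# information a javascript AST would provide for splitting into nodes.
import ast

SOURCE = '''\
def short():
    return 1

def long_one():
    a = 1
    b = 2
    c = 3
    d = 4
    return a + b + c + d
'''

def defs_to_extract(source, max_lines=4):
    """Return (name, first_line, last_line) for defs longer than max_lines."""
    tree = ast.parse(source)
    result = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            # Last line of the def: the largest lineno among its descendants.
            last = max(getattr(n, 'lineno', 0) for n in ast.walk(node))
            if last - node.lineno + 1 > max_lines:
                result.append((node.name, node.lineno, last))
    return result

print(defs_to_extract(SOURCE))  # -> [('long_one', 4, 9)]
```

Here long_one (6 lines) would become a separate node with a headline derived from its name, while short stays in the parent body.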

Javascript is an important language nowadays and it might be worth building a lexer/parser for it. But it is an open-ended project. Javascript syntax is permanently changing and even native javascript tools have a hard time keeping up.

On the other hand, for effective use of the AST we would still need those text-manipulating functions. It seems to me that these functions should be our first target; then, if they do not suffice, build a lexer/parser that generates an AST which can be used to automatically split the source file into nodes.

Importers are extremely important for using Leo like an ordinary text editor. Suppose some user wants to try Leo and issues the command leo somesource.js. Leo would respond by opening a new empty outline and importing the given source file. If that takes too much time (more than 500 ms), it would make a very poor impression on the user. Even if the user is forgiving and willing to wait that long, what he/she sees after waiting must at least look good enough. Otherwise, most users would regret even trying Leo.

Vitalije

Edward K. Ream

Feb 24, 2018, 12:57:18 PM
to leo-editor
On Sat, Feb 24, 2018 at 9:49 AM, vitalije <vita...@gmail.com> wrote:

> When the rust language appeared on this forum, I got interested in it, especially when I learned that it is quite easy to make python extensions in rust.

I didn't know that :-)
 
> Rust also allows pattern matching, which is a wonderful feature for writing parsers. There is a rust library that parses modern javascript. When I saw all this, I had an impulse to try it for Leo import when I get some time to experiment.

> There is also the python module ply for writing lexer/parser functions in pure python. AFAIR this module required the latest Python 3.6, but as I write this and check the previous link, it seems that they made ply compatible with both python2 and python3 (which is good news for Leo if this module is to be used).
> I can try to:
>   • write a lexer/parser using ply in pure python
>   • write a python extension using a rust library for creating lexers/parsers
> Either of these will build an AST of the imported script. With an AST at one's disposal, one can easily check all function definitions, object definitions, and property definitions for their start and end lines. If one of them contains more lines than some user preference, that piece of code can be extracted into a separate node, inserting @others if it is not already there. The types of the AST nodes can also give very good suggestions for what the headline might be.

> Javascript is an important language nowadays and it might be worth building a lexer/parser for it. But it is an open-ended project. Javascript syntax is permanently changing and even native javascript tools have a hard time keeping up.

I agree with you that Leo would benefit from making this effort.
 
> On the other hand, for effective use of the AST we would still need those text-manipulating functions. It seems to me that these functions should be our first target; then, if they do not suffice, build a lexer/parser that generates an AST which can be used to automatically split the source file into nodes.

Sounds reasonable.

Thanks for all these thoughtful comments.

Edward

vitalije

Feb 24, 2018, 1:48:13 PM
to leo-editor
Here is one of the rust libraries for generating lexers/parsers: LALRPOP

Here is a rust library for writing shared-library extensions for CPython: rust-cpython

Here is a video explaining how to write a python extension in rust.


> I didn't know that :-)

I am certain that I saw one rust library that implements a high-performance parser for modern javascript, but I can't remember the name. It was a bunch of related libraries. I vaguely remember that one of them has "joker" in its name, but I am not sure of it.

Vitalije

Edward K. Ream

Feb 25, 2018, 3:40:05 AM
to leo-editor
On Sat, Feb 24, 2018 at 12:48 PM, vitalije <vita...@gmail.com> wrote:
> Here is one of the rust libraries for generating lexers/parsers: LALRPOP
>
> Here is a rust library for writing shared-library extensions for CPython: rust-cpython
>
> Here is a video explaining how to write a python extension in rust.

Thanks for this.

This post clarified my thinking about allowing non-python code within Leo.

The conclusion is that plugins (and only plugins) can require any kind of packages they like.

Edward