First off: I got tired of reporting issues to the issue tracker, so I spent the last 2 days reading through the Esprima source code to understand what's going on under the hood, and I discovered a lot of things I need to understand. For example, some of the source code is really outdated for 2017.
As I understand it, most of the source code has been inherited from the very first public version, 1.0, and everything after that has just been modifications on top of it. Am I right? The tokens are one example.
There also seems to be support for legacy browsers and Node.js versions (0.8 - 0.12) that is no longer needed, not to mention that the codebase isn't optimized for V8.
After 2 days I had a good understanding of the source code, so I felt ready to fix all the bugs.
I started out with the scanner class. The class itself seems to have a performance loss of about 33% and is not optimized for V8. I also discovered bugs with octal literals, line terminators, long binary digits on some computers, escaped characters, etc.
Even the way the punctuators are handled looked like a mess to me, and that was when I noticed the curlyStack code and started to wonder what the point of it is. That relates to the first question I raised on the GitHub repo. I found an improved solution for that one, but it led me to more complex questions, and I also discovered that the scanTemplate() function would need a rewrite. This in turn took me to the tokenizer class, where I solved a similar template issue. But here I discovered that this class was a mess as well, as some issue tickets also point out.
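For context, here is a minimal sketch of the general idea behind a brace stack in a scanner: when a `}` shows up, the scanner has to know whether it closes an ordinary block or a `${` substitution inside a template literal. This is not Esprima's actual code; the class and method names are hypothetical and only illustrate the problem.

```typescript
// Hypothetical helper, not part of Esprima: remembers which kind of
// opening punctuator each pending `}` belongs to.
type Opener = "{" | "${";

class BraceContext {
  private stack: Opener[] = [];

  // Call when the scanner consumes `{` or the `${` of a template substitution.
  open(punctuator: Opener): void {
    this.stack.push(punctuator);
  }

  // Call when the scanner sees `}`. Returns true if the brace closes a
  // `${` substitution, meaning template scanning should resume.
  close(): boolean {
    return this.stack.pop() === "${";
  }
}

// Usage for the source text: { `a${ { } }b` }
const braces = new BraceContext();
braces.open("{");   // outer block
braces.open("${");  // template substitution starts
braces.open("{");   // object literal or block inside the substitution
const closesInner = braces.close();        // plain `}`
const closesSubstitution = braces.close(); // `}` that ends `${ ... }`
const closesOuter = braces.close();        // plain `}`
```

The open question in this post is whether that bookkeeping belongs in the scanner at all, or whether the parser can tell the scanner which context it is in.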
To make this short: I have trouble understanding what the problem is with fixing the reported bugs. The main problem, as I see it, is that the codebase is now too fragmented and old-fashioned, so it's hard to fix things in a performant way. I ran into this issue as well.
Therefore I have a few questions to ask:
1. What are the thoughts about this? To have its own scope tracker, or inline arrays and push and pop to them à la Acorn, or simply a solution à la Shift? See the ECMAScript spec regarding this.
Located here: It says
3. If body is a List of errors, then return body.
Under all circumstances it has to return a list of errors if we are following the spec. Esprima does not do that, am I right? There are more issues here as well, mainly performance: scanning the whole AST could be expensive and burn unnecessary CPU cycles.
As an experiment I have made a "scope tracker" that catches all early errors for my own experimental code, and I gained performance by having an inlined solution.
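A minimal sketch of what an inlined scope tracker could look like, assuming the parser calls it while it parses declarations instead of walking the finished AST afterwards. All names here are hypothetical and this is not Esprima's API. It is also heavily simplified: it only detects redeclarations within one block scope and does not model `var` hoisting, function scopes, or parameters.

```typescript
// Hypothetical types for illustration only.
type BindingKind = "var" | "lexical";

interface EarlyError {
  message: string;
  index: number; // source offset of the offending declaration
}

class ScopeTracker {
  // One map of declared names per nested block scope.
  private scopes: Map<string, BindingKind>[] = [new Map()];
  // Errors are collected rather than thrown, so the caller gets a list.
  readonly errors: EarlyError[] = [];

  enterBlock(): void {
    this.scopes.push(new Map());
  }

  exitBlock(): void {
    // Never drop the outermost scope.
    if (this.scopes.length > 1) {
      this.scopes.pop();
    }
  }

  // Called by the parser at the point where it parses a binding identifier.
  declare(name: string, kind: BindingKind, index: number): void {
    const current = this.scopes[this.scopes.length - 1];
    const existing = current.get(name);
    // Redeclaring is only allowed when both declarations are `var`.
    if (existing !== undefined && (kind === "lexical" || existing === "lexical")) {
      this.errors.push({
        message: `Identifier '${name}' has already been declared`,
        index,
      });
      return;
    }
    current.set(name, kind);
  }
}

// Usage, roughly corresponding to: let a; { let a; } var a;
const tracker = new ScopeTracker();
tracker.declare("a", "lexical", 0);
tracker.enterBlock();
tracker.declare("a", "lexical", 9); // shadowing in an inner block is fine
tracker.exitBlock();
tracker.declare("a", "var", 18);    // conflicts with the outer `let a`
```

Because the check happens at declaration time, no second pass over the tree is needed, which is where the performance gain mentioned above would come from.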
2. Then we have the grammar issues. Is it updated to ES2017? It doesn't seem like it, and there are still some binding issues I'm aware of (not reported on the issue tracker). The way the grammar is handled now is expensive and uses too many CPU cycles and too much memory. There are other options; see how TypeScript does it, as one example.
3. Precedence climbing. Comparing to Shift first: they actually have a similar solution, but an improved one, as I noticed. But what is the point of precedence climbing in the first place? In general, with recursive descent parsers, the idea is to do the LHS stuff first before entering the parse stuff, and it would be much easier to do the right derivation that way. Which leads me to the fact that precedence climbing could be avoided if done that way. I have read issue tickets regarding this matter and tried to figure it out. I tend to agree that the recursive solution used in Esprima 1.0 was no good, but there is room for improvement. It's 2017 now! See the TypeScript solution; this is one improved way to handle it.
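For readers unfamiliar with the technique under discussion, here is a minimal sketch of precedence climbing over a pre-split token list. It is a toy with four left-associative operators and numeric operands only. It is not the code used by Esprima, Shift, or TypeScript, and every name in it is hypothetical.

```typescript
// Toy precedence table: a higher number binds tighter.
const PRECEDENCE: Record<string, number> = { "+": 1, "-": 1, "*": 2, "/": 2 };

// Toy AST: a number literal or a binary node.
type Expr = number | { op: string; left: Expr; right: Expr };

function parseExpression(tokens: string[]): Expr {
  let pos = 0;

  // Parses a single operand (numbers only in this sketch).
  function parsePrimary(): Expr {
    const tok = tokens[pos++];
    const value = Number(tok);
    if (tok === undefined || Number.isNaN(value)) {
      throw new SyntaxError(`Unexpected token ${tok}`);
    }
    return value;
  }

  // Parses the left operand first, then keeps consuming operators whose
  // precedence is at least minPrec.
  function parseBinary(minPrec: number): Expr {
    let left = parsePrimary();
    for (;;) {
      const op = tokens[pos];
      const prec = op === undefined ? undefined : PRECEDENCE[op];
      if (prec === undefined || prec < minPrec) {
        return left;
      }
      pos++;
      // prec + 1 makes the operator left-associative.
      const right = parseBinary(prec + 1);
      left = { op, left, right };
    }
  }

  const result = parseBinary(1);
  if (pos < tokens.length) {
    throw new SyntaxError(`Unexpected token ${tokens[pos]}`);
  }
  return result;
}

// 1 + 2 * 3 groups as 1 + (2 * 3); 8 - 3 - 2 groups as (8 - 3) - 2.
const mixed = parseExpression(["1", "+", "2", "*", "3"]);
const chained = parseExpression(["8", "-", "3", "-", "2"]);
```

Note that the left operand is parsed before any operator is inspected, which is the "LHS first" order mentioned above; the loop replaces what would otherwise be one recursive function per precedence level.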
So my question is: what are the plans here?
For now I'm focusing on refactoring the scanner class to improve it, reduce memory and CPU usage, and solve known bugs.