Hello everyone,
We are excited to announce that we are developing our own programming language named ONE. Its parser is built with ANTLR and its syntax is similar to C#. In the near future, we plan to make the language open source, giving the community the opportunity to contribute and collaborate.
ONE introduces some deviations from C#, such as Python-like indentation and additional syntax constructs for state machines and truth tables.
The purpose of ONE in a nutshell:
ONE is designed to be the programming language for a browser-based flow editor, similar to Node-RED, in which data flows are built from hardware devices and modules represented by graphical elements. With ONE, you can also write data flows in a text-based programming language, complementing the graphical programming concept. ONE includes the complete C# language specification and extends it with syntax constructs for state machines, truth tables, and other elements. The engine of the data flow editor is written in C#, so ONE code is transpiled to C# and then executed by the engine.
Current state of the project:
We picked an ANTLR C# grammar from the Roslyn repository: https://github.com/dotnet/roslyn/blob/main/src/Compilers/CSharp/Portable/Generated/CSharp.Generated.g4. We are aware that this grammar is not designed for immediate use, but its major advantage is that it fully represents the complete Roslyn syntax structure. So we adapted it to generate a working ANTLR lexer and parser.
As the basis for our lexer, we chose: https://github.com/antlr/grammars-v4/blob/master/csharp/CSharpLexer.g4. The logic for Python-like indentation was mainly taken from: https://github.com/antlr/grammars-v4/tree/master/python/python3.
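For readers unfamiliar with the indentation approach, here is a minimal sketch of the general idea in Python: a stack of indentation widths decides when to emit INDENT and DEDENT tokens. It is not our lexer code. The function name, the token names, and the sample input are made up for illustration, and the sketch assumes indentation with spaces only.

```python
def indent_tokens(source):
    """Turn leading spaces into INDENT/DEDENT tokens using a stack of widths."""
    tokens = []
    stack = [0]  # indentation widths of the blocks that are currently open
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines do not change the block structure
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # deeper than the current block: open a new one
            stack.append(width)
            tokens.append(("INDENT", ""))
        while width < stack[-1]:
            # shallower: close blocks until the widths match again
            stack.pop()
            tokens.append(("DEDENT", ""))
        if width != stack[-1]:
            raise SyntaxError(f"inconsistent dedent: {line!r}")
        tokens.append(("LINE", stripped))
    # close every block that is still open at the end of the input
    while len(stack) > 1:
        stack.pop()
        tokens.append(("DEDENT", ""))
    return tokens


# Hypothetical sample input, not real ONE code.
sample = "outer\n    inner1\n    inner2\nafter\n"
for kind, text in indent_tokens(sample):
    print(kind, text)
```

In a real lexer the INDENT and DEDENT tokens are injected into the token stream so that the parser can treat them like opening and closing braces.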
Seeking Solutions for ANTLR Performance Issues:
The ANTLR parser works well and the grammar covers all C# syntax elements. However, with large files, the performance of the ANTLR parser is not particularly good. We believe improving performance will require restructuring the grammar.
We have read Gabriele Tomassetti's great and very helpful article https://tomassetti.me/improving-the-performance-of-an-antlr-parser/ and tried implementing some of its optimizations. However, after profiling the ANTLR parser, it seems the most time-consuming rules are still the left-recursive ones such as type and expression. Rewriting the expression rule into a cascade of ever more precise expressions had the opposite effect: performance got even worse.
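To make the two rule shapes concrete, here is a small hand-written sketch in Python that parses the same expression once with a cascade (one rule per precedence level) and once with a single rule containing a precedence loop, and counts rule invocations. It is neither ANTLR output nor our grammar. The operator table and all names are made up, and it rests on our understanding that ANTLR 4 rewrites a directly left-recursive rule into something like the second shape. It may be one reason why the cascade did not help, but we have not verified this against the generated parser.

```python
import re

# Binary operators grouped by precedence, lowest first (illustrative only).
LEVELS = [["||"], ["&&"], ["==", "!="], ["<", ">"], ["+", "-"], ["*", "/"]]
PRECEDENCE = {op: i for i, ops in enumerate(LEVELS) for op in ops}


def tokenize(text):
    return re.findall(r"\d+|\|\||&&|==|!=|[<>+\-*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.calls = 0  # number of rule invocations

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def primary(self, expr):
        self.calls += 1
        tok = self.next()
        if tok == "(":
            node = expr(0)  # re-enter the expression rule at the lowest level
            self.next()     # consume ")"
            return node
        return int(tok)

    # Shape 1: the cascade, one rule invocation per precedence level.
    def cascade(self, level=0):
        if level == len(LEVELS):
            return self.primary(self.cascade)
        self.calls += 1
        node = self.cascade(level + 1)
        while self.peek() in LEVELS[level]:
            op = self.next()
            node = (op, node, self.cascade(level + 1))
        return node

    # Shape 2: a single rule with a precedence loop (precedence climbing).
    def climb(self, min_prec=0):
        self.calls += 1
        node = self.primary(self.climb)
        while self.peek() in PRECEDENCE and PRECEDENCE[self.peek()] >= min_prec:
            op = self.next()
            node = (op, node, self.climb(PRECEDENCE[op] + 1))
        return node


for shape in ("cascade", "climb"):
    parser = Parser(tokenize("1 + 2 * (3 - 4) == 5"))
    getattr(parser, shape)()
    print(shape, parser.calls)
```

In this sketch every operand has to descend through all lower levels of the cascade before it reaches the primary rule, so the invocation count grows with the number of precedence levels. With the six levels above, a single literal already costs seven invocations in the cascade and two in the loop version.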
Does anyone have ideas or experience in making ANTLR grammars more performant, or in analysing them in a practical way? Any insights or suggestions would be greatly appreciated.
Here are the files needed to generate the ANTLR lexer and parser, as well as the base classes. Also included are a simple code example and a more complex one (TestPattern.on) where the poor performance is noticeable.
Thank you