I am currently working on improving my implementation of autocompletion for ANTLR. It is included in the open-source project Kanvas (a framework to build editors based on ANTLR).
The way I find the types of the tokens that can follow the caret is by interpreting the ATN data structure, following the transitions and remembering the states I have traversed, so that I can understand which states are incompatible with my current history of states traversed.
That part works and I can figure out which types of tokens can follow. I call this part "syntactic autocompletion" and it works for all kinds of tokens but identifiers. I wrote the algorithm once and I can apply it without changes to all possible ANTLR parsers.For identifiers I would need to understand which symbols are visible in a certain context and propose only them. This is of couse language specific.
How do I know all the events and states? I grab them from the last version of the AST that I was able to build. This is not ideal and it does not work properly all the time but it seems to me decent enough for now (if you have better ideas I would be happy to hear them).
Now the problem is when I need to give suggestions specific to the position of the caret. In a language I can define functions and when an ID can follow the caret inside the function I want to suggest the parameters:
fun foo(Int myParam) Int {// here I would suggest myParam}
// here I would NOT suggest myParam
I am not sure what is the best way to implement this: I would probably need to build incomplete ASTs.
My typical scenario is that a user type:
fun foo(Int myParam) Int { // incomplete function
// here I would expect that it suggest myParam to me
Do you have implemented a solution to similar problems? Do you have any suggestion?
Hey Federico, I didn't know you are still working on your auto completion code :-)
I am currently working on improving my implementation of autocompletion for ANTLR. It is included in the open-source project Kanvas (a framework to build editors based on ANTLR).I wish we could join forces and you would work with me on the Visual Studio Code extension for ANTLR4 (https://marketplace.visualstudio.com/items?itemName=mike-lischke.vscode-antlr4), instead writing an own editor (but you would have to switch from Kotlin to Typescript).
The extension I'm working on is based on 2 node modules I also wrote, namely:- Code Completion Core for ANTLR4: https://github.com/mike-lischke/antlr4-c3- ANLTR4 Grammar Parsing Service (graps): https://github.com/mike-lischke/antlr4-graps
The way I find the types of the tokens that can follow the caret is by interpreting the ATN data structure, following the transitions and remembering the states I have traversed, so that I can understand which states are incompatible with my current history of states traversed.When I started with my implementation I also consulted your code, but had quite some trouble with it. Especially the recursion per state approach is a serious problem for complex grammars, not to mention the slowness I saw. Speed is actually one of the biggest concerns here, also my implementation is too slow for complex grammars, which is why I implemented an ATN graph viewer in my vscode extension (http://www.soft-gems.net/files/atn.gif). That will hopefully help to better understand how to process the ATN.
A hint: don't track states you visited before and ignore them later on. This will not work with multiple occurrences of the same subrule or recursive rules.
That part works and I can figure out which types of tokens can follow. I call this part "syntactic autocompletion" and it works for all kinds of tokens but identifiers. I wrote the algorithm once and I can apply it without changes to all possible ANTLR parsers.For identifiers I would need to understand which symbols are visible in a certain context and propose only them. This is of couse language specific.There is a way to make even this language neutral, provided a few conditions are met. Read the description of antlr4-c3 (https://github.com/mike-lischke/antlr4-c3/blob/master/readme.md) to understand how to solve that (a summary is below).How do I know all the events and states? I grab them from the last version of the AST that I was able to build. This is not ideal and it does not work properly all the time but it seems to me decent enough for now (if you have better ideas I would be happy to hear them).One of the premises for useful auto completion suggestions is that the code is syntactically correct up to the caret position. Otherwise you can never be sure what the user actually intended. That means however you should be able to collect all symbols up to this point using a normal tree visitor. Symbols are usually managed in a symbol table and can be hierarchically organized, for instance you can have one for each source file + a number of static ones for e.g. functions from included libraries and runtimes. The antlr4-c3 module also contains a symbol table implementation.Now the problem is when I need to give suggestions specific to the position of the caret. In a language I can define functions and when an ID can follow the caret inside the function I want to suggest the parameters:
fun foo(Int myParam) Int {// here I would suggest myParam}
// here I would NOT suggest myParam
I am not sure what is the best way to implement this: I would probably need to build incomplete ASTs.My approach is different and works 100% of the time for everything that can be derived from the grammar. The problem is how to make domain knowledge available to the auto completion engine, without creating a lot of extra code to handle language specific parts. The solution I found is very simple: your grammar can hold that knowledge. You would not only structure your grammar after logical aspects, but in addition you include rules for all entities (function names, parameter names, class names, you name it) you wan to have auto completion for. Then when you walk the ATN and reach the caret position, you only need to go up in the rule call chain until you find one of these rules. You can then return the rule index as candidate and the caller knows exactly what type of symbol must be shown at that point. The surrounding editor code only has to configure the c3 engine by telling it which rules (rule indices actually) are relevant and should be returned when found. I have explained that a bit more in the readme.md file mentioned above.
My typical scenario is that a user type:
fun foo(Int myParam) Int { // incomplete function
// here I would expect that it suggest myParam to me
Do you have implemented a solution to similar problems? Do you have any suggestion?One problem that might look first as if it is very difficult to solve is scoping. Not all symbols are visible everywhere. Hence you need a way to query *visible* symbols at a given point. Also this can easily be done by a symbol table implementation. Other than what the name suggests it's not really a table but a tree (some symbols, like blocks, function bodies or modules can hold other symbols). When you visit your parse tree to collect symbols you *know* in which scope you are, hence you can add found symbols at the right position in the hierarchy and querying visible symbols from a given symbol (position) is then trivial. My antlr4-c3 implementation contains functions for that: https://github.com/mike-lischke/antlr4-c3/blob/master/src/SymbolTable.ts#L283. It returns all symbols of a given type (e.g. all class names) visible from this symbol's position, including those from dependent symbol tables (traversing so the entire hierarchy).In order to see how to use all that look either in the module's tests (https://github.com/mike-lischke/antlr4-c3/tree/master/test), which includes code completion for a simple expression parser as well as for a C++14 grammar, or study the antlr4-graps code which uses the c3 code.
I wish we could join forces and you would work with me on the Visual Studio Code extension for ANTLR4 (https://marketplace.visualstudio.com/items?itemName=mike-lischke.vscode-antlr4), instead writing an own editor (but you would have to switch from Kotlin to Typescript).I would love to join forces. I am now writing a solution for desktop editors but I see the need also for web editors. Do you think your project could be adapted to work both in Visual Studio Code and in a web editor?
Your project is not using the Language Server Protocol, right?
I could work on mine and your project.
Also, Kotlin compiles to JavaScript... just saying... :)
I have considered using your approach of introducing rules that could be used as hint for the auto-completion (e.g., ParamRef or VariableRef). I think it would help but on one hand I can see users misusing it and later opening tickets :D on the other hand there are rules for scoping anyway, that makes things not so easy to solve. I have functions for scoping resolution that I use later in interpreters, compilers, validators, etc. and I would like to reuse them for auto-completion.
This morning I work a little bit more on my approach. Currently I do the following:- interpret the ATN with my own code to find the possible next tokens at the point of the caret- get the parse-tree from ANTLR (java generated classes) and translate in my AST (my Kotlin classes)- I find the element of my AST corresponding to the position of the caret. At that point I ask to it what symbols are visible
For example, I have this language that permits to define annidated functions. This is the auto-completion I use when encountering an ID token:when (autocompletionSurroundingInformation.rulesStack.last()) {"function" -> {// just suggesting a name for the function about to be defined// I could check the AST to verify it is uniqueres.add("myFun")}"param" -> {// same as before, just suggesting a name for a new paramres.add("myParam")}"expression" -> {// here I could have value references or function calls// if I had use your approach of having different rules I could differentiate// them, however I am just using labels and I do not see labels in the ATN,// so they are just expressions and I suggest all variables and functions// Note that allVisibleValues and allVisibleFunctions are not trivial functions// defined on my AST classescurrentNode?.allVisibleValues(ast!!.childParentMap())?.forEach { res.add(it.name!!) }currentNode?.allVisibleFunctions(ast!!.childParentMap())?.forEach { res.add(it.name!!) }}}
I will look at your code to get inspired but I am seriously interested in discussing working on this together. We can discuss it here or by email.
This morning I work a little bit more on my approach. Currently I do the following:- interpret the ATN with my own code to find the possible next tokens at the point of the caret- get the parse-tree from ANTLR (java generated classes) and translate in my AST (my Kotlin classes)- I find the element of my AST corresponding to the position of the caret. At that point I ask to it what symbols are visibleI find it awkward to create an AST when you have a parse tree. An AST is essentially the input stream together with some artificial hierarchy (I have used that extensively myself with ANTLR3), but you get all that from the parse tree anyway (and more). In fact with a parse tree your grammar becomes more than just defining how to match your language, but it can be used to define structure/hierarchy explicitly. Really, you don't need an AST anymore in ANTLR4.
For example, I have this language that permits to define annidated functions. This is the auto-completion I use when encountering an ID token:when (autocompletionSurroundingInformation.rulesStack.last()) {"function" -> {// just suggesting a name for the function about to be defined// I could check the AST to verify it is uniqueres.add("myFun")}"param" -> {// same as before, just suggesting a name for a new paramres.add("myParam")}"expression" -> {// here I could have value references or function calls// if I had use your approach of having different rules I could differentiate// them, however I am just using labels and I do not see labels in the ATN,// so they are just expressions and I suggest all variables and functions// Note that allVisibleValues and allVisibleFunctions are not trivial functions// defined on my AST classescurrentNode?.allVisibleValues(ast!!.childParentMap())?.forEach { res.add(it.name!!) }currentNode?.allVisibleFunctions(ast!!.childParentMap())?.forEach { res.add(it.name!!) }}}This is essentially the only language specific part the code completion core cannot do implicitly. You have to resolve the suggested items (in c3 the rule indices) to symbol types. That's usually a very trivial process, but nonetheless must be done in consumer code.
I will look at your code to get inspired but I am seriously interested in discussing working on this together. We can discuss it here or by email.Best is probably to just look how things work in vscode. For a start you can also use an interactive node shell and play with the c3 + graps classes. Just mail me directly when you need help (e.g. how to develop/debug an extension in vscode).Btw, I have a patch pending for ANTLR4 which allows to export interpreter data (symbols and the serialized ATN) in a separate file.
This is mostly useful for setting up interpreters in other targets than Java (for grammar debugging). This info can also be used for code completion (but then you cannot run predicates, since you have no full parser).
I find it awkward to create an AST when you have a parse tree. An AST is essentially the input stream together with some artificial hierarchy (I have used that extensively myself with ANTLR3), but you get all that from the parse tree anyway (and more). In fact with a parse tree your grammar becomes more than just defining how to match your language, but it can be used to define structure/hierarchy explicitly. Really, you don't need an AST anymore in ANTLR4.Well, for me an AST works best for different reasons. One of the reasons is that I really do not like the classes generated by ANTLR. I prefer to write mines, with all classes having a certain ancestor and I do some transformations between the parse tree and the AST. If the language is simple they stay similar but if it is not they can differe quite a bit. Also, for complex languages I may want to do transformations on the AST to simplify interpreting or compiling.
Well, I try to never use predicated anyway