Hi!
I am the author of the Cypher IntelliJ plugin. Recently I spent some time porting the Cypher language from its Parboiled-based implementation (from cypher-compiler) to BNF, so that I can then use it in the plugin to generate a parser.
The tricky part is that I need a lexer. Parboiled-based parsers are lexer-less, which is great when you need an AST and nothing more.
In my case I need a lexer that is capable of tokenizing a Cypher query, even an invalid one.
Fortunately, the IntelliJ toolkit lets you generate a lexer with JFlex from the BNF.
So, I want to discuss several issues that I ran into while doing all this.
Keywords, function names & identifiers
In Cypher, any keyword or function name can be used as an identifier.
For example, it is possible to write a query like this:
```
MATCH nodes=(return)
RETURN nodes(nodes)
```
Currently, the lexer generated by the IntelliJ toolkit can't deal with such cases.
From the lexer's perspective there are 6 keywords in this query.
The generated lexer is pretty simple, though. I can probably fix this by writing my own lexer that keeps track of state and decides whether a given token should be an identifier or a keyword.
The same applies to CypherPrettifier. If you execute the query above in
http://console.neo4j.org/, you can see that it is formatted incorrectly.
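To make the "lexer that keeps track of state" idea a bit more concrete, here is a minimal sketch in Python (just for brevity; the real thing would be a JFlex/Java lexer). Everything in it is an assumption made for illustration: the tiny `KEYWORDS` set, the `classify` helper, and the rule that a word directly after `(` or `[`, or directly before `=` or `:`, sits in an identifier position. It is a heuristic and not how the Cypher compiler decides.

```python
import re

# Hypothetical, deliberately tiny keyword set; real Cypher has many more.
KEYWORDS = {"MATCH", "RETURN", "WHERE", "CREATE"}

# A word, or any single non-space character. String literals and numbers
# are not handled; this is only meant to show the context check.
TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|\S)")


def classify(query):
    """Return (text, kind) pairs; kind is KEYWORD, FUNCTION, IDENTIFIER or SYMBOL."""
    tokens = TOKEN_RE.findall(query)
    result = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else None
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        is_word = tok[0].isalpha() or tok[0] == "_"
        # Assumed "identifier positions": right after '(' or '[' (unless a
        # call follows), or right before '=' or ':'.
        in_identifier_slot = (prev in ("(", "[") and nxt != "(") or nxt in ("=", ":")
        if not is_word:
            kind = "SYMBOL"
        elif in_identifier_slot:
            kind = "IDENTIFIER"
        elif tok.upper() in KEYWORDS:
            kind = "KEYWORD"
        elif nxt == "(":
            kind = "FUNCTION"
        else:
            kind = "IDENTIFIER"
        result.append((tok, kind))
    return result


if __name__ == "__main__":
    for text, kind in classify("MATCH nodes=(return)\nRETURN nodes(nodes)"):
        if kind != "SYMBOL":
            print(text, kind)
```

With this context check, only `MATCH` and `RETURN` in the query above end up as keywords, and the second-to-last `nodes` is the only function name. A heuristic like this breaks down quickly on larger queries, which is why a hand-written lexer with proper state is probably needed.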
Question: Are there any plans to make the language stricter, so that keywords can't be used as identifiers (perhaps as part of the openCypher initiative)?
Function names & case insensitivity
Cypher has some built-in functions.
For example: `toInt()`
In reality, from the Cypher compiler's perspective these function names are case-insensitive.
So all of these are valid: `toint`, `ToInT`, `toINT` and so on.
While this isn't bad in itself, it can sometimes produce interesting effects.
Again, the problem is with the lexer.
Code: ` (E:Flavour{name:'E', description:'Light, Medium-Sweet, Low Peat, with Floral, Malty Notes and Fruity, Spicy, Honey Hints.'}),`
In this code sample `E` is parsed by the lexer as a function name, because there is an `e()` function and function names are case-insensitive.
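One way a hand-written lexer could avoid this particular clash is to treat a word as a function name only when it is followed by `(`. A minimal sketch, again in Python for brevity; `BUILTIN_FUNCTIONS`, `is_function_call` and `classify_word` are hypothetical names, the function list is a made-up subset, and allowing whitespace before `(` is my assumption:

```python
# Hypothetical subset of built-in function names, stored lower-cased so
# that lookups are case-insensitive.
BUILTIN_FUNCTIONS = {"toint", "e", "nodes"}


def is_function_call(text, word_end):
    """True if the word ending at index word_end is followed by '('.

    Whitespace between the name and '(' is tolerated (an assumption).
    """
    return text[word_end:].lstrip().startswith("(")


def classify_word(text, start, end):
    """Classify text[start:end] as FUNCTION or IDENTIFIER using one-token lookahead."""
    word = text[start:end]
    if word.lower() in BUILTIN_FUNCTIONS and is_function_call(text, end):
        return "FUNCTION"
    return "IDENTIFIER"


if __name__ == "__main__":
    node = "(E:Flavour{name:'E'})"
    print(classify_word(node, 1, 2))            # the E right after '('

    call = "RETURN ToInT('42'), E()"
    start = call.index("ToInT")
    print(classify_word(call, start, start + 5))
```

Here the `E` in `(E:Flavour{...})` is followed by `:`, so it stays an identifier, while `ToInT(` and `E(` are still recognised as function names regardless of case.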
Question: Same as above. Are there plans to forbid using function names as identifiers?
Questionable rules
I encountered several questionable rules.
1) RelationshipPatternSyntax - this one specifies the syntax for creating constraints.
```
RelationshipPatternSyntax ::= ("()-[" Identifier RelType "]-()")
| ("()-[" Identifier RelType "]->()")
| ("()<-[" Identifier RelType "]-()")
```
The pattern starts and endings are hardcoded. However, everywhere else a pattern is described, `Dash`, `LeftArrowHead` and `RightArrowHead` are used, and those rules support additional styles of dashes and arrows.
Basically, this means I can't create a constraint using the additional supported dash and arrow head variants.
2) Expression1 - functions that are not functions.
There are several branches that look like a function but are not really functions from the grammar's perspective.
I am curious why it was designed that way.
>>> Predicates are boolean functions
3) _PRAGMA - I can't actually find any information about what this is.
There are some clues on Google, but nothing in the Neo4j documentation.