[Neo4j] Cypher language grammar

dmi...@vrublevsky.me

Jan 6, 2016, 12:23:29 PM

to neo4j-e...@googlegroups.com, openc...@googlegroups.com

Hi!

I am the author of the Cypher IntelliJ plugin. Recently I spent some time porting the Cypher grammar from its Parboiled-based implementation (in cypher-compiler) to BNF, so that I can use it in the plugin to generate a parser.

The tricky part is that I need a lexer. Parboiled-based parsers are lexer-less, which is great when you need an AST and nothing more.

In my case I need a lexer that can tokenize any Cypher query, even an invalid one.

Fortunately, the IntelliJ toolkit can generate a lexer from BNF using JFlex.

So, I want to discuss several issues that I encountered while doing all this.

Keywords, function names & identifiers

In Cypher, any keyword or function name may be used as an identifier.

For example, it’s possible to write a query like this:

```

MATCH nodes=(return)

RETURN nodes(nodes)

```

Currently, the lexer generated by the IntelliJ toolkit can’t deal with such cases.

From the lexer’s perspective, there are 6 keywords in this query.

However, the generated lexer is pretty simple. I can probably fix this by writing my own lexer that tracks state and decides whether a given token should be an identifier or a keyword.
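To illustrate the count, here is a minimal Python sketch of a context-free word classifier. It is not the generated JFlex lexer; the word lists are tiny hypothetical subsets, and it assumes the six hits come from counting both keywords and built-in function names.

```python
import re

# Hypothetical, deliberately tiny word lists (lower-case);
# the real Cypher lists are much longer.
KEYWORDS = {"match", "return"}
FUNCTION_NAMES = {"nodes"}

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def naive_reserved_words(query):
    """Return every word a context-free lexer would tag as keyword or function name."""
    reserved = KEYWORDS | FUNCTION_NAMES
    return [w for w in WORD.findall(query) if w.lower() in reserved]


query = "MATCH nodes=(return)\nRETURN nodes(nodes)"
hits = naive_reserved_words(query)
print(len(hits), hits)
# prints: 6 ['MATCH', 'nodes', 'return', 'RETURN', 'nodes', 'nodes']
```

Only `MATCH` and `RETURN` are real keywords here and only the first `nodes` on the second line is a function call; the other three words are identifiers, which a lexer without state cannot know.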

The same applies to CypherPrettifier. If you execute the query above in http://console.neo4j.org/, you can see that it is formatted incorrectly.

Question: Are there any plans to make the language stricter, so that keywords cannot be used as identifiers (perhaps as part of the openCypher initiative)?

Function names & case insensitivity

Cypher has some built-in functions.

For example: `toInt()`

In reality, from the Cypher compiler’s perspective, function names are case-insensitive.

So all of these are valid: `toint`, `ToInT`, `toINT` and others.

While this isn’t bad, it can sometimes produce interesting effects.

See - https://github.com/FylmTM/intellij-plugin-cypher/issues/20

Again, the problem is with the lexer.

Code: ` (E:Flavour{name:'E', description:'Light, Medium-Sweet, Low Peat, with Floral, Malty Notes and Fruity, Spicy, Honey Hints.'}),`

In this code sample, `E` is tokenized by the lexer as a function name, because there is an `e()` function and function names are case-insensitive.

Question: Same as above. Are there plans to forbid using function names as identifiers?
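One possible workaround on the lexer side is a lookahead heuristic: treat a word as a function name only when it is directly followed by `(`. The Python sketch below is only an illustration of that idea, not the plugin’s code. The function-name set is a hypothetical subset, and the sketch ignores string literals and keywords.

```python
import re

# Hypothetical subset of built-in function names, stored lower-case
# because the lookup is case-insensitive.
FUNCTION_NAMES = {"e", "toint", "nodes"}

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify_words(text):
    """Return (word, kind) pairs, where kind is 'FUNCTION' or 'IDENTIFIER'."""
    result = []
    for match in WORD.finditer(text):
        # Look ahead: skip whitespace and check for an opening parenthesis.
        is_call = text[match.end():].lstrip().startswith("(")
        if is_call and match.group().lower() in FUNCTION_NAMES:
            result.append((match.group(), "FUNCTION"))
        else:
            result.append((match.group(), "IDENTIFIER"))
    return result


print(classify_words("(E:Flavour)"))
print(classify_words("ToInT(x) + E()"))
```

With this rule, `E` in `(E:Flavour)` stays an identifier because it is followed by `:`, while `ToInT(x)` and `E()` are still recognized as function calls regardless of case. It is a heuristic only; cases like `(return)` in the earlier query still need real lexer state.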

Questionable rules

I encountered several questionable rules.

1) RelationshipPatternSyntax - this rule specifies the syntax used when creating constraints.

```

RelationshipPatternSyntax ::= ("()-[" Identifier RelType "]-()")

| ("()-[" Identifier RelType "]->()")

| ("()<-[" Identifier RelType "]-()")

```

The starts and ends of the pattern are hardcoded. However, everywhere else that a pattern is described, `Dash`, `LeftArrowHead` and `RightArrowHead` are used, and these rules support additional styles of dashes and arrows.

Basically, it means that I can’t create a constraint using the additional supported dash and arrow head variants.
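The difference can be shown with two regular expressions in Python. This is a rough sketch, not the real grammar: it covers only the undirected form, `\w+:\w+` stands in for `Identifier RelType`, and the en dash (U+2013) is used as an assumed example of an additional dash style.

```python
import re

# Assumption: U+2013 (en dash) stands in for one of the extra dash styles
# accepted by the shared Dash rule.
DASH = "[-\u2013]"

# Hard-coded form, mirroring the "()-[" ... "]-()" literals in the rule above.
hardcoded = re.compile(r"\(\)-\[\w+:\w+\]-\(\)")

# Same shape, but built from the shared dash class instead of a literal "-".
with_dash_rule = re.compile(r"\(\)" + DASH + r"\[\w+:\w+\]" + DASH + r"\(\)")

ascii_form = "()-[r:KNOWS]-()"
en_dash_form = "()\u2013[r:KNOWS]\u2013()"

for name, pattern in (("hardcoded", hardcoded), ("with Dash", with_dash_rule)):
    print(name,
          bool(pattern.fullmatch(ascii_form)),
          bool(pattern.fullmatch(en_dash_form)))
```

The hardcoded pattern accepts only the ASCII form, while the version built from the dash class accepts both, which is the behavior the rest of the pattern grammar already has.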

2) Expression1 - functions that are not functions.

There are several branches that look like functions but are not really functions from the grammar’s perspective.

I am curious why it was designed that way.

They are called “Predicates” in the documentation (http://neo4j.com/docs/stable/query-predicates.html).

>>> Predicates are boolean functions

3) _PRAGMA - I can’t find any information about what this is.

There are some clues on Google, but none in the Neo4j documentation.
