string_expression_or_state_field_path
    : state_field_path_expression
    | string_expression
    ;

string_expression
    : general_path_element
    | String_literal
    | Input_parameter
    | CONCAT '(' string_expression ',' string_expression (',' string_expression)* ')'
    | SUBSTRING '(' string_expression ',' arithmetic_expression (',' arithmetic_expression)? ')'
    | TRIM '(' ((LEADING | TRAILING | BOTH)? (trim_character)? FROM)? string_expression ')'
    | LOWER '(' string_expression ')'
    | UPPER '(' string_expression ')'
    | aggregate_expression
    | case_expression
    | function_invocation
    ;

arithmetic_expression
    : arithmetic_term
    | arithmetic_expression op=( '+' | '-' ) arithmetic_term
    ;
Hi. Are you having trouble identifying the ambiguities, or knowing how to resolve them? Finding the ambiguity hotspots in a grammar is easy with the IntelliJ plug-in for ANTLR 4.
Ter
On Dec 3, 2014, at 11:15 AM, Moritz Becker <moritz....@gmail.com> wrote:
> Hi,
> we are experiencing performance problems caused by ambiguities in our grammar (see the attached parser and lexer grammars).
> We think the main problem is that we have separate expression rules for different data types (string_expression, arithmetic_expression, etc.), and each of these rules uses state_field_path_expression at the far left of some production. The scalar_expression rule bundles all of these expression rules, so once the parser enters scalar_expression with a state_field_path_expression on the input, several alternatives can match and the choice between them is ambiguous.
> We don't know the best way to resolve these ambiguities without duplicating rules in the grammar.
> Could you give us some advice, please?
>
> --
> You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
> Attachments: JPQL_lexer.g4, JPQLSelectExpression.g4
scalar_expression @leftfactor{state_field_path_expression}
    : arithmetic_expression
    | string_expression
    | enum_expression
    | datetime_expression
    | boolean_expression
    | coalesce_expression
    | nullif_expression
    | type_discriminator
    | identifier
    | Input_parameter
    | case_expression
    ;
Hi Moritz,
I was able to run your code, but I could not reproduce a clear performance problem. With or without the @leftfactor action, and without any other special “tuning”, I observe an average of 50 ms to parse the sample expression with the “reference” build of ANTLR 4.3, and 30 ms with the “optimized” build of ANTLR 4.4 (the latest release of each in Maven Central). Could you provide additional sample inputs that exercise the parts of the grammar causing problems in your application?
To use the “optimized” build, change the groupId in your pom.xml from “org.antlr” to “com.tunnelvisionlabs” (for both the antlr4-runtime and antlr4-maven-plugin artifacts), then rebuild your project. If you put the sample on GitHub, I can send a pull request showing the changes I would make to simplify the project structure. We could also use that project as a test bed for the performance issues.
Thanks,
Sam
Hi Moritz,
I sent you a pull request with some grammar/project simplifications.
Regarding the test: you are running a single test on a small input fragment. Your timing currently includes the full overhead of initializing ANTLR 4, deserializing the lexer and parser ATNs, and constructing the initially empty DFA cache. That measurement reflects ANTLR 4's behavior only if your application's sole purpose is to parse one small expression like this exactly once. After the first parse, the existing DFA cache is reused instead of being built from scratch, which brings a dramatic performance improvement.
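A benchmark can time the first parse separately from later ones so that this one-time cost is visible on its own. Below is a minimal, generic Java sketch. It is not ANTLR code. The class and method names (WarmupBenchmark, timeOnce, measure, median) are hypothetical, and the placeholder workload stands in for lexing and parsing one expression. To time the real grammar, you would replace the placeholder with code that creates the generated lexer and parser for one input and invokes the start rule. In ANTLR 4's Java target, the generated recognizers keep their deserialized ATN and DFA cache in static fields, so creating a fresh parser for each input should still benefit from the warm cache.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical harness: separates the cold first run from warmed-up runs.
class WarmupBenchmark {
    // Sink for the placeholder workload so the JIT cannot discard it.
    private static volatile int sink;

    /** Returns the wall-clock time of one run of {@code task}, in nanoseconds. */
    static long timeOnce(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /** Runs {@code task} {@code warmup} times untimed, then times {@code measured} further runs. */
    static List<Long> measure(Runnable task, int warmup, int measured) {
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        List<Long> timings = new ArrayList<>();
        for (int i = 0; i < measured; i++) {
            timings.add(timeOnce(task));
        }
        return timings;
    }

    /** Median of the given values (the upper median when the count is even). */
    static long median(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted.get(sorted.size() / 2);
    }

    public static void main(String[] args) {
        // Placeholder workload (hypothetical): replace with "create lexer and parser
        // for one input, then invoke the start rule" to time the real grammar.
        Runnable parseOnce = () -> sink = "UPPER(e.name) = :param".split("[ ()]").length;

        long cold = timeOnce(parseOnce);                 // includes any one-time initialization
        List<Long> warm = measure(parseOnce, 1000, 100); // 1000 untimed warm-up runs, 100 timed runs
        System.out.printf("first run: %d us, median warm run: %d us%n",
                cold / 1000, median(warm) / 1000);
    }
}
```

Reporting the cold run separately from a median of warm runs keeps one-time costs, such as ATN deserialization and filling the DFA cache, out of the per-parse figure. The JVM's JIT compiler also warms up over the first iterations, which is another reason to discard early runs.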