Bug - me or Antlr? Grammar attached


rtm...@googlemail.com

May 31, 2021, 9:08:10 AM
to antlr-discussion

Hi all,
this one's been throwing me for a while. I can work around it but I'd rather understand it.

Grammar is at end. It's hugely minimised from an original sql-like language.

Essentially a 'select' statement takes an optional scaffold part,
followed by an optional 'with' statement,
followed by the 'select' statement itself.
These have been reduced down to single lex tokens here for simplicity.

Optionality is done with an "empty_production" rule, which I prefer over a bare '... | ;' because it is explicit. It should not matter.

So with this grammar, with this:


=== input ===
scaffold
select
;

===


It parses ok, but comment out the 'scaffold':


=== input ===
--scaffold
select
;

===


and you get:


line 2:0 mismatched input 'select' expecting {'with', 'scaffold'}
Stack overflow.



Scaffold is optional, so why the problem?

Odder yet (to me), if you add a statement with a scaffold above that statement without a scaffold (which failed previously), it then succeeds overall:


=== input ===
scaffold
select
;

--scaffold       <<< this failed before
select
;

===


It parses ok.  I don't get it.

From playing about it seems the optionality of the 'with' statement interferes with the optionality of the 'scaffold' statement, but that's just an impression.
AIUI they should not interfere because there's no actual ambiguity in the grammar.

I'm using the AntlrVSIX plugin for Visual Studio, version 8.3, which reports the Antlr parser version as 4.9.

Am I misunderstanding something?

thanks

jan


=================================
grammar LDB;


start_parse returns [LDBitems ldbis] :
        siX = ldb_items
       EOF
    ;


ldb_items :
        (
            sisX += select_statement
            SEMICOLON
        ) +
    ;


select_statement :
        sns = opt_set_name_scaffold
        wctec = opt_with_CTEs_clause
        qe = query_expression
    ;


opt_set_name_scaffold :
        SCAFFOLD
    |
        empty_production        
    ;


opt_with_CTEs_clause :
        WITH        
    |
        empty_production        
    ;


query_expression :
        SELECT        
    ;


empty_production : ;




SELECT : 'select' ;
WITH : 'with' ;
SCAFFOLD : 'scaffold' ;

SEMICOLON              : ';' ;



fragment WUnl : ( '\r' ? )  '\n' ;

SLCOMMENT : ( '--' .*? WUnl ) -> skip ;



fragment ALLWSes   : [ \t\r\n]+ ;

SKIPWS : ALLWSes -> skip ;
==================================

rtm...@googlemail.com

Jun 1, 2021, 5:46:49 AM
to antlr-discussion
Remove the use of the 'empty_production' rule so we're using implicit empty productions:


opt_set_name_scaffold :
        SCAFFOLD       
    |
    ;


opt_with_CTEs_clause :
        WITH       
    |
    ;


and now it parses successfully where before it failed. This really looks like a bug.

Having used the explicit 'empty_production' rule all over, I don't understand why this has happened now, and with just this rule.

Any thoughts?

jan

rtm...@googlemail.com

Jun 2, 2021, 3:23:02 AM
to antlr-discussion
It would be really helpful to have some kind of idea about what to do here. Hitting this bug is worrying because I've used this '... or  empty_production' style all over with simple embedded actions to build the AST, simple example:

opt_SLC_escape_clause returns [SLCEscapeClause ec] :
        ESCAPE  expr
        {
            $ec = new SLCEscaped($expr.e);
        }
    |
        empty_production
        {
            $ec = new SLCNotEscaped();
        }
;

and it's worked fine until now. I think that's because I've accidentally disabled the bug by putting something before it to parse which restores expected behaviour, see my 2nd post, but I'm not sure.

There's quite a lot of this code, and I don't want to rewrite it as

opt_SLC_escape_clause returns [SLCEscapeClause ec] :
        (ESCAPE  expr) ?
        {
            if ...
                $ec = new SLCEscaped($expr.e);
            else
                $ec = new SLCNotEscaped();
        }
;

*especially* as I don't know if this bug would reappear somehow.

I could just use the plain empty alt (remove the explicit 'empty_production', leaving a plain '|', and hope that works), but I've got a script to detect lone '|' as they are very easy to introduce by accident and an absolute waste of time to track down; I've lost days to them.

Any thoughts welcome.

I guess as I've had no-one pointing out an obvious error of mine in 2 days, I'll consider it a genuine bug and file it.

cheers

jan

eric vergnaud

Jun 2, 2021, 4:02:34 AM
to antlr-discussion

What is the rationale for not using '?' for optionality ?

rtm...@googlemail.com

Jun 2, 2021, 5:02:28 AM
to antlr-discussion
1. Doing it either  '...| ;'  or  '...?'  should work, so why not the former?

2. Given 1. above, it's neater. Antlr decides which branch, I just add code on each branch.

3. Given more than 2 branches in a rule, it seemed definitely neater and safer to have antlr handle all the branching code.

4. As well as 2. above - and most important - at the start I couldn't find for sure what the 'right' way was to test which branch was being taken by antlr. A half-hour search yesterday (as I thought I might have to rewrite all this) got me this on SO <https://stackoverflow.com/questions/54770084/how-do-you-reference-an-optional-rule-rulename-from-within-a-grammar-action-i> and also (can't find link) that doing "$rulename.text != null" seemed to work (I tested it) but I never knew for sure, and I still don't.
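
For anyone landing here with the same question, a minimal sketch of one way the '?' form could test which branch was taken. It assumes a label ('esc', a made-up name) on the optional rule reference and uses the rule attribute 'ctx', which should stay null when the optional group matched nothing. The class names are the ones from the earlier post; this has not been run against the real project, so treat it as a starting point rather than the official idiom.

```antlr
opt_SLC_escape_clause returns [SLCEscapeClause ec] :
        ( ESCAPE esc = expr )?
        {
            // $esc.ctx is the context object for the labelled 'expr';
            // it is expected to be null when the optional group was skipped.
            if ($esc.ctx != null)
                $ec = new SLCEscaped($esc.e);   // 'e' is expr's return value, as in the original rule
            else
                $ec = new SLCNotEscaped();
        }
    ;
```

The trade-off is the one described in the list above: the branching moves out of the grammar and into the action code, so each extra alternative needs another hand-written test.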

Which leads to the state of antlr documentation, which is not great. I would love to work on an antlr FAQ and antlr how-to because it really needs it, but am not in a position to now. I have a large project that needs completing.

HTH?

jan

Eric Vergnaud

Jun 2, 2021, 5:09:28 AM
to antlr-di...@googlegroups.com
Have you read the book?
If not, I suggest you do so.
I believe ‘?’ is the official way to support optionality.


rtm...@googlemail.com

Jun 2, 2021, 5:29:00 AM
to antlr-discussion
I've read the book at least twice, once ~5 years ago and once last year when I started this project. I don't recall anywhere it saying this. In fact it uses empties itself.

Also on P.85:
"
To make our fields more flexible than they were in the previous chapter, let’s allow arbitrary text, strings, and even empty fields in between commas.

examples/CSV.g4
field
: TEXT
| STRING
|
;
"


on p.128
"
field
: TEXT # text
| STRING # string
| # empty
;
"


p. 261
"
superClass
: 'extends' ID
| // empty means other alternative(s) are optional
;
"


So they're legal, acceptable, and I've still got a bug that's hit my project with at least 6 months' work in it, which is rather concerning me.

jan

Eric Vergnaud

Jun 2, 2021, 6:14:19 AM
to antlr-di...@googlegroups.com
am I incorrect in suspecting that your ‘odder’ example stops parsing after the first SEMI, as expected by your ldb_items rule which will return after encountering it ?

also may I insist that you try the following: 
select_statement: SCAFFOLD? WITH? qe = query_expression ;

finally are you using the official antlr C# runtime or the one from tunnelvision ?


rtm...@googlemail.com

Jun 2, 2021, 7:06:02 AM
to antlr-discussion
IIUC I think you're asking if I'm returning early, which would make the 'odder' example make sense (it would be skipping the 2nd, problematic, part if that was happening) but I don't think so. If I modify the parser to

query_expression :
        SELECT       
        {
            System.Console.WriteLine("hit SELECT");
        }
    ;

and run it on the 'odder' stuff output is:

hit SELECT
hit SELECT

So it's def got to it twice.

> also may I insist

Good idea, just to be sure. Ok, I set it up so it definitely fails to parse using my rules. Then replace it with your '??' suggestion and comment out the now-unused rules 'opt_set_name_scaffold', 'opt_with_CTEs_clause' and 'empty_production' for safety.
Now it works as expected.
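
For reference, a sketch of what the reduced grammar looks like with that change applied. It is reconstructed from the rules posted at the top of the thread, not copied from the file that was actually tested, and the 'returns [LDBitems ldbis]' clause is left off start_parse so the sketch does not depend on a class that was never posted.

```antlr
grammar LDB;

start_parse : ldb_items EOF ;

ldb_items : ( sisX += select_statement SEMICOLON )+ ;

// Optional parts written with '?' instead of the opt_* rules
// and the explicit empty_production rule.
select_statement : SCAFFOLD? WITH? qe = query_expression ;

query_expression : SELECT ;

SELECT    : 'select' ;
WITH      : 'with' ;
SCAFFOLD  : 'scaffold' ;
SEMICOLON : ';' ;

fragment WUnl : ( '\r' ? ) '\n' ;
SLCOMMENT : ( '--' .*? WUnl ) -> skip ;

fragment ALLWSes : [ \t\r\n]+ ;
SKIPWS : ALLWSes -> skip ;
```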


> are you using the official antlr C# runtime or the one from tunnelvision?

Umm, dunno. It's the one that comes with Ken Domino's AntlrVSIX VisualStudio plugin. I'm using VS 2019 if that helps.

cheers

jan

Eric Vergnaud

Jun 2, 2021, 7:19:48 AM
to antlr-di...@googlegroups.com
re 1 you are correct
your sample grammar works fine in java, so the issue if any can only come from the C# runtime
can you check the assembly name?

rtm...@googlemail.com

Jun 2, 2021, 7:42:36 AM
to antlr-discussion

Hi,
I don't know how to do this or what to look for but the AntlrVSIX about.. says "NET Runtime version: v4.0.30319"

When I run my stuff the dlls load in VS and I can see one relevant which is

[snipped]\erland_project\LDB\bin\Debug\netcoreapp3.1\Antlr4.Runtime.Standard.dll'.

If I go there and look at the dll properties it includes the following

File Version    4.8.0.0
product name    Antlr4.Runtime
Copyright    Antlr organisation

Do any of these help? If not, give me a quick pointer and I'll dig.
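
An alternative to reading the DLL properties by hand is to ask the loaded assembly at run time. The following is only a sketch, assuming the project references the official runtime (namespace Antlr4.Runtime); the class name RuntimeVersionCheck is made up for illustration.

```csharp
using System;
using System.Diagnostics;
using Antlr4.Runtime;

class RuntimeVersionCheck
{
    static void Main()
    {
        // Any public runtime type will do; Parser is defined in the runtime assembly.
        var asm = typeof(Parser).Assembly;

        // FileVersionInfo reads the same "File Version" field shown in the DLL properties dialog.
        var info = FileVersionInfo.GetVersionInfo(asm.Location);

        Console.WriteLine($"{asm.GetName().Name} file version {info.FileVersion}");
    }
}
```

The assembly name it prints also answers the question of which runtime is in use: Antlr4.Runtime.Standard for the official one.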

thanks

jan

Wanadoo

Jun 2, 2021, 8:09:31 AM
to antlr-di...@googlegroups.com
Ok this is definitely the official version but isn’t there a version discrepancy between the tool and the runtime?
Can you try 4.9.2 ?




rtm...@googlemail.com

Jun 2, 2021, 8:31:50 AM
to antlr-discussion

You mean runtime is 4.8 but parser is 4.9? Is this what you mean?

I don't know how to try 4.9.2 because I'm just using the VSIX plugin and don't know how to modify anything. Do you mean literally download the dll and drop it in that file location? (...\bin\Debug\netcoreapp3.1) I'll give it a try anyway.

rtm...@googlemail.com

Jun 2, 2021, 8:54:43 AM
to antlr-discussion
Hi,

Original DLL with my test code:

line 8:0 mismatched input 'select' expecting {'with', 'scaffold'}
Stack overflow.

Swap out the DLLs so Antlr4.Runtime.Standard.dll now 4.9.2.0 instead of 4.8.0.0, output is:

hit SELECT

In other words, success!

(also switched to main project and ran the unit tests and nothing broke, so even better)

So, is just swapping in the new DLL safe - anything else needed or is replacing the DLL alone sufficient?

Awaiting your reply, upon which enormous and effusive thanks, and then I will contact the VSIX maintainer Ken Domino.

cheers

jan

eric vergnaud

Jun 2, 2021, 9:09:18 AM
to antlr-discussion
Not sure how the VSIX plugin works, so best to check with ken how safe it is to just swap the dll.
Good news is the bug is already fixed.

Ken Domino

Jun 2, 2021, 9:56:12 AM
to antlr-discussion
A couple of notes here.

1) I don't recommend that you use Antlrvsix. I have complained to Microsoft to please add proper LSP support to VS2019. Who knows if it will ever be done. First, it does not use the latest protocol version. It's years behind the current version. VSCode is up to date. Second, writing an extension for LSP for VS2019 is trying to hit a moving target: the API changes with each release, e.g., 16.8, 16.9, 16.10, ... It's ridiculous. Instead, if you use VS2019, add in Mads' extension "Open in Visual Studio Code", and when you want to edit the grammar file, click on the "open in VSCode" pop up. You can add to VSCode either my Antlrvsix extension for VSCode, or Mike's Antlr4 Grammar Syntax Support extension. My extension is a full LSP server implementation; Mike's is VSCode specific. You can still use Antlrvsix, but I haven't maintained it, and haven't looked at it lately.

2) The editor extension you use has absolutely nothing to do with the building of a C# program. You can build and run the C# application without ever opening Visual Studio. But, you should know what version of Antlr you are using. You can tell whether you are using Antlr4.Runtime.Standard (the official Antlr runtime), or Antlr4.Runtime (Harwell's alternative runtime and tool) by looking at your .csproj file. I recommend, though, that you use Antlr4.Runtime.Standard since it is being maintained. In either case, you never have to download an Antlr tool .jar file.

3) I tried the grammar you posted in your first message with Antlr4.Runtime.Standard, and after deleting "returns ..." in the start_parse rule, it parses either input fine.

Ken

rtm...@googlemail.com

Jun 2, 2021, 11:14:21 AM
to antlr-discussion
Hi,

> 1) I don't recommend that you use Antlrvsix. I have complained to Microsoft to please add proper LSP support to VS2019. Who knows if it will ever be done. First, it does not use the latest protocol version. It's years behind the current version. VSCode is up to date. Second, writing an extension for LSP for VS2019 is trying to hit a moving target: the API changes with each release, e.g., 16.8, 16.9, 16.10, ... It's ridiculous...

I've heard elsewhere about VS being like this and I'm sorry for the trouble it's caused you. For me, I always edit the grammar in emacs. I just want VS to compile the stuff together then run it; LSP for C# is helpful, for Antlr grammars I've never needed it. They're very straightforward.



> 2) The editor extension you use has absolutely nothing to do with the building of a C# program. You can build and run the C# application without ever opening Visual Studio

Understood, but isn't the issue here that a runtime DLL didn't match the compile-time stuff? I am pretty sure I just installed VSIX which just pulled down everything it needed. I don't recall I got the dll manually (today being the sole exception), so I presume the dll was fetched by VSIX.


> In either case, you never have to download an Antlr tool .jar file.

Indeed, and I think I never downloaded the runtime either (please excuse me if I'm being dense) so I don't know what happened.


> 3) I tried the grammar you posted in your first message with Antlr4.Runtime.Standard, and after deleting "returns ..." in the start_parse rule, it parses either input fine.

I don't understand this at all. If you want I can do a clean OS install in a VM, a clean install of VS2019 in that, and see what gets installed and in what version, if that would help.

cheers

jan

Ken Domino

Jun 2, 2021, 2:30:05 PM
to antlr-discussion
Hi Jan, It sounds like you don't even need Antlrvsix, so just uninstall that from VS2019. But it does sound like you should update your NuGet dependencies for all projects of your C# program: in VS2019, open the solution, go to the "Solution Explorer", right-click on each project, look for "Manage NuGet Packages", then in the NuGet panel update the packages that are recommended to be updated. There's no need to reinstall VS2019, since that's not the problem. --Ken
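
In .csproj terms, the package update should end up as a reference of roughly this shape. This is a sketch that assumes the project uses the official Antlr4.Runtime.Standard package (as the DLL name earlier in the thread suggests) and pins the 4.9.2 version that was tried above; the rest of the project file is omitted.

```xml
<!-- Inside the project's .csproj. 4.9.2 is the runtime version that parsed the test input correctly earlier in this thread. -->
<ItemGroup>
  <PackageReference Include="Antlr4.Runtime.Standard" Version="4.9.2" />
</ItemGroup>
```

Keeping this version in step with the Antlr tool that generates the parser avoids the 4.9 tool / 4.8 runtime mismatch discussed above.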

rtm...@googlemail.com

Jun 3, 2021, 10:16:16 AM
to antlr-discussion
Hi Ken,
I'm not uninstalling anything, I've got a working system and that makes me happy!
I can, and some years ago have, put together a cli batch-build for antlr and it was fine but this is just easier.

I did as you said with the package updating, noted it changed the .csproj file, and when a build was done the right version of the DLL appeared in the directory. So that's hot.

Thank you

jan