Getting to First Base with ANTLR4 C# Target, Hello.g4, and no Frills


phurst

Aug 14, 2013, 2:11:54 PM
to antlr-di...@googlegroups.com

I’m trying to make a C# parser for the Hello.g4 grammar, using only the ANTLR generator (no VS integration or other tooling, other than antlr4-csharp-4.0.1-SNAPSHOT-complete.jar and Antlr4.Runtime.v4.5.dll). I can get it to create the parser and a listener in C#, but it does not produce a C# lexer class. I tried to roll my own based on what was in the Java lexer, but no dice.

 

Can anyone point me in the right direction?

 

Here’s what I tried… I started with the following Hello.g4 grammar:

 

grammar Hello;

options {  language=CSharp_v4_5; }

HELLOWORD : 'hello' ;

r  : HELLOWORD ID ;         // match keyword hello followed by an identifier

ID : [a-z]+ ;             // match lower-case identifiers

WS : [ \t\r\n]+ -> skip ;

 

Then I ran the org.antlr.v4.Tool on it and got the following files:

 

HelloLexer.java

HelloLexer.tokens

HelloParser.cs

HelloListener.cs

HelloBaseListener.cs

Hello.tokens

 

Notice there is no HelloLexer.cs file. Then I created a C# console program that calls the following method:

 

private void RunParser() {
    AntlrInputStream inputStream = new AntlrInputStream("hello world\n");
    MyLexer helloLexer = new MyLexer(inputStream);
    CommonTokenStream commonTokenStream = new CommonTokenStream(helloLexer);
    HelloParser helloParser = new HelloParser(commonTokenStream);
    MyListener myListener = new MyListener();
    helloParser.AddParseListener(myListener);
    HelloParser.RContext rContext = helloParser.r();
}

 

That’s based on some V3 examples I found. Is it about right?

 

In the absence of a generated Lexer class I hacked up the following:

 

    public class MyLexer : Lexer {
        public MyLexer(ICharStream input) : base(input) {
            Interpreter = new LexerATNSimulator(HelloParser._ATN);
        }
        public override string[] RuleNames {
            get { return HelloParser.ruleNames; }
        }
        public override string GrammarFileName {
            get { return "Hello.g4"; }
        }
    }

 

When I run the program, it crashes with the following error:

 

ERROR: System.IndexOutOfRangeException: Index was outside the bounds of the array.

   at Antlr4.Runtime.Atn.LexerATNSimulator.Match(ICharStream input, Int32 mode)

   at Antlr4.Runtime.Lexer.NextToken()

   at Antlr4.Runtime.BufferedTokenStream.Fetch(Int32 n)

   at Antlr4.Runtime.BufferedTokenStream.Sync(Int32 i)

   at Antlr4.Runtime.BufferedTokenStream.Setup()

   at Antlr4.Runtime.BufferedTokenStream.LazyInit()

   at Antlr4.Runtime.CommonTokenStream.Lt(Int32 k)

   at Antlr4.Runtime.Parser.EnterRule(ParserRuleContext localctx, Int32 state, Int32 ruleIndex)

   at HelloParser.r() in f:\Project\Grammars\Hello\Hello\HelloParser.cs:line 53

   at Hello.Program.RunParser() in f:\Project\Grammars\Hello\Hello\Program.cs:line 33

   at Hello.Program.Run() in f:\Project\Grammars\Hello\Hello\Program.cs:line 17

 

This problem is apparently due to the ATN.modeToDFA array being empty.
 

Despite scouring the web I can’t find a simple ANTLR4 C# target example that is complete and workable. Any assistance would be much appreciated!

 

Nilo Roberto C Paim

Aug 14, 2013, 3:49:44 PM
to antlr-di...@googlegroups.com

Hi all,

 

I’m using the latest released C# target under VS 2008. It works like a charm.

 

After parsing, I’m using the following code to verify if I had errors:

 

if (parser.NumberOfSyntaxErrors > 0)
    Console.WriteLine("Errors found.");
else
    Console.WriteLine("No errors found!");

 

This piece of code will test only parser errors, but I need to verify if I had lexical errors too.

 

How can I do this?

 

TIA.

 

Nilo - Brazil

phurst

Aug 14, 2013, 5:28:23 PM
to antlr-di...@googlegroups.com
Nilo,
 
Can you tell me if the ANTLR tool generated a HelloLexer.cs file when you ran it?
If so, can you post the code of the  HelloLexer.cs file here?
 
Thanks,
 
phurst

 

Nilo Roberto C Paim

Aug 14, 2013, 5:54:02 PM
to antlr-di...@googlegroups.com

Phurst,

 

Not exactly HelloLexer.cs, ‘cause the Grammar I’m testing is called Combined1. So, ANTLR tool generates a Combined1Lexer.cs file. Code follows.

 

// Generated from Combined1.g4 by ANTLR 4.0.1-SNAPSHOT

namespace Test1 {

using Antlr4.Runtime;

using Antlr4.Runtime.Atn;

using Antlr4.Runtime.Misc;

using DFA = Antlr4.Runtime.Dfa.DFA;

 

public partial class Combined1Lexer : Lexer {

                public const int

                               T__0=1, WS=2, ID=3;

                public static string[] modeNames = {

                               "DEFAULT_MODE"

                };

 

                public static readonly string[] tokenNames = {

                               "<INVALID>",

                               "';'", "' '", "ID"

                };

                public static readonly string[] ruleNames = {

                               "T__0", "WS", "ID"

                };

 

 

                               protected const int EOF = Eof;

                               protected const int HIDDEN = Hidden;

 

 

                public Combined1Lexer(ICharStream input)

                               : base(input)

                {

                               _interp = new LexerATNSimulator(this,_ATN);

                }

 

                public override string GrammarFileName { get { return "Combined1.g4"; } }

 

                public override string[] TokenNames { get { return tokenNames; } }

 

                public override string[] RuleNames { get { return ruleNames; } }

 

                public override string[] ModeNames { get { return modeNames; } }

 

                public override void Action(RuleContext _localctx, int ruleIndex, int actionIndex) {

                               switch (ruleIndex) {

                               case 1 : WS_action(_localctx, actionIndex); break;

                               }

                }

                private void WS_action(RuleContext _localctx, int actionIndex) {

                               switch (actionIndex) {

                               case 0: _channel = HIDDEN;  break;

                               }

                }

 

                public static readonly string _serializedATN =

                               "\x5\x4\x5\x14\b\x1\x4\x2\t\x2\x4\x3\t\x3\x4\x4\t\x4\x3\x2\x3\x2\x3\x3"+

                               "\x3\x3\x3\x3\x3\x3\x3\x4\x6\x4\x11\n\x4\r\x4\xE\x4\x12\x2\x2\x2\x5\x3"+

                               "\x2\x3\x1\x5\x2\x4\x2\a\x2\x5\x1\x3\x2\x3\x3\x63|\x14\x2\x3\x3\x2\x2\x2"+

                               "\x2\x5\x3\x2\x2\x2\x2\a\x3\x2\x2\x2\x3\t\x3\x2\x2\x2\x5\v\x3\x2\x2\x2"+

                               "\a\x10\x3\x2\x2\x2\t\n\a=\x2\x2\n\x4\x3\x2\x2\x2\v\f\a\"\x2\x2\f\r\x3"+

                               "\x2\x2\x2\r\xE\b\x3\x2\x2\xE\x6\x3\x2\x2\x2\xF\x11\t\x2\x2\x2\x10\xF\x3"+

                               "\x2\x2\x2\x11\x12\x3\x2\x2\x2\x12\x10\x3\x2\x2\x2\x12\x13\x3\x2\x2\x2"+

                               "\x13\b\x3\x2\x2\x2\x4\x2\x12";

                public static readonly ATN _ATN =

                               ATNSimulator.Deserialize(_serializedATN.ToCharArray());

}

} // namespace Test1

 

For completeness, here is the .g4 file that generates it:

 

grammar Combined1;

 

@parser::members

{

       protected const int EOF = Eof;

}

 

@lexer::members

{

       protected const int EOF = Eof;

       protected const int HIDDEN = Hidden;

}

 

// ==================================================

// Parser Rules

// ==================================================

 

start:

       command+ EOF ;

 

command:

       (ID)+ ';' ;

 

// ==================================================

// Lexer Rules

// ==================================================

 

WS           :      ' ' -> channel(HIDDEN)     ;

 

ID           :      [a-z]+ ;

 

 

 

 

Note that’s a very simplistic grammar. I’m just starting my tests with the C# target using VS 2008.

 

Hope that helps.

 

Regards,

Nilo


Sam Harwell

Aug 14, 2013, 6:15:42 PM
to antlr-di...@googlegroups.com

Hi,

 

You’ll need to integrate grammar generation into the build process before it will work. The C# target is designed for rock-solid reliable use in the build tools, not for manual use on a command line.

 

Since you are using .NET 4+, you can use NuGet to automatically download the tools and runtime, and even configure your project, in one step. You need to configure NuGet to search for prerelease packages, and install ANTLR 4 version 4.1.0-alpha002 (or whatever latest version it reports).

 

Also, you do not need to specify the language option in your grammar. The build tools will override the value you specify there anyway.

 

Thank you,

Sam Harwell

osca...@gmail.com

Aug 15, 2013, 3:58:11 AM
to antlr-di...@googlegroups.com
Hello Sam

Does the phrase "The C# target is designed for rock-solid reliable use in the build tools, not for manual use on a command line." mean that there is no plan to make ANTLR4 usable from the command line to generate C# in the near future? It is possible to use C# on Linux, so it would be useful to be able to use it outside of VS.

Greetings,

Oscar

phurst

Aug 15, 2013, 10:20:45 AM
to antlr-di...@googlegroups.com

I appreciate the helpful reply  Sam. “Rock Solid” --- I like the sound of that.

So I switched to attempting this in Visual Studio 2012. Piecing instructions together from various places (see below) I managed to create a grammar in VS and attempt a build. The build fails with the following error:

AC1000: Unknown build error: Could not locate a Java installation.

Looking at the AntlrClassGenerationTaskInternal.cs code it appears to be looking in the registry (not the JAVA_HOME environment variable) in the HKEY_LOCAL_MACHINE\SOFTWARE key, for a subkey JavaVendor\JavaInstallation. When I examine my registry I find a key:

HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Development Kit\1.7

I assume AntlrClassGenerationTaskInternal is looking for the wrong vendor or installation name.

Can I change the names it looks for by configuring the build?
If not, can you tell me what keys it’s looking for so I can hack the registry?

This is what I did:

Install Java (JDK 1.7) in C:\Program Files\Java\jdk1.7.0_17.

Set JAVA_HOME environment variable to the above dir.

Install ANTLR Language Support extension (a vsix file):

http://visualstudiogallery.msdn.microsoft.com/25b991db-befd-441b-b23b-bb5f8d07ee9f

Run VS2012.

Update NuGet to v 2.5 + (actually 2.6)

Create a VS Solution “HelloVS”

Install ANTLR 4 support:

In NuGet Official Package Sources, Include Prerelease, search for “ANTLR”.

Install the “ANTLR 4” package.

Add a grammar and edit it to be Hello.g4:

grammar Hello;

 

HELLOWORD : 'hello' ;

r  : HELLOWORD ID ;         // match keyword hello followed by an identifier

ID : [a-z]+ ;             // match lower-case identifiers

WS : [ \t\r\n]+ -> skip ;

Build:

1>------ Build started: Project: HelloVS, Configuration: Debug Any CPU ------

1>Build started 8/15/2013 9:32:46 AM.

1>E:\ANTLR\HelloVS\packages\Antlr4.4.1.0-alpha003\build\Antlr4.targets(132,5): error AC1000: Unknown build error: Could not locate a Java installation.

1>

1>Build FAILED.

 

 

phurst

Aug 15, 2013, 12:03:28 PM
to antlr-di...@googlegroups.com

I spelunked through the source code and figured out how the targets file sets the JavaVendor and JavaInstallation values. I discovered that it is actually looking for “JavaSoft” and “Java Runtime Environment” respectively. I did not have the Java JRE installed. Having fixed that, the build now runs.
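For anyone else chasing the same AC1000 error, the kind of registry lookup described here can be sketched roughly as follows. This is only an illustration, not the build task's actual code: JavaRegistryProbe, KeyPath, and FindJavaExe are hypothetical names, and the sketch assumes the usual JavaSoft layout, where the installation key holds a CurrentVersion value and each version subkey holds a JavaHome value. It is Windows-only.

```csharp
using System;
using System.IO;
using Microsoft.Win32;

// Hypothetical helper for illustration; not taken from the ANTLR build task.
public static class JavaRegistryProbe
{
    // Builds the key path under HKEY_LOCAL_MACHINE from a vendor and an installation name.
    public static string KeyPath(string vendor, string installation)
    {
        return @"SOFTWARE\" + vendor + @"\" + installation;
    }

    // Returns the path to java.exe recorded in the registry, or null when the
    // key or one of the expected values is missing (for example, JDK installed but no JRE).
    public static string FindJavaExe(string vendor, string installation)
    {
        using (RegistryKey root = Registry.LocalMachine.OpenSubKey(KeyPath(vendor, installation)))
        {
            if (root == null) return null;

            // Assumed layout: "CurrentVersion" names the subkey of the active version.
            string version = root.GetValue("CurrentVersion") as string;
            if (string.IsNullOrEmpty(version)) return null;

            using (RegistryKey versionKey = root.OpenSubKey(version))
            {
                if (versionKey == null) return null;

                // Assumed layout: "JavaHome" holds the installation directory.
                string home = versionKey.GetValue("JavaHome") as string;
                if (string.IsNullOrEmpty(home)) return null;

                return Path.Combine(home, "bin", "java.exe");
            }
        }
    }
}
```

Calling JavaRegistryProbe.FindJavaExe("JavaSoft", "Java Runtime Environment") returns null on a machine that has only the JDK key, which matches the failure above. Note that a 64-bit process opened this way sees only the 64-bit registry view, so a 32-bit JRE may not be found by it.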

Two files are generated (below the Hello.g4 file in Solution Explorer):

Hello.g4.lexer.cs

Hello.g4.parser.cs

Both of these files are empty and I get an error in the build (see below). So I assume there is something wrong with my grammar.

Unknown build error: Executing command: "C:\Program Files (x86)\Java\jre7\bin\java.exe" -cp E:\ANTLR\HelloVS\packages\Antlr4.4.1.0-alpha003\build\..\tools\antlr4-csharp-4.1-SNAPSHOT-complete.jar org.antlr.v4.CSharpTool -o obj\Debug\ -listener -visitor -Dlanguage=CSharp_v4_5 -package HelloVS E:\ANTLR\HelloVS\HelloVS\Hello.g4

I decided to use a proven grammar instead. I created a new VS project “JavaVS” and took the Java grammar from the Antlr4.Runtime.Test.v4.5 project in the Antlr4 source code. That project builds without errors and creates a lexer and parser file. But again both of these files are empty. Neither project generates a visitor class.

Any idea what I need to do to get a populated parser and lexer?

 

phurst

Aug 15, 2013, 5:02:03 PM
to antlr-di...@googlegroups.com
EUREKA!
 
It seems the lexer, parser, and listener are all emitted into the obj/Debug directory and get compiled. So my parser method now works and the Listener emits the following:
EnterEveryRule
EnterR
VisitTerminal hello
VisitTerminal world
ExitR
ExitEveryRule
Here's the parser method:
 

 private void RunParser() {
  AntlrInputStream inputStream = new AntlrInputStream("hello world\n");

  HelloLexer helloLexer = new HelloLexer(inputStream);


  CommonTokenStream commonTokenStream = new CommonTokenStream(helloLexer);
  HelloParser helloParser = new HelloParser(commonTokenStream);
  MyListener myListener = new MyListener();
  helloParser.AddParseListener(myListener);
  HelloParser.RContext rContext = helloParser.r();
 }

And here's the MyListener class:

    public class MyListener : HelloBaseListener {
        public override void EnterEveryRule(Antlr4.Runtime.ParserRuleContext ctx) {
            Console.WriteLine("EnterEveryRule ");
        }
        public override void ExitEveryRule(Antlr4.Runtime.ParserRuleContext ctx) {
            Console.WriteLine("ExitEveryRule");
        }
        public override void VisitErrorNode(Antlr4.Runtime.Tree.IErrorNode node) {
            Console.WriteLine("VisitErrorNode");
        }
        public override void VisitTerminal(Antlr4.Runtime.Tree.ITerminalNode node) {
            Console.WriteLine("VisitTerminal {0}", node.Symbol.Text);
        }
        public override void EnterR(HelloParser.RContext context) {
            Console.WriteLine("EnterR");
        }
        public override void ExitR(HelloParser.RContext context) {
            Console.WriteLine("ExitR");
        }
    }       

So I think I'm in business now.

Thanks for your help!

 

Sam Harwell

Aug 15, 2013, 8:41:22 PM
to antlr-di...@googlegroups.com

Hi Oscar,

 

It should be possible to update the build task to support systems running Mono. It may only require changing the piece of code that locates the Java executable on the current system (it uses the Registry now). Unfortunately I don’t have such a system right now so I haven’t been able to test it. However, I may be able to get ahold of one at work for experimentation, or run a virtual machine locally.
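As a rough idea of what a registry-free lookup could look like, here is a minimal sketch that checks JAVA_HOME first and then walks PATH. It is not the build task's code and has not been tried under Mono; JavaLocator, Find, and FindOnThisMachine are hypothetical names, and the Windows check by directory separator is a simplifying assumption.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of the ANTLR build task.
public static class JavaLocator
{
    // Pure search: the environment values and the file-existence check are passed in,
    // so the logic can be exercised without touching the real machine.
    public static string Find(string javaHome, string pathVariable,
                              Func<string, bool> exists, bool isWindows)
    {
        string exe = isWindows ? "java.exe" : "java";

        // 1. Prefer JAVA_HOME/bin when it is set and contains the executable.
        if (!string.IsNullOrEmpty(javaHome))
        {
            string candidate = Path.Combine(javaHome, "bin", exe);
            if (exists(candidate)) return candidate;
        }

        // 2. Otherwise try each directory listed in PATH, in order.
        if (!string.IsNullOrEmpty(pathVariable))
        {
            foreach (string dir in pathVariable.Split(Path.PathSeparator))
            {
                if (dir.Length == 0) continue;
                string candidate = Path.Combine(dir, exe);
                if (exists(candidate)) return candidate;
            }
        }

        return null; // nothing found
    }

    // Convenience wrapper that reads the real environment and file system.
    public static string FindOnThisMachine()
    {
        bool isWindows = Path.DirectorySeparatorChar == '\\'; // simplifying assumption
        return Find(Environment.GetEnvironmentVariable("JAVA_HOME"),
                    Environment.GetEnvironmentVariable("PATH"),
                    File.Exists, isWindows);
    }
}
```

JavaLocator.FindOnThisMachine() returns the full path to the Java executable, or null when neither JAVA_HOME nor PATH leads to one. Keeping Find free of direct environment access is just a design choice to make the search order easy to check.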

 

Thanks,

Sam

Sam Harwell

Aug 15, 2013, 8:45:21 PM
to antlr-di...@googlegroups.com

Well, there’s the command line! :)

 

Those two files are created by the code template so it’s easy for you to add members to the classes without having to use an @members{} block in the grammar file. The actual generated code files are placed in the intermediate output directory (obj/Debug and obj/Release by default).
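To illustrate the partial-class mechanism behind those two files, here is a minimal self-contained sketch. The first half is only a stand-in for the generated lexer (the real one derives from Antlr4.Runtime.Lexer and lives in obj/Debug); the token values and the IsKeyword helper are assumptions made up for the example, and the HelloVS namespace is taken from the -package option shown earlier in the thread.

```csharp
using System;

namespace HelloVS
{
    // Stand-in for the generated half of the class. The real generated file
    // derives from Antlr4.Runtime.Lexer; that is left out here so the sketch
    // compiles without the ANTLR runtime. Token values are assumed.
    public partial class HelloLexer
    {
        public const int HELLOWORD = 1, ID = 2, WS = 3;
    }

    // Hand-written half, the kind of code that could go in Hello.g4.lexer.cs
    // instead of an @members{} block in the grammar.
    public partial class HelloLexer
    {
        // Hypothetical helper: reports whether a token type is the 'hello' keyword.
        public static bool IsKeyword(int tokenType)
        {
            return tokenType == HELLOWORD;
        }
    }
}
```

The compiler merges both declarations into one class, so the hand-written member can use the generated constants directly, and regenerating the grammar does not overwrite the hand-written file.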

 

Thanks,

Sam

 

