How to properly access literals in C .g4 grammar

firyice

Nov 13, 2014, 9:19:21 PM
I am using the C.g4 grammar from the antlr/grammars-v4 Github repo to parse C files, but some of the rules use string literals and I don't know how to properly access them in my visitors.  

Example Rule:
additiveExpression
    :   multiplicativeExpression
    |   additiveExpression '+' multiplicativeExpression
    |   additiveExpression '-' multiplicativeExpression
    ;

So in order to distinguish between addition and subtraction I need to check for the '+' or '-' literal.  I can do:

Token tok = (Token) ctx.getChild(1).getPayload();

if (tok.getType() == CParser.Plus) {...}

// or

if (ctx.getChild(1).getText().equals("+")) {...}
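
A side note on the text-comparison variant, since it is easy to get subtly wrong in Java: getText() returns a String, and String.equals called with a single-quoted char compiles (the char is boxed to a Character) but can never return true. The sketch below uses a plain String in place of the parse-tree call; LiteralTextCheck and its method names are made up for illustration.

```java
// Hypothetical helper; childText stands in for ctx.getChild(1).getText().
final class LiteralTextCheck {
    // Compiles, but compares a String with a boxed Character: always false.
    static boolean isPlusWithCharLiteral(String childText) {
        return childText.equals('+');
    }

    // Compares String with String, which is what the check intends.
    static boolean isPlusWithStringLiteral(String childText) {
        return "+".equals(childText);
    }
}
```

Only the double-quoted form matches the operator text, which is one more reason comparing token types tends to be the safer check.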


This doesn't seem right, am I doing something wrong? Most examples advise against using getChild()... is there a better way?

Or is this an error in the C grammar? These tokens are defined later in the same file...
Plus : '+';
Minus : '-';

Should the above literals be replaced with these? Like...
additiveExpression
    :   multiplicativeExpression
    |   additiveExpression Plus multiplicativeExpression
    |   additiveExpression Minus multiplicativeExpression
    ;

David Whitten

Nov 13, 2014, 10:05:56 PM
to antlr-di...@googlegroups.com
I seem to recall that the uppercase left-hand-sides are Lexer rules
and the lowercase ones are grammar rules.

Jim Idle

Nov 13, 2014, 11:18:56 PM
to antlr-di...@googlegroups.com
On Fri, Nov 14, 2014 at 10:19 AM, firyice <steve...@gmail.com> wrote:
Should the above literals be replaced with these?

Yes - this is a bug in the grammar. I always advise against using literals in parser rules as it is easy to confuse yourself. They are OK in tiny hacked up grammars.

Try this:

additiveExpression
    :   multiplicativeExpression
    |   additiveExpression op=(Plus|minus) multiplicativeExpression
    ;

Then the op label will be available as a field in the context:

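    @Override
    public void enterAdditiveExpression(YourParser.AdditiveExpressionContext ctx)  /* or Exit of course */ {
        // op is null when the first alternative (no operator) matched
        if (ctx.op == null) return;

        switch (ctx.op.getType()) {
          case YourParser.Plus:
            // .... etc
        }
    }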
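
To make the op label concrete without pulling in the ANTLR runtime, here is a minimal, self-contained Java sketch. TokenTypes, AddCtx, and AdditiveEval are hypothetical stand-ins, not classes ANTLR generates; AddCtx only mimics the relevant shape of a generated context, where the labeled operator is absent when the operator-free alternative matched. With a real parser you would switch on ctx.op.getType() against the constants of your generated parser class instead.

```java
// Hypothetical stand-in for the token-type constants on a generated parser class.
final class TokenTypes {
    static final int Plus = 1;
    static final int Minus = 2;
}

// Hypothetical, simplified mirror of a generated additiveExpression context.
final class AddCtx {
    final AddCtx left;     // nested additiveExpression, or null for the first alternative
    final Integer opType;  // token type of the labeled operator, or null if absent
    final int right;       // already-evaluated multiplicativeExpression

    AddCtx(AddCtx left, Integer opType, int right) {
        this.left = left;
        this.opType = opType;
        this.right = right;
    }
}

final class AdditiveEval {
    // Evaluates left-associatively, dispatching on the operator's token type.
    static int eval(AddCtx ctx) {
        if (ctx.opType == null) {
            return ctx.right; // no operator: just the multiplicativeExpression
        }
        int lhs = eval(ctx.left);
        switch (ctx.opType.intValue()) {
            case TokenTypes.Plus:
                return lhs + ctx.right;
            case TokenTypes.Minus:
                return lhs - ctx.right;
            default:
                throw new IllegalStateException("unexpected operator type " + ctx.opType);
        }
    }
}
```

For instance, 1 + 2 - 5 modeled as nested AddCtx objects evaluates to -2, because the left operand is evaluated first and the operator type selects the arithmetic.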


Jim
 

Jim Idle

Nov 13, 2014, 11:20:16 PM
to antlr-di...@googlegroups.com
On Fri, Nov 14, 2014 at 12:18 PM, Jim Idle <ji...@temporal-wave.com> wrote:
On Fri, Nov 14, 2014 at 10:19 AM, firyice <steve...@gmail.com> wrote:

additiveExpression
    :   multiplicativeExpression
    |   additiveExpression op=(Plus|Minus) multiplicativeExpression
    ;

Typo correction: minus -> Minus
 

firyice

Nov 15, 2014, 3:35:24 PM
to antlr-di...@googlegroups.com
Jim, 

Thanks for the help!  

If that's how it's supposed to be done, I'm a bit surprised that the C.g4 grammar posted under the ANTLR GitHub is so poor.  I looked through some of the other grammars (java, java8 in particular) and they have string literals too.

ANTLR seems to be a pretty mature project, are there no well-written C grammars out there?  I'm trying to put together something of a minimal C interpreter, and have had awful luck finding good examples.

Terence Parr

Nov 15, 2014, 3:36:23 PM
to antlr-di...@googlegroups.com

/** C 2011 grammar built from the C11 Spec */
grammar C;

:)

Ter

firyice

Nov 15, 2014, 3:40:25 PM
to antlr-di...@googlegroups.com
Terence,

Would you recommend going through that C.g4 grammar and replacing string literals with named tokens as Jim suggests? Or is the grammar correct, and should be left as is, and accessed with something like...

Token tok = (Token) ctx.getChild(1).getPayload();

if (tok.getType() == CParser.Plus) {...}

// or

if (ctx.getChild(1).getText().equals("+")) {...}


Terence Parr

Nov 15, 2014, 3:45:07 PM
to antlr-di...@googlegroups.com
Nope I do not recommend separating in general. I specifically combined lexical and parsing rules in the first ANTLR in response to having to have separate ones for Lex and yacc in 1989 :)

Ter

firyice

Nov 15, 2014, 3:55:02 PM
to antlr-di...@googlegroups.com
Terence,

Thank you so much for your help!
 
I'm very new to this and I think I'm still missing something. I don't fully understand what you mean by not recommending separating. Do you mean keeping C.g4 as is, or changing it?  If I go by Jim's suggestion, it seems to make the visitor/listener code much simpler.

Also, since you're the writer of ANTLR, and the language guru in general, any specific examples you could point to that would help in writing a simple C interpreter?

Terence Parr

Nov 15, 2014, 4:29:20 PM
to antlr-di...@googlegroups.com
I'd keep C.g4 as it is, but it depends on what I'm doing. If you need to refer to literals like 'if' for some reason, you need a label like

IF : 'if';

but I still keep it in the same file.

re: interp: see

https://www.youtube.com/watch?v=OjaAToVkoTw


Ter

firyice

Nov 15, 2014, 4:56:42 PM
Terence,

Thank you so much for the link to your talk.  Watching it now!

As far as my original question, I guess I’m just trying to figure out the best practice…

I don't need to refer to literals like '+' and '-' in...
additiveExpression
    :   multiplicativeExpression
    |   additiveExpression '+' multiplicativeExpression
    |   additiveExpression '-' multiplicativeExpression
    ;

but I need to know which literal was there so I can add or subtract accordingly. If I keep the grammar the same, it seems like this is somewhat messy, unless I'm misunderstanding how to use these methods:
Token tok = (Token) ctx.getChild(1).getPayload();
if (tok.getType() == CParser.Plus) {...}
// or
if (ctx.getChild(1).getText().equals("+")) {...}

Further down in the same grammar I see:
Plus : '+';
Minus : '-';

If I change the current grammar to:
additiveExpression
    :   multiplicativeExpression
    |   additiveExpression Plus multiplicativeExpression
    |   additiveExpression Minus multiplicativeExpression
    ;

It seems like I can just do:
if (ctx.Plus() != null) {...}

Or I can change the grammar to:
additiveExpression
    :   multiplicativeExpression
    |   additiveExpression op=(Plus|Minus) multiplicativeExpression
    ;

And in my visitor:
switch (ctx.op.getType()) {
    case CParser.Plus:
    // .... etc
}

Terence Parr

Nov 15, 2014, 4:57:50 PM
to antlr-di...@googlegroups.com
Yup

> Plus : '+';
> Minus : '-';

is good, and you can leave it in the same single-file grammar.
T

firyice

Nov 15, 2014, 5:06:28 PM
to antlr-di...@googlegroups.com
So with 

>> Plus : '+'; 
>> Minus : '-';

Which of the 2 below is better practice?  I guess it's just preference?
additiveExpression Plus multiplicativeExpression
additiveExpression Minus multiplicativeExpression

additiveExpression op=(Plus|Minus) multiplicativeExpression

Thanks again!

Terence Parr

Nov 15, 2014, 5:07:39 PM
to antlr-di...@googlegroups.com
that’s a style question. with lots of operators, i’d likely do (A|B|C).
T
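
As a rough illustration of why a single labeled set can scale better with many operators, here is a self-contained sketch of the dispatch a visitor might do for a rule shaped like op=(Star|Div|Mod). MulOps, its constants, and that rule shape are hypothetical stand-ins; in real code the constants would come from the generated parser class and the type from ctx.op.getType().

```java
// Hypothetical operator dispatch; the constants stand in for generated token types.
final class MulOps {
    static final int Star = 1;
    static final int Div = 2;
    static final int Mod = 3;

    // One switch handles every operator in the labeled set.
    static int apply(int opType, int lhs, int rhs) {
        switch (opType) {
            case Star:
                return lhs * rhs;
            case Div:
                return lhs / rhs; // integer division, as for C ints
            case Mod:
                return lhs % rhs;
            default:
                throw new IllegalArgumentException("unexpected operator type " + opType);
        }
    }
}
```

With one alternative per operator, the same logic would instead need a separate null check on each token accessor, so the choice is mostly about how much branching you want in the visitor.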

firyice

Nov 15, 2014, 5:09:58 PM
to antlr-di...@googlegroups.com
Alright, I think I have this sorted out.  Thanks again, enjoying your talk!

Have a great evening!

Jim Idle

Nov 15, 2014, 9:40:00 PM
to antlr-di...@googlegroups.com
It's not necessary to replace 'literals', but when you want them as tokens in your parser listener code it can make some things easier. The grammar looks reasonable to me in terms of parsing.

Jim
