ANTLR4 not parsing floating point grammar

Michael Powell

Feb 7, 2019, 4:04:18 PM

to antlr-discussion

Hello,

I'm not sure I see what's going on here or why this would not parse. The grammar properly identifies positive or negative infinity, and even NaN, but it has difficulty with actual numeric values. The "option" statement fails to parse, and the parser ends up seeing an empty statement instead.

Based on the Google Protocol Buffer grammar v2:

http://developers.google.com/protocol-buffers/docs/reference/proto2-spec#floating-point-literals

Here are a few snippets:

fragment DIG: [0-9];
fragment E: [Ee];
fragment SIGNAGE: [+-];

INFINITY: 'inf';
NOT_A_NUMBER: 'nan';

// It is worth establishing lexical comprehension for EXPONENT.
EXPONENT: E SIGNAGE? DIG+;
ONE_OR_MORE_DIG: DIG+;
ZERO_OR_MORE_DIG: DIG*;

sign: SIGN;

/*
floatLit = (
      decimals "." [ decimals ] [ exponent ]
    | decimals exponent
    | "." decimals [ exponent ]
    )
    | "inf"
    | "nan"
decimals = decimalDigit { decimalDigit }
exponent = ( "e" | "E" ) [ "+" | "-" ] decimals
// Also, technically the spec does not differentiate + from - Infinity, but we can here.
*/

infinity: INFINITY;
nan: NOT_A_NUMBER;

//floatDigitsDotDigitsExponent: ONE_OR_MORE_DIG DOT ZERO_OR_MORE_DIG EXPONENT?;
floatDigitsDotDigitsExponent: ONE_OR_MORE_DIG DOT ONE_OR_MORE_DIG? EXPONENT?;
floatDigitsExponent: ONE_OR_MORE_DIG EXPONENT;
floatDotDigitsExponent: DOT ONE_OR_MORE_DIG EXPONENT?;

floatLit : sign? (
      infinity
    | nan
    | floatDigitsDotDigitsExponent
    | floatDigitsExponent
    | floatDotDigitsExponent
    )
    ;

/*
strLit | boolLit
*/

constant
    : booleanLit
    | quotedStrLit
    | floatLit
    | intLit
    | fullIdentLit
    ;

/*
option = "option" optionName "=" constant ";"
optionName = ( ident | "(" fullIdent ")" ) { "." ident }
*/

optionDecl: OPTION optionName EQU constant EOS;

Note that FullIdent, Integer, String, and Boolean are all working. Floating point is failing: the value is reported as an "Integer", the rest of the statement is skipped, and the parser then discovers an "empty statement".

Any suggestions on how to make the floating point grammar work?

Thanks!

Best regards,

Michael Powell

Feb 7, 2019, 4:36:20 PM

to antlr-discussion

I am reviewing a couple of other ANTLR4 examples I managed to dredge up in my Duck search, and I am thinking that these "parser" rules might better serve as "lexer" rules instead. I am still curious for feedback, however.

Loring Craymer

Feb 7, 2019, 5:25:02 PM

to antlr-discussion

I do not think that grammar means what you think it means. '{' '}' encloses actions; '[' ']' encloses argument lists.

--Loring

Michael Powell

Feb 7, 2019, 5:53:56 PM

to antlr-discussion

On Thursday, February 7, 2019 at 5:25:02 PM UTC-5, Loring Craymer wrote:

I do not think that grammar means what you think it means. '{' '}' encloses actions; '[' ']' encloses argument lists.

Have you actually read the spec? From the spec itself:

|   alternation
()  grouping
[]  option (zero or one time)
{}  repetition (any number of times)

i.e. in ANTLR terms, |, (), ?, *.

No?
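
For reference, a minimal sketch of that translation, assuming the spec's `decimals` and `exponent` productions are written as ANTLR lexer fragments. The rule names are illustrative and are not taken from the grammar posted above. The spec's `[ x ]` corresponds to `x?`, `{ x }` to `x*`, and `x { x }` collapses to `x+`.

```antlr
// Spec: decimals = decimalDigit { decimalDigit }
fragment DECIMAL_DIGIT: [0-9];
fragment DECIMALS: DECIMAL_DIGIT+;          // x { x }  becomes  x+

// Spec: exponent = ( "e" | "E" ) [ "+" | "-" ] decimals
fragment EXPONENT: [eE] [+\-]? DECIMALS;    // [ x ]    becomes  x?
```

Fragments never produce tokens on their own; they only serve as building blocks for other lexer rules.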

Loring Craymer

Feb 7, 2019, 8:49:50 PM

to antlr-discussion

The EBNF for the language specification is not ANTLR EBNF syntax. Your grammar needs to use ANTLR EBNF, and you need to do a little translation from one to the other.

--Loring

Loring Craymer

Feb 7, 2019, 8:55:17 PM

to antlr-discussion

Or, more specifically, look at "decimals" and "floatLit".

--Loring

Michael Powell

Feb 7, 2019, 8:55:44 PM

to antlr-discussion

On Thursday, February 7, 2019 at 8:49:50 PM UTC-5, Loring Craymer wrote:

The EBNF for the language specification is not ANTLR EBNF syntax. Your grammar needs to use ANTLR EBNF, and you need to do a little translation from one to the other.

Yes, I understand that, along with several other relaxed or nuanced areas. I am well beyond that point.

I ended up refactoring some of the parser elements into lexer elements, and now the parser itself seems better behaved. The remaining question is what I can do at the target language level, C#, to help render my floating point values in order to test that parser. However, that is beyond the scope of ANTLR itself.

Thanks for the feedback.

Loring Craymer

Feb 7, 2019, 9:13:41 PM

to antlr-discussion

Actually, some of that "refactoring" probably wasn't. ANTLR uses capital letters to begin tokens and lower case letters to begin non-terminals, which reverses the conventional use. Both "decimals" and "floatLit" probably appear in the original spec as lever rules.

--Loring

Michael Powell

Feb 8, 2019, 10:59:20 AM

to antlr-discussion

On Thursday, February 7, 2019 at 9:13:41 PM UTC-5, Loring Craymer wrote:

Actually, some of that "refactoring" probably wasn't. ANTLR uses capital letters to begin tokens and lower case letters to begin non-terminals, which reverses the conventional use. Both "decimals" and "floatLit" probably appear in the original spec as lever rules.

Lever rules? Assuming you meant lexer rules, yes, I realized that and it seems to be much better behaved.
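
For anyone landing on this thread later, here is a minimal sketch of what moving the float rules into the lexer could look like. It is not the grammar the poster ended up with: `FLOAT_LIT` and `INT_LIT` are illustrative names, and the integer token is only a guess at what the rest of the grammar contains. The idea is that the ANTLR lexer runs before the parser and prefers the rule that matches the most input, so a single float token wins over a bare run of digits whenever a dot or exponent follows.

```antlr
// Sketch only: FLOAT_LIT and INT_LIT are illustrative names, not from the original grammar.
fragment DIG: [0-9];
fragment EXPONENT: [eE] [+\-]? DIG+;

// The whole numeric literal is one token, so "1.5e3" never reaches the parser in pieces.
FLOAT_LIT
    : DIG+ '.' DIG* EXPONENT?   // decimals "." [ decimals ] [ exponent ]
    | DIG+ EXPONENT             // decimals exponent
    | '.' DIG+ EXPONENT?        // "." decimals [ exponent ]
    ;

// Hypothetical integer token. For "1.5" the lexer picks FLOAT_LIT because it matches more input;
// for "15" only INT_LIT matches.
INT_LIT: DIG+;

INFINITY: 'inf';
NOT_A_NUMBER: 'nan';

// Sign handling is omitted here; the original post wraps this in an optional sign rule.
floatLit: INFINITY | NOT_A_NUMBER | FLOAT_LIT;
```

When two lexer rules match the same length of input, ANTLR takes the one defined first, so keyword-like tokens such as `'inf'` and `'nan'` would presumably need to appear before any general identifier rule.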
