Floating point rule not matching


Michael Powell

Feb 1, 2019, 8:34:14 PM
to antlr-discussion
Hello,

I've got an input string something like this:

"syntax = 'proto2';option l = 0.9;"

However, the lexer/parser are failing to match correctly:

"syntax='proto2';optionl=0<missing ';'>.9;"

The error appears to be reported here (I'm not sure yet how to pin that down via the parser elements, contexts, etc.):

"syntax='proto2';optionl=0<missing ';'>.9;"

digits: DIG DIG*; // Could this instead be DIG+ ?
exponent: E SIGN? digits;

/*
floatLit = ( decimals "." [ decimals ] [ exponent ] | decimals exponent | "."decimals [ exponent ] ) | "inf" | "nan"
decimals  = decimalDigit { decimalDigit }
exponent  = ( "e" | "E" ) [ "+" | "-" ] decimals 
 Also, technically the spec does not differentiate + from - Infinity, but we can here.
 */
floatLit
  : INF
  | NAN
  | '.' digits exponent?
  | digits '.' digits? exponent?
  | digits exponent
  ;
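As a cross-check on which strings the spec actually accepts, here is a minimal Python sketch (not from the thread) that writes the spec's floatLit EBNF from the comment above as a regular expression. The names DECIMALS, EXPONENT, FLOAT_LIT and is_float_lit are hypothetical.

```python
import re

# Building blocks taken from the spec's EBNF quoted above.
DECIMALS = r"[0-9]+"                  # decimals = decimalDigit { decimalDigit }
EXPONENT = rf"[eE][+-]?{DECIMALS}"    # exponent = ("e"|"E") ["+"|"-"] decimals

FLOAT_LIT = re.compile(
    rf"{DECIMALS}\.(?:{DECIMALS})?(?:{EXPONENT})?"  # decimals "." [decimals] [exponent]
    rf"|{DECIMALS}{EXPONENT}"                        # decimals exponent
    rf"|\.{DECIMALS}(?:{EXPONENT})?"                 # "." decimals [exponent]
    r"|inf|nan"                                      # "inf" | "nan"
)

def is_float_lit(text: str) -> bool:
    """True if the whole string is a floatLit according to the spec's EBNF."""
    return FLOAT_LIT.fullmatch(text) is not None

# "0.9", ".9", "1.", "1e5", "2.5E-3", "inf" -> True; "0" and "1.2.3" -> False
for s in ["0.9", ".9", "1.", "1e5", "2.5E-3", "inf", "0", "1.2.3"]:
    print(f"{s!r:10} {is_float_lit(s)}")
```

This works on characters, whereas the parser rule above works on DIG, E and '.' tokens, so the two only agree if the lexer really hands the parser those tokens for input like 0.9.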

With the constant rule eventually rolling all of that up:

/*
constant = fullIdent | ( [ "-" | "+" ] intLit ) | ( [ "-" | "+" ] floatLit ) |
                strLit | boolLit 
*/
constant
  : boolLit
  | quotedStrLit
  | SIGN? floatLit
  | SIGN? intLit
  | fullIdent
  ;

And the lexer rules:

INF: 'inf';
NAN: 'nan';

SIGN: [+-];
E: [Ee];
DIG: [0-9];

Does the '.' (dot) really need a formal lexer rule?


Best regards,

Michael W Powell

Mike Lischke

Feb 2, 2019, 6:32:22 AM
to antlr-discussion

> The error appears to be reported here (not sure how I identify that at the moment via the parser elements, contexts, etc):
>
> "syntax='proto2';optionl=0<missing ';'>.9;"

Whenever you get a recognition error, start your search in the lexer's token stream. Print all the tokens that were lexed to see whether they match what you expected. Very often the source of unexpected parser errors is actually wrong tokens.
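To show what such a token dump looks like, here is a rough Python model of the lexer from the question (an illustration, not ANTLR itself). INF, NAN, SIGN, E and DIG are the posted rules; DOT, SEMI, EQ, STRING, IDENT and WS are assumptions added so the sample line can be lexed at all. Like ANTLR, it takes the longest match and breaks ties in favour of the rule listed first.

```python
import re

# Posted rules first; the remaining rules are hypothetical stand-ins.
RULES = [(name, re.compile(pattern)) for name, pattern in [
    ("INF", r"inf"),
    ("NAN", r"nan"),
    ("SIGN", r"[+-]"),
    ("E", r"[Ee]"),
    ("DIG", r"[0-9]"),
    ("DOT", r"\."),
    ("SEMI", r";"),
    ("EQ", r"="),
    ("STRING", r"'[^']*'"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("WS", r"\s+"),
]]

def tokenize(text):
    """Return (token_name, lexeme) pairs: longest match wins, ties go to the earlier rule."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, rx in RULES:
            m = rx.match(text, pos)
            if m and (best is None or m.end() - pos > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}: {text[pos:]!r}")
        if best[0] != "WS":  # whitespace is dropped, like a '-> skip' rule
            tokens.append(best)
        pos += len(best[1])
    return tokens

# Dump the token stream, one token per line, before looking at the parser.
for name, lexeme in tokenize("syntax = 'proto2';option l = 0.9;"):
    print(f"{name:<7}{lexeme!r}")
```

In this model 0.9 arrives as DIG DOT DIG, which is what floatLit expects. If the real dump shows something else at that spot, for example some other lexer rule (an integer-literal rule, if the grammar has one) claiming the 0 before DIG can, then no floatLit alternative can match and the parser would settle for intLit; that is one possible explanation of the `<missing ';'>` error, not something the thread confirms. With the ANTLR tools, TestRig (`grun <Grammar> <startRule> -tokens`) prints the real token stream.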


> digits: DIG DIG*; // Could this instead be DIG+ ?

Yes. DIG+ matches exactly the same input as DIG DIG*: one or more DIG tokens.



> Does the '.' (dot) really need a formal lexer rule?

I strongly recommend giving every literal its own lexer rule. Not only does this let ANTLR4 generate meaningful names for the accessor methods and token constants in the generated lexer/parser (implicit literals such as '.' otherwise get generated names like T__0), it also avoids occasional trouble where different tokens match the same input.
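To illustrate that advice, here is a rough Python model (an assumption, not ANTLR itself) of a lexer in which the float literal, the integer literal and the dot each get their own named token. FLOAT_LIT follows the spec's floatLit EBNF; INT_LIT, DOT, IDENT and the other names are hypothetical. Because the longest match wins, 0.9 becomes a single FLOAT_LIT instead of an integer followed by .9, while a dot inside a dotted name still comes through as DOT.

```python
import re

DECIMALS = r"[0-9]+"
EXPONENT = rf"[eE][+-]?{DECIMALS}"

# Hypothetical token set; order only matters for equal-length matches.
RULES = [(name, re.compile(pattern)) for name, pattern in [
    ("INF", r"inf"),
    ("NAN", r"nan"),
    ("FLOAT_LIT", rf"{DECIMALS}\.(?:{DECIMALS})?(?:{EXPONENT})?"
                  rf"|{DECIMALS}{EXPONENT}|\.{DECIMALS}(?:{EXPONENT})?"),
    ("INT_LIT", DECIMALS),
    ("DOT", r"\."),
    ("SEMI", r";"),
    ("EQ", r"="),
    ("SIGN", r"[+-]"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("WS", r"\s+"),
]]

def tokenize(text):
    """Longest match wins; ties go to the rule listed first (ANTLR's policy)."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, rx in RULES:
            m = rx.match(text, pos)
            if m and (best is None or m.end() - pos > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best[0] != "WS":  # skip whitespace
            tokens.append(best)
        pos += len(best[1])
    return tokens

print([n for n, _ in tokenize("option l = 0.9;")])   # IDENT IDENT EQ FLOAT_LIT SEMI
print([n for n, _ in tokenize("option n = 10;")])    # IDENT IDENT EQ INT_LIT SEMI
print([n for n, _ in tokenize("option x = a.b;")])   # ... IDENT DOT IDENT SEMI
```

In a .g4 grammar the same idea would be written as lexer rules, with the decimals and exponent pieces as `fragment` rules, and the parser's constant rule would then refer to the FLOAT_LIT and INT_LIT tokens directly.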


Is there a complete EBNF grammar for the protobuf language, instead of separate text snippets of individual elements? I'm thinking about adding an EBNF -> ANTLR4 grammar converter to my vscode extension.


Michael Powell

Feb 2, 2019, 7:56:35 PM
to antlr-discussion
Only the link I provided here, as far as I know. I'll warn you, however, that the grammar is somewhat loose in places, though not so loose that you cannot puzzle your way through it.
