Floating point rule not matching

Michael Powell

Feb 1, 2019, 8:34:14 PM

to antlr-discussion

Hello,

I have an input string like this:

"syntax = 'proto2';option l = 0.9;"

However, the lexer/parser are failing to match correctly:

"syntax='proto2';optionl=0<missing ';'>.9;"

The error appears to be reported here (I am not yet sure how to pinpoint it through the parser elements, contexts, etc.):

"syntax='proto2';optionl=0<missing ';'>.9;"

digits: DIG DIG*; // Could this instead be DIG+ ?

exponent: E SIGN? digits;

/*
floatLit = ( decimals "." [ decimals ] [ exponent ] | decimals exponent | "."decimals [ exponent ] ) | "inf" | "nan"
decimals = decimalDigit { decimalDigit }
exponent = ( "e" | "E" ) [ "+" | "-" ] decimals
Also, technically the spec does not differentiate + from - Infinity, but we can here.
*/
floatLit
    : INF
    | NAN
    | ( '.' digits exponent? )
    | ( digits '.' digits? exponent? )
    | ( digits exponent )
    ;

With the Constant rule eventually rolling all that up:

/*
strLit | boolLit
*/
constant
    : boolLit
    | quotedStrLit
    | SIGN? floatLit
    | SIGN? intLit
    | fullIdent
    ;

And the lexer rules:

INF: 'inf';
NAN: 'nan';
SIGN: [+-];
E: [Ee];
DIG: [0-9];

Does the '.' (dot) really need a formal lexer rule?

https://developers.google.com/protocol-buffers/docs/reference/proto2-spec

Best regards,

Michael W Powell

Mike Lischke

Feb 2, 2019, 6:32:22 AM

to antlr-discussion

I've got a given string something like this:

"syntax = 'proto2';option l = 0.9;"

However, the lexer/parser are failing to match correctly:

"syntax='proto2';optionl=0<missing ';'>.9;"

The error appears to be reported here (not sure how I identify that at the moment via the parser elements, contexts, etc):

"syntax='proto2';optionl=0<missing ';'>.9;"

Whenever you get a recognition error, start your search in the lexer/token stream. Print all the tokens that were lexed and check that they match what you expected. Very often the source of an unexpected parser error is actually a wrong token.
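
To illustrate what such a token dump looks like, here is a rough sketch that uses Python's re module as a stand-in for the lexer; it is not ANTLR itself. It assumes the usual lexer behaviour of taking the longest match and, on a tie, the rule listed first. The rule table copies the five lexer rules posted in the thread; DOT and SEMI are hypothetical names for the implicit '.' and ';' literals, and the whitespace skipping is likewise an assumption, since the rest of the grammar is not shown.

```python
import re

# Lexer rules from the thread, in grammar order.
# DOT and SEMI are hypothetical names for the implicit '.' and ';' literals.
RULES = [
    ("INF", r"inf"),
    ("NAN", r"nan"),
    ("SIGN", r"[+-]"),
    ("E", r"[Ee]"),
    ("DIG", r"[0-9]"),
    ("DOT", r"\."),
    ("SEMI", r";"),
]


def dump_tokens(text):
    """Return (rule name, matched text) pairs: longest match wins, first rule on ties."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():  # assumed: whitespace is skipped
            pos += 1
            continue
        best = None
        for name, pattern in RULES:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}: {text[pos]!r}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens


for token in dump_tokens("0.9;"):
    print(token)
```

Under these assumed rules the literal 0.9 reaches the parser as three separate tokens (DIG, DOT, DIG) rather than one. Comparing a dump like this against the real token stream is the check being suggested; with ANTLR itself, the TestRig tool's -tokens option prints the actual tokens.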

digits: DIG DIG*; // Could this instead be DIG+ ?

Yes.

Does the '.' (dot) really need a formal lexer rule?

I strongly recommend giving every literal its own lexer rule. Not only does this allow ANTLR4 to generate descriptive names for the access methods and constants in the generated lexer/parser, it also avoids the occasional trouble where different tokens match the same input.

https://developers.google.com/protocol-buffers/docs/reference/proto2-spec

Is there a complete EBNF grammar for the protobuf language, instead of separate text snippets of individual elements? I'm thinking about adding an EBNF -> ANTLR4 grammar converter to my vscode extension.

Mike
--
www.soft-gems.net

Michael Powell

Feb 2, 2019, 7:56:35 PM

to antlr-discussion

Only the link I provided here, that I know of. I'll warn you, however, the grammar is somewhat loose in places. Not so much so that you cannot puzzle your way through it, though.

