Sorry for the beginner's question:
I am trying to parse a file that looks like the following (only the first few lines, for brevity):
0010/** DIALOG SOURCE 22D
0020/*[ DEFINE DIALOG INFO
0030**D* NATURAL Dialog Description 6.3.13.0 / 2013-08-14 13:38
0040/** EMPTY DIALOG COMMENT
0050/*] END-DIALOG-INFO
The language is, of course, a nightmare. (Just to give an example, note the "/*[" in line 2: it is basically a comment introducer, yet it is followed by the obviously important words "DEFINE" "DIALOG" "INFO". I didn't design this language, I am only attempting to parse it.)
After quite some thinking, I came to the conclusion that my only chance to handle this would be the following approach:
1 Line = 1 Token
This may look questionable to most of you, but it got me started, so I am currently happy with it: I had a manually written lexer/parser working within one week. Now I am at a point where the first grammar modifications are required, so this is my second attempt to get things working with ANTLR. (See my grammar below.)
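To illustrate what I mean by "1 Line = 1 Token", here is a minimal sketch of the idea. It is not my actual lexer: the names `Token`, `LINE_PATTERNS` and `tokenize` are made up for this post, the token kinds simply mirror the rule names in my grammar, and the patterns only cover the five sample lines above.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str      # hypothetical token kind, named after the grammar rules
    linenum: int   # the leading line number, e.g. 10 for "0010"
    text: str      # the rest of the line after the line number


# Checked in order, first match wins. The specific "keyword comments"
# must come before the generic comment pattern.
LINE_PATTERNS = [
    ("DIALOG_SOURCE", re.compile(r"/\*\* DIALOG SOURCE \S+$")),
    ("DEFINE_DIALOG_INFO", re.compile(r"/\*\[ DEFINE DIALOG INFO$")),
    ("END_DIALOG_INFO", re.compile(r"/\*\] END-DIALOG-INFO$")),
    ("COMMENT", re.compile(r"\*\*D\*|/\*")),
]

# Every line starts with a line number, followed by the payload.
LINE_RE = re.compile(r"(\d+)(.*)")


def tokenize(source: str) -> Iterator[Token]:
    """Yield exactly one token per input line."""
    for raw in source.splitlines():
        m = LINE_RE.match(raw)
        if m is None:
            raise ValueError(f"missing line number: {raw!r}")
        linenum, rest = int(m.group(1)), m.group(2).rstrip()
        for kind, pattern in LINE_PATTERNS:
            if pattern.match(rest):
                yield Token(kind, linenum, rest)
                break
        else:
            # No pattern matched: keep the line so the parser can report it.
            yield Token("UNKNOWN", linenum, rest)
```

The point of the ordering is that a line like "/*[ DEFINE DIALOG INFO" is classified as a keyword line before the generic comment pattern gets a chance to swallow it.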
With ANTLR, however, I am getting the following error messages and am unable to deal with them (in particular, the line=-1, charPosition=-1 in the first message puzzles me):
[ERROR] Message{errorType=SYNTAX_ERROR, args=[mismatched character '\n' expecting ']'], e=MismatchedTokenException(10!=93), fileName='/home/jwi/workspace/ns3mod-parser-antlr/src/main/antlr4/ns3mod.g4', line=-1, charPosition=-1}
[ERROR] Message{errorType=SYNTAX_ERROR, args=[unterminated rule (missing ';') detected at 'COMMENTLINE1 :' while looking for lexer rule element], e=org.antlr.v4.parse.v4ParserException, fileName='/home/jwi/workspace/ns3mod-parser-antlr/src/main/antlr4/ns3mod.g4', line=14, charPosition=0}
[ERROR] Message{errorType=SYNTAX_ERROR, args=[unterminated rule (missing ';') detected at 'DIALOG_SOURCE :' while looking for lexer rule element], e=org.antlr.v4.parse.v4ParserException, fileName='/home/jwi/workspace/ns3mod-parser-antlr/src/main/antlr4/ns3mod.g4', line=19, charPosition=0}
[ERROR] Message{errorType=SYNTAX_ERROR, args=['"' came as a complete surprise to me], e=null, fileName='/home/jwi/workspace/ns3mod-parser-antlr/src/main/antlr4/ns3mod.g4', line=19, charPosition=23}
[ERROR] Message{errorType=SYNTAX_ERROR, args=['/**" "DIALOG" "SOURCE" "22D" EOL;\nDEFINE_DIALOG_INFO: LINENUM "/*[" "DEFINE" "DIALOG" "INFO";\nEND_DIALOG_INFO: LINENUM "/*]" "END-DIALOG-INFO" EOL;\nLF : '\u000A';\nCR : '\u000D';\nTEXT: ~[\\u000A\u000D];\n' came as a complete surprise to me while looking for lexer rule element], e=NoViableAltException(17@[]), fileName='/home/jwi/workspace/ns3mod-parser-antlr/src/main/antlr4/ns3mod.g4', line=19, charPosition=24}
grammar Ns3mod ;
dialog_source:
dialog_header ;
dialog_header :
DIALOG_source dialog_info? ;
dialog_info :
DEFINE_DIALOG_INFO END_DIALOG_INFO ;
WS : [ \t\]+ -> skip ;
COMMENTLINE1: LINENUM '**D*' TEXT* EOL ;
COMMENTLINE2: LINENUM '/*' TEXT* EOL ;
COMMENT: (COMMENTLINE1 | COMMENTLINE2) -> skip ;
LINENUM: ([0-9])+ ;
EOL: CR?LF
DIALOG_SOURCE: LINENUM "/**" "DIALOG" "SOURCE" "22D" EOL;
DEFINE_DIALOG_INFO: LINENUM "/*[" "DEFINE" "DIALOG" "INFO";
END_DIALOG_INFO: LINENUM "/*]" "END-DIALOG-INFO" EOL;
LF : '\u000A';
CR : '\u000D';
TEXT: ~[\\u000A\u000D];