GRAMMAR to find any word separated by whitespace

André de Mattos Ferraz

Apr 10, 2015, 9:04:05 AM

to antlr-di...@googlegroups.com

Hi Guys,

I want to parse the following file:

/Projeto 0111_BM_S_11 – Bacia de Santos
/Arquivo de dados de Medidos e Processados “0111_BM_S_11_med_proc.asc”
/ Correção Densidade Bouguer = 1.8 ; Meridiano Central = 39ºW ; Altitude = 0 (nível do mar)
/Datum geog. SAD69 ; Datum gravimétrico ISGN ; Gravímetro L&R S246 ; Precisão 0.987/mGal
/Redução dados magnéticos IGRF2000 ; Valores dummies = “*”
/Correção da Variação Diurna: dados fornecidos pelo Observatório Nacional – Vassouras,RJ
/Filtragens e nivelamentos veja “0111_BM_S_11.pdf”
/Line = linha ; Fid = Fiducial ; Date = Data ; Time= Tempo ; Long = Longitude ; Lat=Latitude ;
/utme = UTM Leste ; utmn=UTM Norte ; Batim=batimetria; Batimniv = batimetria nivelada; Magbto = mag
/ bruto; magvd = variação diurna magnética ; igrf=IGRF 2000 ; maganom=mag anômalo; magniv = mag
/ nivelado; gravbta = grav bruta ; eotvoscorr = correção de Eotvos ; latcor = correção de latitude ;
/ freeair = Anomalia FreeAir ; fafil – FreeAir filtrado ; fafilniv – FreeAir Filtrado e Nivelado ;
/ Bougcorr=valor da Correção Bouguer ; Bouganom=Bouguer anômalo ; Bouganomniv=Bouguer Anôm. Nivelado.
Line,Fid,date,time,long,lat,utme,utmn,batim,batimniv,magbto,magvd,igrf,maganom,magniv,gravbta,eotvoscor,etvoscorfil,latcor,freeair,fafil,fafilniv,bougcorr,bouganom,bouganomniv
230,1,20040104,081230.321,-44.1903610,-23.9998722,582351.31,7345551.00,*,*,24270.3,-23.0,24200.0,47.3,47.5,9994.6,900.548,9094.052,9000.500,93.552,93.450,93.452,10.0,83.45,85.00
230,2,20040104,081231.402,-44.1924112,-24.0298600,582501.03,7345280.00,*,*,24268.0,-23.8,24200.0,44.2,44.5,9993.6,900.548,9093.052,9000.500,93.552,93.450, 93.452,10.0,83.45,85.00

Rules:

Lines that begin with '/' belong to the HEADER.
The line highlighted in yellow (the one starting with "Line,Fid,date,...") holds the TITLEs. Every title must be defined in the HEADER (highlighted in orange), though not necessarily in the same order, and the header lines themselves may appear in a different order. Only one title line may exist.
VALUES lines (highlighted in blue) hold the values for the titles, so if there are 10 TITLEs, each VALUES line needs 10 values.
DUMMIES values (highlighted in pink) are specified in the header (at any position) and can be used in VALUES lines.
Multiple VALUES lines can exist.

My grammar:

dontCareHeaderLine
 : HEADER_INIT_CHAR (valorDummie | anything)* NEWLINE
 ;


valorDummie
 : 'Valores dummies' '=' '"' WORD '"'
 ;


anything    
 : (WORDS | WORD | '=' | ';' | ',' | '&')+
 ;


titleLine
 : title (COMMA title)* NEWLINE
 ;


title
 : WORD
 ;


titleValuesLine
 : titleValue (COMMA titleValue)* NEWLINE
 ;


titleValue
 : WORD
 | number
 ;


WORDS
    :   WORD+
    ;


WORD
    : (~('\n'|'\r'|'/'|'='|';'|','))+
    ;


number
    : int | float
    ;


int
    : DIGIT+
    ;


float
    : DIGIT+ '.' DIGIT* EXPOENTPART?
    | '.' DIGIT+ EXPOENTPART?
    | DIGIT+ EXPOENTPART
    ;


HEADER_INIT_CHAR
 : '/'
 ;


NEWLINE 
 : '\r'? '\n'
 ;


WS  
    : [ \t\u000C\r\n]+ -> skip
    ;

When I test my grammar, it can't recognize a simple WORD. Can anyone help me?

Thanks in advance.

Attachments: Test.g4, TestGrammarUnitTest.cs, ErrorListener.cs

André de Mattos Ferraz

Apr 10, 2015, 11:11:27 AM

to antlr-di...@googlegroups.com

I tried changing the WORD rule to:

WORD
    //: [a-zA-Z0-9]+
    //: ( ~('\n'|'\r'|' '|'\t') )+?
    : (~('\r'|'\n'|' '))+
    ;

But the problem persists...

Eric Vergnaud

Apr 10, 2015, 11:47:09 AM

to antlr-di...@googlegroups.com

Hi,

Your grammar is far from complete.

It's missing a number of lexer tokens (COMMA, DIGIT, ...), so you should get warnings when generating the code.

You also have colliding token definitions: NEWLINE and WS can both match '\r' and '\n'.

Which rule do you call to parse a file? All the defined rules seem to deal with a single line only.

Eric
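The three points above could be addressed along these lines. This is only a sketch under assumptions, not the grammar from the attached Test.g4: the names Sketch, dataFile, headerLine, valuesLine, cell, HEADER_TEXT, STAR and NUMBER are made up for illustration, every line is assumed to end with a newline, and each header line is swallowed as one opaque token, so picking out the "Valores dummies" declaration would still need extra work.

```antlr
grammar Sketch;   // hypothetical grammar name

// Entry rule that covers the whole file instead of a single line.
dataFile
    : headerLine* titleLine valuesLine+ EOF
    ;

headerLine : HEADER_TEXT NEWLINE ;
titleLine  : WORD (COMMA WORD)* NEWLINE ;
valuesLine : cell (COMMA cell)* NEWLINE ;
cell       : NUMBER | WORD | STAR ;

// Every token used by the parser rules is declared explicitly.
HEADER_TEXT : '/' ~[\r\n]* ;                 // a whole header line, up to the line break
COMMA       : ',' ;
STAR        : '*' ;                          // the dummy marker used in the sample file
NUMBER      : '-'? DIGIT+ ('.' DIGIT*)? ;
WORD        : [a-zA-Z_] [a-zA-Z0-9_]* ;
NEWLINE     : '\r'? '\n' ;
WS          : [ \t\u000C]+ -> skip ;         // no \r or \n here, so WS cannot compete with NEWLINE

fragment DIGIT : [0-9] ;                     // helper only, never emitted as a token
```

Making NUMBER a lexer rule rather than a parser rule built from DIGIT tokens is a design choice of this sketch: the lexer always prefers the longest match, so a run of digits would otherwise be claimed by whichever broader token rule also matches it.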

André de Mattos Ferraz

Apr 10, 2015, 12:35:28 PM

to antlr-di...@googlegroups.com

Hi Eric,

If you open TestGrammarUnitTest.cs you will see the following:

[TestMethod] 
public void Word() 
{ 
    TestParser parser = createParser("zica"); 
    parser.title(); 
}

By the way, thanks for the tips.

André de Mattos Ferraz

Apr 13, 2015, 7:17:17 AM

to antlr-di...@googlegroups.com

I changed my grammar using some of the tips Eric provided, but my problem persists:



dontCareHeaderLine
 : HEADER_INIT_CHAR (valorDummie | anything)* NEWLINE
 ;

valorDummie
 : 'Valores dummies' '=' '"' WORD '"'
 ;

anything
 : (WORDS | WORD | '=' | ';' | ',' | '&' | '/' | '*' | '+' | '-')+
 ;

titleLine
 : title (COMMA title)* NEWLINE
 ;

title
 : WORD
 ;

titleValuesLine
 : titleValue (COMMA titleValue)* NEWLINE
 ;

titleValue
 : WORD
 | number
 ;

number
    : int | float
    ;

int
    : DIGIT+
    ;

float
    : DIGIT+ '.' DIGIT* EXPOENTPART?
    | '.' DIGIT+ EXPOENTPART?
    | DIGIT+ EXPOENTPART
    ;

/*
 * Lexer Rules
 */

WORDS
    :   WORD+
    ;

WORD
    //: [a-zA-Z0-9]+
    //: ( ~('\n'|'\r'|' '|'\t') )+?
    : (~('\r'|'\n'|' '|'\t'))+
    ;

DIGIT
    : [0-9]
    ;

EXPOENTPART
    : ['eE'] [+-]? DIGIT+
    ;

HEADER_INIT_CHAR
 : '/'
 ;

NEWLINE
 : '\r'? '\n'
 ;

WS
    : [\u000C]+ -> skip
    ;

Eric Vergnaud

Apr 13, 2015, 9:26:28 AM

to antlr-di...@googlegroups.com

How did you come to the conclusion that the parser does not recognize WORD? Any error?

André de Mattos Ferraz

Apr 13, 2015, 11:21:08 AM

to antlr-di...@googlegroups.com

Test Name: Word

Test FullName: PowerQC.PMUnitTest.TestGrammarUnitTest.Word

Test Source: c:\Users\h162524\Documents\Visual Studio 2013\Projects\PowerQC.PotentialMehodsClient\PowerQC.PMUnitTest\TestGrammarUnitTest.cs : line 15

Test Outcome: Failed

Test Duration: 0:00:00,0706606

Result Message:

Test method PowerQC.PMUnitTest.TestGrammarUnitTest.Word threw exception:

System.Exception: <unknown>:1:0: token zica: line 1:0 mismatched input 'zica' expecting WORD

Details:Exception of type 'Antlr4.Runtime.InputMismatchException' was thrown.

Result StackTrace:

at PowerQC.Grammars.ErrorListener.SyntaxError(IRecognizer recognizer, IToken offendingSymbol, Int32 line, Int32 charPositionInLine, String msg, RecognitionException e) in c:\Users\h162524\Documents\Visual Studio 2013\Projects\PowerQC.PotentialMehodsClient\PowerQC.Grammars\ErrorListener.cs:line 28

at Antlr4.Runtime.ProxyErrorListener`1.SyntaxError(IRecognizer recognizer, Symbol offendingSymbol, Int32 line, Int32 charPositionInLine, String msg, RecognitionException e)

at Antlr4.Runtime.Parser.NotifyErrorListeners(IToken offendingToken, String msg, RecognitionException e)

at Antlr4.Runtime.DefaultErrorStrategy.NotifyErrorListeners(Parser recognizer, String message, RecognitionException e)

at Antlr4.Runtime.DefaultErrorStrategy.ReportInputMismatch(Parser recognizer, InputMismatchException e)

at Antlr4.Runtime.DefaultErrorStrategy.ReportError(Parser recognizer, RecognitionException e)

at PowerQC.Grammars.Test.TestParser.title() in c:\Users\h162524\Documents\Visual Studio 2013\Projects\PowerQC.PotentialMehodsClient\PowerQC.Grammars\obj\Debug\TestParser.cs:line 462

at PowerQC.PMUnitTest.TestGrammarUnitTest.Word() in c:\Users\h162524\Documents\Visual Studio 2013\Projects\PowerQC.PotentialMehodsClient\PowerQC.PMUnitTest\TestGrammarUnitTest.cs:line 17
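One possible reading of "mismatched input 'zica' expecting WORD", which nobody in the thread confirms: an ANTLR 4 lexer picks the rule that matches the longest stretch of input, and when two rules match the same text, the rule listed first wins. In both grammars posted above, WORDS : WORD+ is listed before WORD, and both match "zica" in full, so the token would come out as WORDS and the parser rule title, which expects WORD, would reject it. A minimal sketch of one way to remove the overlap, assuming WORDS is not needed as a token of its own (WordOrder and words are hypothetical names):

```antlr
grammar WordOrder;   // hypothetical name, for illustration only

// Parser: the repetition lives here instead of in a WORDS token.
words : WORD+ EOF ;
title : WORD EOF ;

// Lexer: WORD is now the only rule that can match "zica",
// so there is no tie for rule order to decide.
WORD    : ~[ \t\r\n/=;,]+ ;
NEWLINE : '\r'? '\n' ;
WS      : [ \t]+ -> skip ;
```

Printing the type of each token the lexer produces for "zica" would show whether this is what happens in the real project.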

Eric Vergnaud

Apr 13, 2015, 8:00:19 PM

to antlr-di...@googlegroups.com

Can you switch back your definition of WORD to the original one?

André de Mattos Ferraz

Apr 14, 2015, 8:28:05 AM

to antlr-di...@googlegroups.com

Hi Eric,

Same error:

Test Name: Word

Test FullName: Landmark.PowerQC.PMUnitTest.TestGrammarUnitTest.Word

Test Source: c:\Users\h162524\Documents\Visual Studio 2013\Projects\Landmark.PowerQC.PotentialMehodsClient\Landmark.PowerQC.PMUnitTest\TestGrammarUnitTest.cs : line 15

Test Outcome: Failed

Test Duration: 0:00:00,0816688

Result Message:

Test method Landmark.PowerQC.PMUnitTest.TestGrammarUnitTest.Word threw exception:

System.Exception: <unknown>:1:0: token zica: line 1:0 mismatched input 'zica' expecting WORD

Details:Exception of type 'Antlr4.Runtime.InputMismatchException' was thrown.

Result StackTrace:

at Landmark.PowerQC.Grammars.ErrorListener.SyntaxError(IRecognizer recognizer, IToken offendingSymbol, Int32 line, Int32 charPositionInLine, String msg, RecognitionException e) in c:\Users\h162524\Documents\Visual Studio 2013\Projects\Landmark.PowerQC.PotentialMehodsClient\Landmark.PowerQC.Grammars\ErrorListener.cs:line 28

at Antlr4.Runtime.ProxyErrorListener`1.SyntaxError(IRecognizer recognizer, Symbol offendingSymbol, Int32 line, Int32 charPositionInLine, String msg, RecognitionException e)

at Antlr4.Runtime.Parser.NotifyErrorListeners(IToken offendingToken, String msg, RecognitionException e)

at Antlr4.Runtime.DefaultErrorStrategy.NotifyErrorListeners(Parser recognizer, String message, RecognitionException e)

at Antlr4.Runtime.DefaultErrorStrategy.ReportInputMismatch(Parser recognizer, InputMismatchException e)

at Antlr4.Runtime.DefaultErrorStrategy.ReportError(Parser recognizer, RecognitionException e)

at Landmark.PowerQC.Grammars.Test.TestParser.title() in c:\Users\h162524\Documents\Visual Studio 2013\Projects\Landmark.PowerQC.PotentialMehodsClient\Landmark.PowerQC.Grammars\obj\Debug\TestParser.cs:line 462

at Landmark.PowerQC.PMUnitTest.TestGrammarUnitTest.Word() in c:\Users\h162524\Documents\Visual Studio 2013\Projects\Landmark.PowerQC.PotentialMehodsClient\Landmark.PowerQC.PMUnitTest\TestGrammarUnitTest.cs:line 17

Eric Vergnaud

Apr 15, 2015, 12:11:53 PM

to antlr-di...@googlegroups.com

According to your screenshot, you haven't switched back to 'a'..'z' | 'A'..'Z'.

André de Mattos Ferraz

Apr 15, 2015, 1:18:13 PM

to antlr-di...@googlegroups.com

Same error, Eric:

Test Name: Word

Test FullName: Landmark.PowerQC.PMUnitTest.TestGrammarUnitTest.Word

Test Source: c:\Users\h162524\Documents\Visual Studio 2013\Projects\Landmark.PowerQC.PotentialMehodsClient\Landmark.PowerQC.PMUnitTest\TestGrammarUnitTest.cs : line 15

Test Outcome: Failed

Test Duration: 0:00:00,0871433

Eric Vergnaud

Apr 16, 2015, 12:59:53 PM

to antlr-di...@googlegroups.com

Hi,

I'm not sure what is going on here. I can assure you this fragment normally works.

I notice you're using the NuGet version of ANTLR, so I can't reproduce the issue.

My advice would be to start with the simplest possible grammar, i.e. WORD + WS, and gradually expand it until the failure reappears, which locates the cause.
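A starting point for that approach might look like the following. It is a sketch only: the grammar name Minimal is hypothetical, and the letter-only WORD is an assumption taken from the 'a'..'z' | 'A'..'Z' definition mentioned earlier in the thread.

```antlr
grammar Minimal;   // hypothetical name

// Step 1: the only parser rule, matching a single word.
title : WORD EOF ;

// Step 1: the only two lexer rules.
WORD : [a-zA-Z]+ ;
WS   : [ \t\r\n]+ -> skip ;

// Later steps, added one at a time with the Word test re-run after each:
//   COMMA and titleLine, then NEWLINE (removing \r\n from WS),
//   then numbers, then the header rules.
```

If the Word test passes with this grammar, the first rule whose addition makes it fail again is the one to look at.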