GRAMMAR to find any word separated by whitespace

André de Mattos Ferraz

Apr 10, 2015, 9:04:05 AM
to antlr-di...@googlegroups.com
Hi Guys,

I want to parse the following file:

/Projeto 0111_BM_S_11 – Bacia de Santos
/
Arquivo de dados de Medidos e Processados 0111_BM_S_11_med_proc.asc
/ Correção Densidade Bouguer = 1.8 ; Meridiano Central = 39ºW ; Altitude = 0 (nível do mar)
/Datum geog. SAD69 ; Datum gravimétrico ISGN ; Gravímetro L&R S246 ; Precisão 0.987/mGal
/Redução dados magnéticos IGRF2000 ; Valores dummies = “*”
/Correção da Variação Diurna: dados fornecidos pelo Observatório Nacional Vassouras,RJ
/Filtragens e nivelamentos veja 0111_BM_S_11.pdf
/Line = linha ; Fid = Fiducial ; Date = Data ; Time= Tempo ; Long = Longitude ; Lat=Latitude ;
/utme = UTM Leste ; utmn=UTM Norte ; Batim=batimetria; Batimniv = batimetria nivelada; Magbto = mag
/
bruto; magvd = variação diurna magnética ; igrf=IGRF 2000 ; maganom=mag anômalo; magniv = mag
/ nivelado; gravbta = grav bruta ; eotvoscorr = correção de Eotvos ; latcor = correção de latitude ;
/ freeair = Anomalia FreeAir ; fafil – FreeAir filtrado ; fafilniv – FreeAir Filtrado e Nivelado ;
/
Bougcorr=valor da Correção Bouguer ; Bouganom=Bouguer anômalo ; Bouganomniv=Bouguer Anôm. Nivelado.

Line,Fid,date,time,long,lat,utme,utmn,batim,batimniv,magbto,magvd,igrf,maganom,magniv,gravbta,eotvoscor,etvoscorfil,latcor,freeair,fafil,fafilniv,bougcorr,bouganom,bouganomniv
230,1,20040104,081230.321,-44.1903610,-23.9998722,582351.31,7345551.00,*,*,24270.3,-23.0,24200.0,47.3,47.5,9994.6,900.548,9094.052,9000.500,93.552,93.450,93.452,10.0,83.45,85.00
230,2,20040104,081231.402,-44.1924112,-24.0298600,582501.03,7345280.00,*,*,24268.0,-23.8,24200.0,44.2,44.5,9993.6,900.548,9093.052,9000.500,93.552,93.450, 93.452,10.0,83.45,85.00

Rules:
  1. Lines that begin with '/' belong to the HEADER.
  2. The line marked in yellow (Line,Fid,date,...) holds the TITLEs, and ALL titles must be defined in the HEADER (the orange part; the definitions need not be in the same order as the titles, and the header lines may appear in a different order). Only one title line can exist.
  3. VALUES lines (blue marker) hold the values for the titles, so if I have 10 TITLEs I need 10 VALUES.
  4. DUMMIES values (pink marker) are specified in the header (at any position) and can be used in VALUES lines.
  5. Multiple VALUES lines can exist.
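
As a rough illustration of rules 1, 3 and 5 (not part of the original post), the column-count check can be sketched in Python independently of any grammar. The function name `check_table` and the three-column sample are hypothetical, and the sketch assumes that every header line starts with '/' and that fields never contain commas.

```python
def check_table(lines):
    """Return a list of problems found in the title and value lines."""
    # Rule 1: anything starting with '/' is header and is ignored here.
    body = [ln.strip() for ln in lines
            if ln.strip() and not ln.lstrip().startswith("/")]
    if not body:
        return ["no title line found"]
    # The first non-header line is taken to be the single title line.
    titles = [t.strip() for t in body[0].split(",")]
    problems = []
    # Rules 3 and 5: every following line needs one value per title.
    for number, row in enumerate(body[1:], start=1):
        values = [v.strip() for v in row.split(",")]
        if len(values) != len(titles):
            problems.append(
                f"values line {number}: expected {len(titles)} values, "
                f"got {len(values)}"
            )
    return problems


# Hypothetical, shortened input: one header line, one title line, two rows.
sample = [
    "/Valores dummies = *",
    "Line,Fid,batim",
    "230,1,*",
    "230,2",
]
print(check_table(sample))
```

The second values line has only two fields, so it is the only one reported. Checking that every title is defined in the header (rule 2) would need the header definitions parsed as well and is left out of this sketch.
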
My grammar:

dontCareHeaderLine
    : HEADER_INIT_CHAR (valorDummie | anything)* NEWLINE
    ;

valorDummie
    : 'Valores dummies' '=' '"' WORD '"'
    ;

anything
    : (WORDS | WORD | '=' | ';' | ',' | '&')+
    ;

titleLine
    : title (COMMA title)* NEWLINE
    ;

title
    : WORD
    ;

titleValuesLine
    : titleValue (COMMA titleValue)* NEWLINE
    ;

titleValue
    : WORD
    | number
    ;

WORDS
    : WORD+
    ;

WORD
    : (~('\n'|'\r'|'/'|'='|';'|','))+
    ;

number
    : int | float
    ;

int
    : DIGIT+
    ;

float
    : DIGIT+ '.' DIGIT* EXPOENTPART?
    | '.' DIGIT+ EXPOENTPART?
    | DIGIT+ EXPOENTPART
    ;

HEADER_INIT_CHAR
    : '/'
    ;

NEWLINE
    : '\r'? '\n'
    ;

WS
    : [ \t\u000C\r\n]+ -> skip
    ;

When I test my grammar, it can't recognize a simple WORD. Can anyone help me?

Thanks in advance
Test.g4
TestGrammarUnitTest.cs
ErrorListener.cs

André de Mattos Ferraz

Apr 10, 2015, 11:11:27 AM
to antlr-di...@googlegroups.com
I tried changing the WORD rule to:

WORD
    //: [a-zA-Z0-9]+
    //: ( ~('\n'|'\r'|' '|'\t') )+?
    : (~('\r'|'\n'|' '))+
    ;

But the problem persists...

Eric Vergnaud

Apr 10, 2015, 11:47:09 AM
to antlr-di...@googlegroups.com
Hi,

Your grammar is far from complete.
It's missing a number of lexer tokens (COMMA, DIGIT, ...), so you should get warnings when generating the code.
You also have colliding token definitions: NEWLINE and WS.
Which rule do you call to parse a file? All the defined rules seem to handle only one line.

Eric
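
To illustrate the NEWLINE/WS collision mentioned above, here is a toy Python model of how a lexer picks a token when it prefers the longest match and, on a tie, the rule listed first, which is how ANTLR 4 lexers resolve such ambiguities. The regular expressions are hand-written approximations of the two rules in the posted grammar, and `next_token` is a hypothetical helper, not an ANTLR API.

```python
import re

# Hand-written approximations of the two lexer rules, in grammar order.
RULES = [
    ("NEWLINE", re.compile(r"\r?\n")),
    ("WS", re.compile(r"[ \t\f\r\n]+")),  # \f is \u000C; WS is skipped
]


def next_token(text, pos=0, rules=RULES):
    """Return (rule name, lexeme) for the longest match at pos.

    Only a strictly longer match replaces the current best, so on a tie
    the rule listed first wins.
    """
    best = None
    for name, pattern in rules:
        match = pattern.match(text, pos)
        if match and match.end() > pos:
            if best is None or len(match.group()) > len(best[1]):
                best = (name, match.group())
    return best


print(next_token("\n"))    # tie on length 1: NEWLINE is listed first
print(next_token(" \n"))   # WS matches both characters and wins
print(next_token("\n\n"))  # WS swallows the blank line as well
```

A lone line break becomes NEWLINE, but a trailing space or a following blank line lets the skipped WS rule win, so in those cases the parser would never see the NEWLINE that each line rule expects.
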

André de Mattos Ferraz

Apr 10, 2015, 12:35:28 PM
to antlr-di...@googlegroups.com
Hi Eric,

If you open the file TestGrammarUnitTest.cs you will see the following:

[TestMethod]
public void Word()
{
    TestParser parser = createParser("zica");
    parser.title();
}

By the way, thanks for the tips.

André de Mattos Ferraz

Apr 13, 2015, 7:17:17 AM
to antlr-di...@googlegroups.com
I changed my grammar using some of the tips Eric provided, but the problem persists:



dontCareHeaderLine
    : HEADER_INIT_CHAR (valorDummie | anything)* NEWLINE
    ;

valorDummie
    : 'Valores dummies' '=' '"' WORD '"'
    ;

anything
    : (WORDS | WORD | '=' | ';' | ',' | '&' | '/' | '*' | '+' | '-')+
    ;

titleLine
    : title (COMMA title)* NEWLINE
    ;

title
    : WORD
    ;

titleValuesLine
    : titleValue (COMMA titleValue)* NEWLINE
    ;

titleValue
    : WORD
    | number
    ;

number
    : int | float
    ;

int
    : DIGIT+
    ;

float
    : DIGIT+ '.' DIGIT* EXPOENTPART?
    | '.' DIGIT+ EXPOENTPART?
    | DIGIT+ EXPOENTPART
    ;

/*
 * Lexer Rules
 */

WORDS
    : WORD+
    ;

WORD
    //: [a-zA-Z0-9]+
    //: ( ~('\n'|'\r'|' '|'\t') )+?
    : (~('\r'|'\n'|' '|'\t'))+
    ;

DIGIT
    : [0-9]
    ;

EXPOENTPART
    : ['eE'] [+-]? DIGIT+
    ;

HEADER_INIT_CHAR
    : '/'
    ;

NEWLINE
    : '\r'? '\n'
    ;

WS
    : [\u000C]+ -> skip
    ;

Eric Vergnaud

Apr 13, 2015, 9:26:28 AM
to antlr-di...@googlegroups.com
How did you come to the conclusion that the parser does not recognize WORD? Any error?

André de Mattos Ferraz

Apr 13, 2015, 11:21:08 AM
to antlr-di...@googlegroups.com

Test Name: Word
Test FullName: PowerQC.PMUnitTest.TestGrammarUnitTest.Word
Test Source: c:\Users\h162524\Documents\Visual Studio 2013\Projects\PowerQC.PotentialMehodsClient\PowerQC.PMUnitTest\TestGrammarUnitTest.cs : line 15
Test Outcome: Failed
Test Duration: 0:00:00,0706606

Result Message:
Test method PowerQC.PMUnitTest.TestGrammarUnitTest.Word threw exception: 
System.Exception: <unknown>:1:0: token zica: line 1:0 mismatched input 'zica' expecting WORD
Details:Exception of type 'Antlr4.Runtime.InputMismatchException' was thrown.
Result StackTrace:
at PowerQC.Grammars.ErrorListener.SyntaxError(IRecognizer recognizer, IToken offendingSymbol, Int32 line, Int32 charPositionInLine, String msg, RecognitionException e) in c:\Users\h162524\Documents\Visual Studio 2013\Projects\PowerQC.PotentialMehodsClient\PowerQC.Grammars\ErrorListener.cs:line 28
   at Antlr4.Runtime.ProxyErrorListener`1.SyntaxError(IRecognizer recognizer, Symbol offendingSymbol, Int32 line, Int32 charPositionInLine, String msg, RecognitionException e)
   at Antlr4.Runtime.Parser.NotifyErrorListeners(IToken offendingToken, String msg, RecognitionException e)
   at Antlr4.Runtime.DefaultErrorStrategy.NotifyErrorListeners(Parser recognizer, String message, RecognitionException e)
   at Antlr4.Runtime.DefaultErrorStrategy.ReportInputMismatch(Parser recognizer, InputMismatchException e)
   at Antlr4.Runtime.DefaultErrorStrategy.ReportError(Parser recognizer, RecognitionException e)
   at PowerQC.Grammars.Test.TestParser.title() in c:\Users\h162524\Documents\Visual Studio 2013\Projects\PowerQC.PotentialMehodsClient\PowerQC.Grammars\obj\Debug\TestParser.cs:line 462
   at PowerQC.PMUnitTest.TestGrammarUnitTest.Word() in c:\Users\h162524\Documents\Visual Studio 2013\Projects\PowerQC.PotentialMehodsClient\PowerQC.PMUnitTest\TestGrammarUnitTest.cs:line 17

Eric Vergnaud

Apr 13, 2015, 8:00:19 PM
to antlr-di...@googlegroups.com
Can you switch back your definition of WORD to the original one?

André de Mattos Ferraz

Apr 14, 2015, 8:28:05 AM
to antlr-di...@googlegroups.com
Hi Eric,

Same error:



Test Name: Word
Test FullName: Landmark.PowerQC.PMUnitTest.TestGrammarUnitTest.Word
Test Source: c:\Users\h162524\Documents\Visual Studio 2013\Projects\Landmark.PowerQC.PotentialMehodsClient\Landmark.PowerQC.PMUnitTest\TestGrammarUnitTest.cs : line 15
Test Outcome: Failed
Test Duration: 0:00:00,0816688

Result Message:
Test method Landmark.PowerQC.PMUnitTest.TestGrammarUnitTest.Word threw exception: 
System.Exception: <unknown>:1:0: token zica: line 1:0 mismatched input 'zica' expecting WORD
Details:Exception of type 'Antlr4.Runtime.InputMismatchException' was thrown.
Result StackTrace:
at Landmark.PowerQC.Grammars.ErrorListener.SyntaxError(IRecognizer recognizer, IToken offendingSymbol, Int32 line, Int32 charPositionInLine, String msg, RecognitionException e) in c:\Users\h162524\Documents\Visual Studio 2013\Projects\Landmark.PowerQC.PotentialMehodsClient\Landmark.PowerQC.Grammars\ErrorListener.cs:line 28
   at Antlr4.Runtime.ProxyErrorListener`1.SyntaxError(IRecognizer recognizer, Symbol offendingSymbol, Int32 line, Int32 charPositionInLine, String msg, RecognitionException e)
   at Antlr4.Runtime.Parser.NotifyErrorListeners(IToken offendingToken, String msg, RecognitionException e)
   at Antlr4.Runtime.DefaultErrorStrategy.NotifyErrorListeners(Parser recognizer, String message, RecognitionException e)
   at Antlr4.Runtime.DefaultErrorStrategy.ReportInputMismatch(Parser recognizer, InputMismatchException e)
   at Antlr4.Runtime.DefaultErrorStrategy.ReportError(Parser recognizer, RecognitionException e)
   at Landmark.PowerQC.Grammars.Test.TestParser.title() in c:\Users\h162524\Documents\Visual Studio 2013\Projects\Landmark.PowerQC.PotentialMehodsClient\Landmark.PowerQC.Grammars\obj\Debug\TestParser.cs:line 462
   at Landmark.PowerQC.PMUnitTest.TestGrammarUnitTest.Word() in c:\Users\h162524\Documents\Visual Studio 2013\Projects\Landmark.PowerQC.PotentialMehodsClient\Landmark.PowerQC.PMUnitTest\TestGrammarUnitTest.cs:line 17

Eric Vergnaud

Apr 15, 2015, 12:11:53 PM
to antlr-di...@googlegroups.com
According to your screenshot, you haven't switched back to 'a'..'z' | 'A'..'Z'.

André de Mattos Ferraz

Apr 15, 2015, 1:18:13 PM
to antlr-di...@googlegroups.com

Same error, Eric:


Test Name: Word
Test FullName: Landmark.PowerQC.PMUnitTest.TestGrammarUnitTest.Word
Test Source: c:\Users\h162524\Documents\Visual Studio 2013\Projects\Landmark.PowerQC.PotentialMehodsClient\Landmark.PowerQC.PMUnitTest\TestGrammarUnitTest.cs : line 15
Test Outcome: Failed
Test Duration: 0:00:00,0871433

Eric Vergnaud

Apr 16, 2015, 12:59:53 PM
to antlr-di...@googlegroups.com
Hi,

I'm not sure what is going on here. I can assure you this fragment normally works.
I notice you're using the NuGet version of ANTLR, so I can't reproduce the issue.
My advice would be to start with the simplest grammar, i.e. WORD + WS, and gradually expand it to locate the cause.
If that doesn't help, file a bug here: antlr4cs

Eric
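
One candidate worth checking while reducing the grammar, offered as an assumption since the thread does not confirm it: WORDS is declared before WORD and matches every input that WORD matches. ANTLR 4 resolves equal-length matches in favour of the lexer rule listed first, so "zica" would be emitted as WORDS, which is consistent with "mismatched input 'zica' expecting WORD". The Python sketch below is a toy model of that tie-break, not ANTLR code; `token_type` and the regular expressions are illustrative.

```python
import re

WORD = r"[^\r\n \t]+"  # approximates the revised WORD rule


def token_type(text, rules):
    """Name of the rule with the longest match at the start of text.

    A later rule only wins with a strictly longer match, so ties go to
    the rule listed first.
    """
    best_name, best_len = None, 0
    for name, pattern in rules:
        match = re.match(pattern, text)
        if match and match.end() > best_len:
            best_name, best_len = name, match.end()
    return best_name


# Rule order as posted: WORDS comes before WORD.
as_posted = [("WORDS", f"(?:{WORD})+"), ("WORD", WORD)]
# Reduced lexer without the WORDS rule.
without_words = [("WORD", WORD)]

print(token_type("zica", as_posted))      # WORDS
print(token_type("zica", without_words))  # WORD
```

If this is the cause, removing WORDS from the lexer, or turning it into a parser rule such as `words : WORD+ ;`, should let the `title` test receive a WORD token.
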