Grammar to parse COBOL Picture clause


timhilco

Apr 24, 2013, 1:49:37 PM
to antlr-di...@googlegroups.com

I’m trying to build a COBOL parser. One of the harder COBOL constructs to parse is the PICTURE clause.

I’m having the most trouble handling the period ‘.’. It can mark the end of the statement, and it can also show where the decimal point falls in a formatted (edited) picture string. The grammar below handles a single period at the end of the statement [PIC X(9).], but not the case where the statement contains two periods [Pic 999.99.].

An alternative approach I was exploring is to treat everything between ‘Pic’ and the final period ‘.’ as a character string and let some Java code analyze the picture clause. In that case I can accumulate the entire string, strip off the last period, and create an additional STATEMENT_END token to send to the parser. I’m looking for some help on how to make this work.
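The "accumulate, then strip the final period" idea can be sketched in plain Java. This is a minimal sketch, assuming the text after 'Pic' has already been collected up to the next whitespace; PictureText and split are hypothetical names, not ANTLR API. It relies on the COBOL rule that a period followed by a space or end of line is a separator period, so a '.' in the last position of the collected text is taken as the statement terminator rather than a decimal point.

```java
// Hypothetical helper, not part of ANTLR: splits the text collected after
// the Pic keyword into the picture string and the statement-ending period.
final class PictureText {
    final String picture;         // picture string without the separator period
    final boolean endsStatement;  // true if a trailing '.' ended the statement

    private PictureText(String picture, boolean endsStatement) {
        this.picture = picture;
        this.endsStatement = endsStatement;
    }

    static PictureText split(String raw) {
        String s = raw.trim();
        // Assumes the text was collected up to the next whitespace, so a '.'
        // in the last position is the separator period, not a decimal point.
        if (s.endsWith(".")) {
            return new PictureText(s.substring(0, s.length() - 1), true);
        }
        return new PictureText(s, false);
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"X(9).", "999.99.", "999.999"}) {
            PictureText p = split(raw);
            System.out.println(raw + " -> picture=" + p.picture
                    + ", endsStatement=" + p.endsStatement);
        }
    }
}
```

The caller would then hand the parser a picture token built from `picture` and, when `endsStatement` is true, a separate end-of-statement token.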

Any thoughts on the best approach, or on how to change my grammar to handle multiple periods, are welcome.

 My initial grammar is as follows.

 

grammar CoolPictureParser;

@header {
package antlr;
import java.util.*;
}

dd_picture_clauses
    : (dd_picture_kw picture_string END_OF_PIC)+
    ;

dd_picture_kw
    : KW_PICTURE
    | KW_PIC
    ;

picture_string
    : CURRENCY?
      (PICCHAR+ REPEAT?)+
      (PUNCTUATION (PICCHAR+ REPEAT?)+)*
    ;

KW_PIC: 'Pic';

PIC_WS : [ \t\r\n] -> skip;

CURRENCY : ~[0-9ABCDPRSVXZa-z\*\+\-\/\,\.\;\(\)\=\'\"] ;

PICCHAR
    : [ABEGPSVXZabegpsvxz90\+\-\*\$]
    | 'CR'
    | 'DB'
    ;

REPEAT: '(' DIGIT+ ')' ;

//PUNCTUATION : [\/\,\.\:] ;
PUNCTUATION : [\/\,\:] ;

END_OF_PIC: '.';

fragment DIGIT: [0-9];

Terence Parr

Apr 24, 2013, 1:53:11 PM
to antlr-di...@googlegroups.com
Hi. Use a lexical mode to switch the interpretation of '.' inside [Pic stuff]. Would that work?
Ter
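To illustrate what a lexical mode buys here, the following hand-written Java scanner (a hypothetical ModalPicLexer, not ANTLR-generated code) switches into a picture mode after the Pic keyword, much as pushMode/popMode do in an ANTLR 4 lexer grammar. Inside that mode a '.' stays part of the picture unless it is followed by whitespace or end of input; that final '.' is left for the default mode, which returns it as END_OF_PIC. It is only a sketch: it knows the Pic keyword, pictures and periods, and returns any other COBOL word as a generic WORD token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner illustrating lexical modes (not ANTLR output).
final class ModalPicLexer {
    private enum Mode { DEFAULT, PICTURE }

    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        Mode mode = Mode.DEFAULT;
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {   // whitespace is skipped in both modes
                i++;
                continue;
            }
            if (mode == Mode.PICTURE) {
                // A '.' belongs to the picture unless whitespace or end of
                // input follows it; that one is the separator period.
                int start = i;
                while (i < n && !Character.isWhitespace(src.charAt(i))) {
                    if (src.charAt(i) == '.'
                            && (i + 1 == n || Character.isWhitespace(src.charAt(i + 1)))) {
                        break;
                    }
                    i++;
                }
                out.add("PIC_STRING(" + src.substring(start, i) + ")");
                mode = Mode.DEFAULT;            // comparable to popMode
            } else if (c == '.') {
                out.add("END_OF_PIC");          // in default mode '.' ends the clause
                i++;
            } else {
                int start = i;
                while (i < n && !Character.isWhitespace(src.charAt(i)) && src.charAt(i) != '.') {
                    i++;
                }
                String word = src.substring(start, i);
                if (word.equalsIgnoreCase("Pic")) {
                    out.add("KW_PIC");
                    mode = Mode.PICTURE;        // comparable to pushMode(PICTURE_STRING)
                } else {
                    out.add("WORD(" + word + ")");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Pic X(9). Pic 999.99."));
    }
}
```

For example, tokenize("Pic X(9). Pic 999.99.") yields KW_PIC, PIC_STRING(X(9)), END_OF_PIC, KW_PIC, PIC_STRING(999.99), END_OF_PIC.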



timhilco

Apr 25, 2013, 10:54:25 AM
to antlr-di...@googlegroups.com
I'm exploring my second approach: accumulating the entire string, stripping off the period and emitting an END_OF_STATEMENT token.

Here is my new Lexer:
lexer grammar CoolPictureLexer;
@header {
package antlr;
import java.util.*;
}
KW_PIC: 'Pic' -> pushMode (PICTURE_STRING);
WS : [ \t\r\n] -> skip;

mode PICTURE_STRING;
//-----------------------------------

PIC_STRING:    [0-9ABEGPSVXZabegpsvxz+\-*$()./,:]+ ->popMode ;
PIC_WS : [ \t\r\n] -> skip;

Before I popMode, I would like to add an action as follows:

If the string ends with a '.', I would change the text of the PIC_STRING token to remove the trailing '.' and then emit an END_OF_STATEMENT token. If the string does not end with a '.', the token is left alone.

My question is: can one emit an additional token from within a lexer rule's action?

For the use case Pic 999.999 Value 100.10 . I would like to emit a PIC_STRING token with the text 999.999, a VALUE token, a NUMBER token and an END_OF_STATEMENT token.
For the use case Pic 999.999. I would like to emit a PIC_STRING token with the text 999.999 (without the period), followed by an END_OF_STATEMENT token.
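A common way to get one extra token out of a single match in an ANTLR 4 Java lexer is to override the lexer's nextToken() so that it first drains a queue of pending tokens, with the rule action (or an overridden emit) pushing the extra token onto that queue. The sketch below models only that queueing idea in plain Java so it runs without the ANTLR jar: Tok and SplittingTokenSource are hypothetical stand-ins for the runtime's Token and Lexer types, and the split is done on finished PIC_STRING tokens rather than inside the rule action.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal token type standing in for an ANTLR Token.
final class Tok {
    final String type;
    final String text;

    Tok(String type, String text) {
        this.type = type;
        this.text = text;
    }

    @Override
    public String toString() {
        return type + "(" + text + ")";
    }
}

// Hypothetical wrapper: splits PIC_STRING("999.999.") into
// PIC_STRING("999.999") followed by END_OF_STATEMENT(".").
final class SplittingTokenSource {
    private final Iterator<Tok> upstream;
    private final Deque<Tok> pending = new ArrayDeque<>();  // tokens waiting to be returned

    SplittingTokenSource(List<Tok> tokens) {
        this.upstream = tokens.iterator();
    }

    /** Returns the next token, or null at end of input. */
    Tok nextToken() {
        if (!pending.isEmpty()) {
            return pending.poll();              // drain queued extra tokens first
        }
        if (!upstream.hasNext()) {
            return null;
        }
        Tok t = upstream.next();
        if (t.type.equals("PIC_STRING") && t.text.endsWith(".")) {
            // Queue the extra token, then return the picture without its period.
            pending.add(new Tok("END_OF_STATEMENT", "."));
            return new Tok("PIC_STRING", t.text.substring(0, t.text.length() - 1));
        }
        return t;
    }

    public static void main(String[] args) {
        SplittingTokenSource src = new SplittingTokenSource(Arrays.asList(
                new Tok("KW_PIC", "Pic"), new Tok("PIC_STRING", "999.999.")));
        for (Tok t = src.nextToken(); t != null; t = src.nextToken()) {
            System.out.println(t);
        }
    }
}
```

Fed KW_PIC(Pic) and PIC_STRING(999.999.), it returns KW_PIC(Pic), PIC_STRING(999.999) and END_OF_STATEMENT(.), then null; a PIC_STRING without a trailing period passes through unchanged.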