ANTLRv4 Parser for RPGLE (a structured language)

Christoff Erasmus

Nov 7, 2013, 2:46:15 PM
to antlr-di...@googlegroups.com
Hello,
 
I am a developer on the IBM i platform, coding mainly in RPGLE.
 
Short Description of the Language:
It has been around since the days of punch cards, and has been through various changes and flavours over the years.
 
It is a fixed-format ("structured") procedural language, which means that every character in a line of 80 columns has a specific meaning based on its position.
For example, to define a variable the "D-spec" (definition specification) template is used (refer below). There are various "specs" for different purposes.
 
D-Spec (Template Layout):
 
.....DName+++++++++++ETDsFrom+++To/L+++IDc.Keywords+++++++++++++++++++++++++++++Comments++++++++++++
 
Please note that the colors were chosen randomly, just to highlight the various fields.
 

[Image: "Definition of Standalone Field", showing the D-spec field positions]
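To make the column layout concrete, here is a minimal Java sketch that slices one fixed-format D-spec line into its fields. The column ranges are read off the template line above (1-based and inclusive: name in 7-21, definition type in 24-25, length in 33-39, keywords in 44-80, and so on); the class and field names are hypothetical, not part of any existing tool.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: slices one fixed-format RPG IV D-spec line into its fields.
// Column ranges are 1-based and inclusive, read off the D-spec template line.
final class DSpecFields {
    private static final String[] NAMES = {
        "formType", "name", "external", "dsType", "defType", "from",
        "toLength", "dataType", "decimals", "keywords", "comments"
    };
    private static final int[][] COLUMNS = {
        {6, 6}, {7, 21}, {22, 22}, {23, 23}, {24, 25}, {26, 32},
        {33, 39}, {40, 40}, {41, 42}, {44, 80}, {81, 100}
    };

    /** Returns each field's trimmed text; columns beyond the end of a short line read as blank. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            int start = COLUMNS[i][0] - 1;                     // 0-based start index
            int end = Math.min(COLUMNS[i][1], line.length());  // exclusive end, clamped to the line
            String text = start < end ? line.substring(start, end) : "";
            fields.put(NAMES[i], text.trim());
        }
        return fields;
    }
}
```

For a standalone field COUNTER defined as `S` with length 10, type `P`, 0 decimals and the keyword `INZ(0)`, each laid out in its proper columns, `parse` returns those values under the matching keys and blanks for the unused fields.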

Available Tools:
There are various methods to connect to the IBM i (5250, iNavigator, etc.),
as well as JTOpen, a Java client for accessing various services unique to the IBM i, which includes a JDBC 4 driver.
 
My Dream:
The Eclipse-based IDE for development is lacking some key features.
As a developer, I have some thoughts on how to improve my workflow with the help of Eclipse plugins.
To start, I would like to have my "own" ANTLR-based parser for RPGLE.
 
My Fanboy gushing:
I find ANTLR very intriguing, so much so that I bought both of Mr. Parr's books.
He has some mind-blowing concepts. Thank you, sir, you rock!
 
My Problem:
It seems that ANTLR grammars are not capable of handling structured (column-based) code in an elegant way.
That is the expected behavior, as ANTLR was intended for stream files, which do not rely on position for meaning.
 
I have done a lot of thinking on the subject, and my proposed solution is to write a translator to convert the structured code into an XML stream, which ANTLR would handle without effort.
The problem I have with this solution is that I would be writing two parsers, doubling the effort, the complexity and the chance of bugs. It would also be a maintenance nightmare when new features are added to RPGLE.
 
My Request:
It feels like I am missing something simple.
Will you point me in the right direction, please?
 
Regards,
Christoff Erasmus
 

Terence Parr

Nov 7, 2013, 4:51:12 PM
to antlr-di...@googlegroups.com
Hi Christoff :) I'm not super clear on your needs. Do you just need an RPGLE grammar? Are you trying to make your own Eclipse plugin?
Ter


--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
Dictation in use. Please excuse homophones, malapropisms, and nonsense. 

Jim Idle

Nov 7, 2013, 8:54:58 PM
to antlr-di...@googlegroups.com
I don't think that you need to transform it into XML. RPG has column number significance, and ANTLR 4 (and v3) lexers allow access to the column position.

So without delving back too far to remember RPG, the simplest possible solution is that you define the lexical patterns, merging any overlaps, then after you complete the match you change the token type according to the start position of the token. After this the parser does not need to care where the tokens were positioned.

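tokens { LABEL }

ALPHANUM
  : [a-zA-Z][a-zA-Z0-9]*
    {
      // Use the 0-based column where the token started; by the time this
      // action runs, getCharPositionInLine() already points past the match.
      switch (_tokenStartCharPositionInLine) {
        case 0: setType(LABEL); break;
        case 20: ...
      }
    }
  ;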
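The same "match first, then pick the type from the start column" idea can be sketched in plain Java, independent of the ANTLR runtime. Inside a real ANTLR 4 lexer action, the token's start column is available as `_tokenStartCharPositionInLine`. The token types and area boundaries below are hypothetical placeholders, loosely following a D-spec. Note that ANTLR columns are 0-based, so RPG column 7 is position 6.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical token types that one generic "word" rule could be re-typed into.
enum WordType { SEQUENCE, FORM_TYPE, NAME, OTHER, KEYWORDS, COMMENT }

// Sketch of "match first, then choose the type from the start column".
final class ColumnRetyper {
    // 0-based start column of each area; an area runs until the next key.
    // Illustrative boundaries only, loosely modeled on a D-spec.
    private static final NavigableMap<Integer, WordType> AREAS = new TreeMap<>(Map.of(
            0, WordType.SEQUENCE,
            5, WordType.FORM_TYPE,
            6, WordType.NAME,
            21, WordType.OTHER,
            43, WordType.KEYWORDS,
            80, WordType.COMMENT));

    /** Returns the type of the area containing the token's first character. */
    static WordType typeFor(int tokenStartColumn) {
        if (tokenStartColumn < 0) {
            throw new IllegalArgumentException("column must be >= 0");
        }
        return AREAS.floorEntry(tokenStartColumn).getValue();
    }
}
```

A sorted map keeps the layout in one table, so adding an area is a single entry rather than another `case` label.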

If for some reason it turns out to be more complicated than pure position (column 20 means one thing when following X and another when following Y), then you can use lexer modes to cater for this by pushing a mode when X is seen and another mode when Y is seen. 
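The "column 20 means one thing after X and another after Y" case does occur in fixed-format RPG: the spec letter in column 6 decides how the rest of the line is laid out. The Java sketch below imitates what a lexer mode would do by choosing a layout from that letter. In an ANTLR grammar the switch would be a `pushMode`/`popMode` pair rather than a method. The C-spec ranges (factor 1 in columns 12-25, operation code in 26-35) reflect my understanding of the RPG IV layout and are worth checking against IBM's reference. All names are hypothetical.

```java
// Hypothetical "modes": the spec letter in column 6 selects which layout applies.
enum SpecMode { DEFINITION, CALCULATION, OTHER }

final class SpecModes {
    /** Picks the mode from column 6 (index 5); short lines fall back to OTHER. */
    static SpecMode modeFor(String line) {
        char spec = line.length() > 5 ? Character.toUpperCase(line.charAt(5)) : ' ';
        switch (spec) {
            case 'D': return SpecMode.DEFINITION;
            case 'C': return SpecMode.CALCULATION;
            default:  return SpecMode.OTHER;
        }
    }

    /** Names the field at a 1-based column, interpreted in the line's mode. */
    static String fieldAt(String line, int column) {
        switch (modeFor(line)) {
            case DEFINITION:
                if (column >= 7 && column <= 21) return "name";
                if (column >= 44 && column <= 80) return "keywords";
                return "other";
            case CALCULATION:
                if (column >= 12 && column <= 25) return "factor1";
                if (column >= 26 && column <= 35) return "opcode";
                return "other";
            default:
                return "unknown";
        }
    }
}
```

Column 20 comes back as `name` on a D-spec line but as `factor1` on a C-spec line, which is exactly the kind of context a mode carries.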

If even this cannot deal with it, then I would write a custom lexer that implements the ANTLR4 interface, in which case you can do anything you like. While the RPG compiler may be syntax driven, I doubt that it is so complicated that the context cannot be inferred in the lexical stage. After that, once again, the parser need not know about the positions of the tokens it is being fed.

Basically, if you are going to transform it into XML, then you will need to parse it anyway, so there is no point having an ANTLR grammar at all. This class of language has position-sensitive tokens so that the compiler could be very simple (there was not much computing power back then), so there may be quirks, but I think that they would all be solvable.

If you are trying to make plugins that deal with syntax, compiling, etc., then you may find that you need to be able to parse sections of the code rather than the entire translation unit; you should also do this kind of thing in background threads, of course. There have been a number of discussions about this in this group.
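A minimal Java sketch of that advice, assuming the plugin has already cut the member into sections (for example one procedure each) and supplies its own parse function as a `Function<String, R>`. Both the class and the function are hypothetical stand-ins for real plugin code. A single-threaded executor keeps reparses of successive edits in order and off the editor's UI thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical editor-side helper: re-parses only the edited section, off the UI thread.
final class BackgroundReparser<R> implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Function<String, R> parseSection;   // stands in for a real parser call

    BackgroundReparser(Function<String, R> parseSection) {
        this.parseSection = parseSection;
    }

    /** Queues one section for parsing and returns immediately with a future result. */
    CompletableFuture<R> reparse(String sectionText) {
        return CompletableFuture.supplyAsync(() -> parseSection.apply(sectionText), worker);
    }

    /** Stops accepting work; already-queued sections still finish. */
    @Override
    public void close() {
        worker.shutdown();
    }
}
```

In a real plugin the future's completion handler would update markers or the outline view, rather than the caller blocking on it.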

Jim



--

Christoff Erasmus

Nov 8, 2013, 11:41:56 AM
to antlr-di...@googlegroups.com
Hello,  

On 7 November 2013 23:51, Terence Parr <pa...@cs.usfca.edu> wrote:
... RPGLE grammar... eclipse plugin?

I intend to create both the grammar and an Eclipse plugin,
starting with the grammar, as it can be used independently of Eclipse.

On 8 November 2013 03:54, Jim Idle <ji...@temporal-wave.com> wrote:
...  don't ... need to transform it in to XML...

Yes, perfect. 

Thanks for your thoughts, and pointing me in the right direction.
I'll do some research, experimentation and will try to keep the group updated on my slow progress.

Warm Regards from a sunny South Africa,
Christoff Erasmus.

Ryan Eberly

Sep 2, 2014, 8:40:05 AM
to antlr-di...@googlegroups.com
Christoff,
    Did you get to tackle the parser? I recently transitioned from Java to RPGLE, so I'm interested.
-Ryan.

goteti udaya bhanu

Feb 12, 2015, 7:02:54 AM
to antlr-di...@googlegroups.com
Hi, I need to write a code analysis tool on RPGLE code. Please let me know the progress on this.
 
Thanks
Udaya

David Gregory

Feb 14, 2015, 10:56:13 AM
to antlr-di...@googlegroups.com
I am in the very early stages of some work on this because like the original poster I have a burning hatred for IBM's RDP and in my case I very much like IntelliJ.

grammar Rpg;

sourceMember : (statement)* ;

statement : (procedure | assignment) ';' ;

assignment : (QUALIFIED|SYMNAME) (ASSIGNOP|'=') expr ;

// MULT/DIV and PLUS/MINUS share an alternative so each pair has equal
// precedence and stays left-associative (a / b * c == (a / b) * c).
expr : '(' expr ')'
     | (bif | procedure)
     | (NOT|PLUS|MINUS) expr
     |<assoc=right> expr EXP expr
     | expr (MULT|DIV) expr
     | expr (PLUS|MINUS) expr
     | expr (COMPARISON|'=') expr
     | expr AND expr
     | expr OR expr
     | (LITERAL|QUALIFIED|SYMNAME)
     ;

bif : '%' SYMNAME '(' params? ')' ;

procedure : SYMNAME '(' params? ')' ;

params : expr (':' expr)* ;

// Margin and date area
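DATEAREA : DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT
           DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT
           .? .? .? .? .? -> channel(HIDDEN)
         /* {
              // Test the token's start column: by the time this action runs,
              // getCharPositionInLine() already points past the matched text.
              if (_tokenStartCharPositionInLine != 0)
              { setType(INT); setChannel(DEFAULT_TOKEN_CHANNEL); }
            } -> channel(HIDDEN) */
         ;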

// Language constructs
IF : 'IF' ;
ELSE : 'ELSE';
ELSEIF : 'ELSEIF';
SELECT : 'SELECT' ;
WHEN : 'WHEN' ;
DOW : 'DOW' ;
DOU : 'DOU' ;
FOR : 'FOR' ;
AND : 'AND';
OR : 'OR';

COMMENT: '//' ~[\r\n]* -> channel(HIDDEN) ;

// Operators
NOT : 'NOT' ;
PLUS : '+' ;
MINUS : '-' ;
EXP : '**' ;
MULT : '*' ;
DIV : '/' ;

COMPARISON : GT
           | LT
           | GE
           | LE
           | NE
           ;

GT : '>' ;
LT : '<' ;
GE : '>=' ;
LE : '<=' ;
NE : '<>' ;

ASSIGNOP : CPLUS
         | CMINUS
         | CMULT
         | CDIV
         ;

CPLUS : '+=' ;
CMINUS : '-=' ;
CMULT : '*=' ;
CDIV : '/=' ;

// Literals
LITERAL : DEC
        | INT
        | STRING
        | SYMBOL
        ;

DEC : INT '.' INT ;
INT : DIGIT+ ;

STRING : '\'' (~[\'\r\n]|'\'\'')* '\'' ;
SYMBOL : '*BLANK' 'S'?
       | '*ZERO'  'S'?
       | '*HIVAL'
       | '*LOVAL'
       | '*NULL'
       | '*ON'
       | '*OFF'
       | '*ALL' STRING
       ;

// Symbol
QUALIFIED : SYMNAME '.' SYMNAME ;
SYMNAME : [A-Z$#@]+ [A-Z0-9_]* ;

// Whitespace
WS : [ \t\r\n] -> channel(HIDDEN);

fragment DIGIT : [0-9] ;

It will currently parse simple (if nonsensical) input such as the following: 

123123123123     A += (4 + +3) ** 2 ** 3 > (3 = 2 < -1.5) <> 1 
123123123123     OR NOT B(NOT %A(B : 2) : 'babr''' : 4 > 2) AND B >= 3;
123123123123     A = NOT B.C = *BLANK;
123123123123     F();


I am beginning to get my head around ANTLR somewhat, after some initial brain-pain and a lot of re-reading of the extremely excellent Definitive Reference. I may throw this into a GitHub repo at some point in case anybody wants to help, once I've got my head around ensuring that the margins work correctly (as you can see from the commented-out action above, I get the impression that lexer actions will be the way to do this) and once I've fleshed out the remaining language constructs. The intention is that this will be used with a case-insensitive input stream (since the RPG compiler converts all symbols to uppercase anyway).
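One caveat with upper-casing the input: a plain `toUpperCase()` over the whole member would also change the contents of string literals such as `'babr'''` in the sample input. Below is a minimal Java sketch of one workaround, assuming the text is upper-cased before it reaches the lexer. It changes case only outside single-quoted literals, so a doubled `''` simply toggles out of and back into the literal. It also resets at line ends, matching the grammar's rule that a STRING cannot span lines. The class name is hypothetical, and comments are upper-cased along with the code.

```java
// Hypothetical pre-lexing step: upper-cases RPG source except inside quoted literals.
final class RpgUpperCaser {
    static String upperOutsideStrings(String source) {
        StringBuilder out = new StringBuilder(source.length());
        boolean inString = false;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '\n' || c == '\r') {
                inString = false;          // literals never span lines in this grammar
                out.append(c);
            } else if (c == '\'') {
                inString = !inString;      // a doubled '' toggles twice, staying inside
                out.append(c);
            } else if (inString) {
                out.append(c);             // keep literal text exactly as written
            } else {
                out.append(Character.toUpperCase(c));
            }
        }
        return out.toString();
    }
}
```

Because the string length never changes, token columns in the upper-cased text still match the original source.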

I get the impression that if I get my head around the date area & margin (which are in a fixed position on the line), the rest of the fixed-format syntax will kind of fall into place from there.
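An alternative to handling the margin inside the lexer is a small pre-lexing step, named plainly as a different technique from the lexer-action approach. The minimal Java sketch below assumes, as in the sample input, that every line starts with 12 digits (presumably a 6-digit sequence number and a 6-digit change date). It splits those off and blanks RPG columns 1-5 rather than deleting them, so the remaining text keeps its column positions for any later fixed-format checks. The class and method names are hypothetical.

```java
// Hypothetical pre-lexing step: separates the 12-character sequence/date prefix
// and blanks RPG columns 1-5 while keeping every remaining column in place.
final class SourceRecord {
    static final int PREFIX_WIDTH = 12;   // 6-digit sequence + 6-digit date, as in the sample input
    static final int MARGIN_WIDTH = 5;    // RPG columns 1-5

    final String sequenceAndDate;
    final String sourceData;

    private SourceRecord(String sequenceAndDate, String sourceData) {
        this.sequenceAndDate = sequenceAndDate;
        this.sourceData = sourceData;
    }

    /** Splits a raw line; lines shorter than the prefix yield empty source data. */
    static SourceRecord of(String line) {
        int cut = Math.min(PREFIX_WIDTH, line.length());
        return new SourceRecord(line.substring(0, cut), line.substring(cut));
    }

    /** Source data with columns 1-5 replaced by blanks, so column numbers are unchanged. */
    String codeOnly() {
        int margin = Math.min(MARGIN_WIDTH, sourceData.length());
        return " ".repeat(margin) + sourceData.substring(margin);
    }
}
```

With this in front of the lexer, a rule like DATEAREA would no longer be needed, at the cost of one extra pass over each line.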

Regards,
  David Gregory

Ryan Eberly

Feb 18, 2015, 4:58:00 PM
to antlr-di...@googlegroups.com
Goteti and David,
   I wrote a rudimentary ANTLR4 grammar for RPG4 (both fixed and free formats) over the last few weeks.  It consumes our rather large code base gracefully.  

  I'm considering open sourcing it.

  Is this something you would be interested in collaborating on?

 - Ryan. 
...

David Gregory

Feb 18, 2015, 6:18:36 PM
to antlr-di...@googlegroups.com
Yes indeed! I would be particularly interested to see your approach to the fixed format code.

Regards,
  David Gregory

Eric Wilson

Feb 25, 2015, 10:20:22 AM
to antlr-di...@googlegroups.com
I would love to collaborate on this too. I sent an email to Hans Boldt and Barbara Morris to see if they had an ANTLR grammar for ILE RPG (fixed and free form). In the interim, I was just getting ready to write one myself.

Nilo Roberto da Cruz Paim

Feb 26, 2015, 3:22:15 PM
to antlr-di...@googlegroups.com

Hi All,

I'm not sure what is happening.

1. I've installed Antlr4 via NuGet in a VS2010 project (Install-Package Antlr4 -Version 4.3.0).
2. I added a combined grammar (.g4) to the project, with the default contents and its Build Action set to "Antlr4".

It doesn't create the parser or the lexer class, whatever I try.

What am I missing here?

TIA,

Nilo - Brazil

Sam Harwell

Feb 26, 2015, 11:06:07 PM
to antlr-di...@googlegroups.com

Hi Nilo,

If you aren't getting an error when you build, then it's generating the classes in the intermediate output directory (by default obj/Debug for debug builds and obj/Release for release builds).

If this doesn't resolve your question, can you post some more details in a new issue?

https://github.com/tunnelvisionlabs/antlr4cs/issues

Thank you,
--
Sam Harwell
Owner, Lead Developer
http://tunnelvisionlabs.com


Nilo Roberto da Cruz Paim

Feb 27, 2015, 6:20:00 AM
to antlr-di...@googlegroups.com

Hi, Sam.

Thanks for answering.

You are right. It's generating the classes in the obj/Debug directory. Also, it's generating Grammar.g4.parser.cs and Grammar.g4.lexer.cs.

I think I need to study this some more...

Thanks again.

Nilo - Brazil



goteti udaya bhanu

Mar 5, 2015, 3:28:58 AM
to antlr-di...@googlegroups.com
I am also interested in collaborating with this.
 
Thanks
Udaya

Christoff Erasmus

Mar 5, 2015, 4:43:56 AM
to antlr-di...@googlegroups.com

Coolsbeans, we have a mini coding army assembled.

Creating a parser for the new FRPGLE (Free RPG for ILE) should be a straightforward task, as there are no positional considerations.

My issue is with parsing the fixed format. I have not been able to wrap my head around it.
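Since a member can mix both styles, one way to keep the fixed-format problem contained is a pre-pass that tags each line as fixed or free, so that each region can go to a different lexer mode or grammar. A minimal Java sketch follows. It assumes the convention in which free-form calculations sit between `/FREE` and `/END-FREE` compiler directives starting in column 7. It ignores other directives and embedded SQL, and its names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical pre-pass: tags each line of a source member as FIXED or FREE
// so that different lexer modes (or grammars) can handle each region.
enum Region { FIXED, FREE, DIRECTIVE }

final class FreeFormScanner {
    static List<Region> classify(List<String> lines) {
        List<Region> regions = new ArrayList<>(lines.size());
        boolean free = false;
        for (String line : lines) {
            // Directives begin in column 7 (index 6); compare case-insensitively.
            String fromColumn7 = line.length() > 6
                    ? line.substring(6).toUpperCase(Locale.ROOT) : "";
            if (fromColumn7.startsWith("/FREE")) {
                free = true;
                regions.add(Region.DIRECTIVE);
            } else if (fromColumn7.startsWith("/END-FREE")) {
                free = false;
                regions.add(Region.DIRECTIVE);
            } else {
                regions.add(free ? Region.FREE : Region.FIXED);
            }
        }
        return regions;
    }
}
```

Only the FIXED lines would then need the column-aware handling discussed earlier in the thread.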

I have been waiting for Ryan's method of fixed-format parsing.
Please, Ryan, don't keep us in suspense.

Share your code with us.

Regards,
Christoff


goteti udaya bhanu

Mar 13, 2015, 7:54:51 AM
to antlr-di...@googlegroups.com
Hi all,
 
        Please share the AS400 parser code. Does it handle both free format and fixed format?
 
Thanks
Udaya

Rahul Kumar

Aug 9, 2015, 12:49:01 AM
to antlr-discussion
Hi Ryan,

It would be a great help if you could share your RPG4 (fixed and free format) grammar with us.

Thanks in advance.
Rahul.

Daniele Gariboldi

Sep 10, 2015, 5:27:04 AM
to antlr-di...@googlegroups.com
I found it here, but it's missing "core" sources ....:

Ryan Eberly

Mar 4, 2016, 10:34:28 AM
to antlr-discussion
What did you mean by "core" sources missing? The project on GitHub is now pretty stable and parses most RPGLE gracefully. Any examples to the contrary have been fixed pretty quickly, so feel free to contribute them :-)

If you encounter any questions or problems, report an issue on GitHub.