On 11/14/2013 12:22 PM, Jelle Feringa wrote:
> My question is what is the right way to go about this?
> Here we have an example of a procedure defined in RAPID.
The example seems to be missing, but in general, you don't start with the parser, you start with the
scanner, identifying the individual words that you should recognize.
> PROC top_front( string strNoStepIn )
> ! procedure block
> MoveL ...;
> ENDPROC
becomes a sequence of tokens, one per line. Empty lines and // text are added only to clarify what
you read; token names are written in all uppercase.
PROC
IDENTIFIER(top_front)
PARENTHESIS_OPEN
STRING // if "string" is not a built-in, it would become an IDENTIFIER
IDENTIFIER(strNoStepIn)
PARENTHESIS_CLOSE
// Assuming ! means 'comment', skipped it.
MOVEL // if "MoveL" is not a built-in, it would become an IDENTIFIER
// skipped some
SEMICOLON
ENDPROC
The scanner breaks your input text down into these small elementary words. I didn't do it here, but
it's often useful to add a suffix or prefix to keywords (I use ...KW, e.g. PROCKW) and to other
tokens (I use ...TK). It makes the parser rules below more readable, and it avoids name conflicts
between closely related tokens, like the keyword string denoting a type and a literal string like
"abcd".
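A minimal sketch of such a scanner in Python, to make the idea concrete. It only covers the tokens
of the example above, not all of RAPID. It keeps the token names from the listing (no KW/TK
suffixes), and it assumes that ! starts a comment running to the end of the line and that keywords
are written exactly as shown. STRING_LITERAL, KEYWORDS and scan are names I made up for this sketch.

```python
import re

# Hypothetical keyword table: only the words from the example, not all of RAPID.
KEYWORDS = {"PROC": "PROC", "ENDPROC": "ENDPROC", "string": "STRING"}

TOKEN_RE = re.compile(r"""
    (?P<COMMENT>![^\n]*)                # assumption: '!' comments out the rest of the line
  | (?P<STRING_LITERAL>"[^"\n]*")       # a literal string, quotes included
  | (?P<WORD>[A-Za-z_][A-Za-z0-9_]*)    # keyword or identifier, decided below
  | (?P<PARENTHESIS_OPEN>\()
  | (?P<PARENTHESIS_CLOSE>\))
  | (?P<COMMA>,)
  | (?P<SEMICOLON>;)
  | (?P<SKIP>\s+)                       # whitespace between tokens
  | (?P<ERROR>.)                        # anything this sketch does not know
""", re.VERBOSE)


def scan(text):
    """Return a list of (token_name, matched_text) pairs."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind in ("SKIP", "COMMENT"):
            continue  # whitespace and comments never reach the parser
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {value!r}")
        if kind == "WORD":
            # A word is a keyword if it is in the table, else an identifier.
            kind = KEYWORDS.get(value, "IDENTIFIER")
        tokens.append((kind, value))
    return tokens


source = '''PROC top_front( string strNoStepIn )
    ! procedure block
ENDPROC
'''
for kind, value in scan(source):
    print(kind, value)
```

Note that a quoted "endproc" comes out as one STRING_LITERAL token, never as ENDPROC, because the
scanner consumes the whole literal before it looks for words.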
The parser takes this stream of tokens, and reconstructs the parts you want to keep together, with
grammar rules, like
Procedure : PROC IDENTIFIER PARENTHESIS_OPEN FormalParameters PARENTHESIS_CLOSE Statements ENDPROC ;
Procedure : PROC IDENTIFIER PARENTHESIS_OPEN PARENTHESIS_CLOSE Statements ENDPROC ;
A "Procedure" is a thing that starts with the keyword PROC and ends with the keyword ENDPROC. There
are 2 variants, one with and one without FormalParameters.
FormalParameters : FormalParameter
| FormalParameters COMMA FormalParameter
;
FormalParameter : Type IDENTIFIER ;
Type : STRING
| ...
;
FormalParameters is one or more FormalParameter, separated by COMMA. A FormalParameter is a Type
followed by an IDENTIFIER.
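As a rough illustration, the rules above can be written by hand as a small recursive-descent parser
in Python. This is a different technique from feeding the rules to a yacc-style generator: the
left-recursive FormalParameters rule becomes a loop here. It assumes the tokens arrive as
(name, text) pairs using the token names from the rules. Since Statements is not defined above, the
sketch simply collects every token up to ENDPROC as the body, which is a placeholder and not a real
statement parser. The class and method names are mine.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (token_name, text) pairs
        self.pos = 0

    def peek(self):
        # Name of the next token, or "EOF" when the input is exhausted.
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind):
        # Consume one token of the given kind and return its text.
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def procedure(self):
        # Procedure : PROC IDENTIFIER ( [FormalParameters] ) Statements ENDPROC
        self.expect("PROC")
        name = self.expect("IDENTIFIER")
        self.expect("PARENTHESIS_OPEN")
        params = []
        if self.peek() != "PARENTHESIS_CLOSE":  # the variant with parameters
            params = self.formal_parameters()
        self.expect("PARENTHESIS_CLOSE")
        body = self.statements()
        self.expect("ENDPROC")
        return {"name": name, "params": params, "body": body}

    def formal_parameters(self):
        # FormalParameters : FormalParameter (COMMA FormalParameter)*
        params = [self.formal_parameter()]
        while self.peek() == "COMMA":
            self.expect("COMMA")
            params.append(self.formal_parameter())
        return params

    def formal_parameter(self):
        # FormalParameter : Type IDENTIFIER   (only Type : STRING in this sketch)
        type_name = self.expect("STRING")
        return (type_name, self.expect("IDENTIFIER"))

    def statements(self):
        # Placeholder: keep the raw tokens up to ENDPROC instead of parsing them.
        body = []
        while self.peek() not in ("ENDPROC", "EOF"):
            body.append(self.tokens[self.pos])
            self.pos += 1
        return body


tokens = [
    ("PROC", "PROC"), ("IDENTIFIER", "top_front"), ("PARENTHESIS_OPEN", "("),
    ("STRING", "string"), ("IDENTIFIER", "strNoStepIn"), ("PARENTHESIS_CLOSE", ")"),
    ("IDENTIFIER", "MoveL"), ("SEMICOLON", ";"), ("ENDPROC", "ENDPROC"),
]
proc = Parser(tokens).procedure()
print(proc["name"], proc["params"])
```

Each grammar rule turns into one method, so extending Type or adding real statement rules means
adding or extending a method rather than touching the rest.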
> Intuitively I would write a regex that matches the
> name of the procedure, its argument and the procedure block.
In general, regex is not powerful enough to handle programming languages. Consider the case
string x = "endproc";
in the middle of a proc. Good luck detecting the right 'endproc' word. Similar cases exist when a
user comments out a part of a proc.
You may get it working for a set of cases, but handling every case that the RAPID compiler accepts
is probably impossible.
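To see the problem in practice, here is a small Python sketch of the "intuitive" regex applied to a
proc containing the line above. The pattern and the proc named demo are made up for this
illustration, and it assumes the keywords are matched case-insensitively, as the lowercase
'endproc' suggests.

```python
import re

source = '''PROC demo()
    string x = "endproc";
    MoveL ...;
ENDPROC
'''

# Naive pattern: name, argument list, then everything up to the first ENDPROC.
naive = re.compile(r"PROC\s+(\w+)\s*\((.*?)\)(.*?)ENDPROC", re.DOTALL | re.IGNORECASE)

match = naive.search(source)
body = match.group(3)
print(repr(body))  # the body is cut off inside the string literal
```

The match stops at the 'endproc' inside the quotes, so the MoveL line never ends up in the body.
Making the pattern skip string literals and comments amounts to rebuilding a scanner inside the
regex.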
> Since this pattern is so present in the language, I'd like to get it
> right and in a p(l)ythonic manner. Thing is that I'm too new to the
> parsing to really see that.
The pattern is not really special; { .. } or BEGIN .. END in other languages are mostly the same
thing, although they group different things.
Good luck with your parsing adventure,
Albert