Description:
This includes most of a grammar for a PP domain-specific language.
Syntactically and behaviorally it is very much like C, with reduced types
(int, string, bool, no structs, no enums). Most of the PP-specific actions
are simple function calls.
Major departures from C are:
- dynamic typing, with optional runtime checking
- function literals and anonymous functions
- list/tuple literals
- symbols are private unless declared public
- no headers
I have converted all of the existing device code to the new syntax, with
a few bits of hand-waving still to work out. The actual interpreter for
the language is yet to be written. :)
I'm sending this change as a chance to get some feedback.
Tim
Makefile | 78
grammar.l | 1012 +++++
grammar.y | 882 +++++
identifier.h | 78
language.cpp | 304 +
language.h | 146
lexer_test.cpp | 838 ++++
main_lex.cpp | 36
main_parse.cpp | 20
pipe_file.h | 244 +
pp-files/amd_k8.pp | 9168 +++++++++++++++++++++++++++++++++++++++++++++++++++++
pp-files/cpu.pp | 51
pp-files/cpuid.pp | 3543 ++++++++++++++++++++
pp-files/msr.pp | 924 +++++
pp-files/pci.pp | 6018 ++++++++++++++++++++++++++++++++++
pp-files/pp.pp | 267 +
string_file.h | 116
variable.h | 218 +
variable_test.cpp | 152
19 files changed, 24095 insertions(+)
Please review this at http://codereview.appspot.com/154047
Affected files:
A language/Makefile
A language/grammar.l
A language/grammar.y
A language/identifier.h
A language/language.cpp
A language/language.h
A language/lexer_test.cpp
A language/main_lex.cpp
A language/main_parse.cpp
A language/pipe_file.h
A language/pp-files/amd_k8.pp
A language/pp-files/cpu.pp
A language/pp-files/cpuid.pp
A language/pp-files/msr.pp
A language/pp-files/pci.pp
A language/pp-files/pp.pp
A language/string_file.h
A language/variable.h
A language/variable_test.cpp
http://codereview.appspot.com/154047/diff/1/10
File language/grammar.l (right):
http://codereview.appspot.com/154047/diff/1/10#newcode2
language/grammar.l:2: DEC [0-9]
[:digit:] ?
http://codereview.appspot.com/154047/diff/1/10#newcode5
language/grammar.l:5: SPACE [ \t\n\v\f\r]
Why can't you use [:space:] ?
http://codereview.appspot.com/154047/diff/1/10#newcode69
language/grammar.l:69: /* Support C-style comments, even nested. */
Argh. Don't do this. :)
Why deviate from the C-style rule for comments? It's what people are
used to.
http://codereview.appspot.com/154047/diff/1/10#newcode86
language/grammar.l:86: 0[bB]{DEC}+ { dump(yyscanner); return
int_literal(yyscanner); }
What is this token? This is weird to me.
I don't think you want to match 0b34.
http://codereview.appspot.com/154047/diff/1/19
File language/grammar.y (right):
http://codereview.appspot.com/154047/diff/1/19#newcode88
language/grammar.y:88: | string_literal { fprintf(stderr, "%d
primary_expression <- string_literal\n", lex_lineno()); }
Agh; why is the casing different for different literals :(
It's annoying to have rules "string_literal" and "STRING_LITERAL".
http://codereview.appspot.com/154047/diff/1/19#newcode112
language/grammar.y:112: // TODO: '123()' parses to a
function_call_expression. Make sure to validate
Do you really want this to allow <primary_expression>()?
I even find "{return 0;}()" questionable above; I was expecting this to
be restricted to identifiers for a symbol table lookup.
Note: You also allow 123()().
http://codereview.appspot.com/154047/diff/1/19#newcode122
language/grammar.y:122: // We allow dangling commas for convenience.
Blech. Okay, I guess.
http://codereview.appspot.com/154047/diff/1/19#newcode129
language/grammar.y:129: | IDENTIFIER ':' assignment_expression {
fprintf(stderr, "%d argument <- IDENTIFIER ':' assignment_expression\n",
lex_lineno()); }
Why do you want to allow this?
(b:a=5) ? Clearly I'm missing something here.
http://codereview.appspot.com/154047/diff/1/19#newcode161
language/grammar.y:161: : '+' { fprintf(stderr, "%d unary_operator <-
'+'\n", lex_lineno()); }
Why allow + as a unary operator?
http://codereview.appspot.com/154047/diff/1/19#newcode173
language/grammar.y:173: : cast_expression { fprintf(stderr, "%d
multiplicative_expression <- cast_expression\n", lex_lineno()); }
Presumably this is for order of expressions? It seems like there must be
a better way.
On Wed, Nov 11, 2009 at 7:59 PM, <mjte...@gmail.com> wrote:
> I just glanced at the lex/yacc stuff. I'll read through more later.
>
>
> http://codereview.appspot.com/154047/diff/1/10
> File language/grammar.l (right):
>
> http://codereview.appspot.com/154047/diff/1/10#newcode2
> language/grammar.l:2: DEC [0-9]
> [:digit:] ?
>
> http://codereview.appspot.com/154047/diff/1/10#newcode5
> language/grammar.l:5: SPACE [ \t\n\v\f\r]
> Why can't you use [:space:] ?
I didn't realize lex had built-in character classes.
> http://codereview.appspot.com/154047/diff/1/10#newcode69
> language/grammar.l:69: /* Support C-style comments, even nested. */
> Argh. Don't do this. :)
>
> Why deviate from the C-style rule for comments? It's what people are
> used to.
This was a very late addition. I originally had only // line comments.
> http://codereview.appspot.com/154047/diff/1/10#newcode86
> language/grammar.l:86: 0[bB]{DEC}+ { dump(yyscanner); return
> int_literal(yyscanner); }
> What is this token? This is weird to me.
> I don't think you want to match 0b34.
I want to match it and then issue a useful error from int_literal().
Defining it more tightly results in "0b34" parsing as INT_LITERAL "0"
+ IDENTIFIER "b34". I tested C (using octal "08") and it behaves
similarly.
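To make the "match loosely, then complain usefully" idea concrete, here is a
minimal C++ sketch. It is not the int_literal() from this change: the function
name parse_binary_literal, its signature, and the error wording are all
hypothetical. It assumes the lexer has already matched the loose pattern
0[bB][0-9]+ and only the digit check remains.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not the real int_literal(). The lexer is assumed to
// have matched 0[bB][0-9]+ as a single token, so text such as "0b34" arrives
// here whole and can be rejected with a message that names the bad digit.
bool parse_binary_literal(const std::string &text, std::uint64_t *value,
                          std::string *error) {
  if (text.size() < 3 || text[0] != '0' ||
      (text[1] != 'b' && text[1] != 'B')) {
    *error = "malformed binary literal '" + text + "'";
    return false;
  }
  std::uint64_t result = 0;
  for (std::size_t i = 2; i < text.size(); ++i) {
    const char c = text[i];
    if (c != '0' && c != '1') {
      *error = "invalid digit '" + std::string(1, c) +
               "' in binary literal '" + text + "'";
      return false;
    }
    if (result >> 63) {  // The next shift would drop the top bit.
      *error = "binary literal '" + text + "' does not fit in 64 bits";
      return false;
    }
    result = (result << 1) | static_cast<std::uint64_t>(c - '0');
  }
  *value = result;
  return true;
}
```

With a tighter lexer pattern such as 0[bB][01]+, the scanner would stop after
"0" and the error would surface later as a confusing parse failure on the
identifier "b34".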
> http://codereview.appspot.com/154047/diff/1/19
> File language/grammar.y (right):
>
> http://codereview.appspot.com/154047/diff/1/19#newcode88
> language/grammar.y:88: | string_literal { fprintf(stderr, "%d
> primary_expression <- string_literal\n", lex_lineno()); }
> Agh; why is the casing different for different literals :(
>
> It's annoying to have rules "string_literal" and "STRING_LITERAL".
string_literal is a non-terminal rule that enables quoted string
joining, like C. If you prefer, I could define int_literal :
INT_LITERAL for the other literals. Would that be easier to read?
You never want to use STRING_LITERAL, just string_literal.
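For reference, this is the C/C++ behavior being mimicked, plus a sketch of
what a left-recursive joining rule amounts to. The rule shape in the comment
and the helper join_string_tokens are assumptions for illustration, not
copied from grammar.y.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// In C and C++, adjacent string literals are joined into a single string.
const char kJoined[] = "amd_" "k8";

// Hypothetical equivalent of a rule shaped like
//   string_literal : STRING_LITERAL | string_literal STRING_LITERAL
// where each reduction appends one more token to the accumulated text.
std::string join_string_tokens(const std::vector<std::string> &tokens) {
  std::string result;
  for (std::size_t i = 0; i < tokens.size(); ++i) {
    result += tokens[i];
  }
  return result;
}
```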
> http://codereview.appspot.com/154047/diff/1/19#newcode112
> language/grammar.y:112: // TODO: '123()' parses to a
> function_call_expression. Make sure to validate
> Do you really want this to allow <primary_expression>()?
This is how precedence is defined here, by chaining different classes
of expressions. Because my function decl syntax is simpler than C, I
got rid of the complicated declarator, abstract_declarator, etc. rules.
This is a case where the grammar is flexible enough to accept
constructs that are semantically invalid. The TODO is just a reminder
to me to validate that the called symbol actually evaluates to a
function, which "123" obviously does not.
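A rough sketch of that validation step, since the interpreter does not exist
yet. Everything here (the Value struct, make_int, make_function, call) is
hypothetical and only shows the shape of the check: the grammar accepts the
call, and the runtime rejects a callee that is not a function.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical dynamically typed value: either an int or a function.
struct Value {
  enum Kind { kInt, kFunction };
  Kind kind;
  long long int_value;           // Meaningful when kind == kInt.
  long long (*func)(long long);  // Meaningful when kind == kFunction.
};

Value make_int(long long v) { return Value{Value::kInt, v, nullptr}; }

Value make_function(long long (*f)(long long)) {
  return Value{Value::kFunction, 0, f};
}

// The parser has already accepted "<primary_expression>()", so the check
// that the callee is callable has to happen here, at evaluation time.
long long call(const Value &callee, long long arg, int line) {
  if (callee.kind != Value::kFunction) {
    throw std::runtime_error("line " + std::to_string(line) +
                             ": called value is not a function");
  }
  return callee.func(arg);
}

long long twice(long long x) { return 2 * x; }
```

Under this sketch, call(make_function(twice), 21, 1) evaluates normally, while
the equivalent of '123()' throws with the line number attached.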
> I even find "{return 0;}()" questionable above; I was expecting this to
> be restricted to identifiers for a symbol table lookup.
You were expecting what "this" to be restricted to identifiers?
Function literals? I just can't parse your sentence :)
> Note: You also allow 123()().
>
> http://codereview.appspot.com/154047/diff/1/19#newcode122
> language/grammar.y:122: // We allow dangling commas for convenience.
> Blech. Okay, I guess.
C does, Python does. For list literals it's convenient.
> http://codereview.appspot.com/154047/diff/1/19#newcode129
> language/grammar.y:129: | IDENTIFIER ':' assignment_expression {
> fprintf(stderr, "%d argument <- IDENTIFIER ':' assignment_expression\n",
> lex_lineno()); }
> Why do you want to allow this?
> (b:a=5) ? Clearly I'm missing something here.
It's another case of precedence rules. I stole the expression
evaluation stack up from a C99 grammar. It is valid to call a
function as "foo(a=5);" The result of an assignment expression is the
value assigned, so changing it would be a departure from C.
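A small C++ illustration of the C behavior referred to here; the function
names square and demo are made up for the example. The assignment is
evaluated first, and its result is what gets passed as the argument.

```cpp
// Hypothetical callee used only for this illustration.
int square(int x) { return x * x; }

// The argument "a = 5" is an assignment expression: it stores 5 in a and
// then yields the stored value, which becomes the argument to square().
int demo(int *a_out) {
  int a = 0;
  int result = square(a = 5);
  *a_out = a;
  return result;
}
```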
> http://codereview.appspot.com/154047/diff/1/19#newcode161
> language/grammar.y:161: : '+' { fprintf(stderr, "%d unary_operator <-
> '+'\n", lex_lineno()); }
> Why allow + as a unary operator?
Because C does.
> http://codereview.appspot.com/154047/diff/1/19#newcode173
> language/grammar.y:173: : cast_expression { fprintf(stderr, "%d
> multiplicative_expression <- cast_expression\n", lex_lineno()); }
> Presumably this is for order of expressions? It seems like there must be
> a better way.
I could make one great big expression rule and then define precedence,
but I actually find this cleaner.
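To show why the chained form encodes precedence without any explicit
declarations, here is a small recursive-descent sketch in C++. It is not the
yacc grammar from this change: the Parser class is hypothetical, handles only
integers, '+', '-', '*', unary signs, and parentheses, and does no error
reporting. Each level calls the next-tighter level, mirroring the way
additive_expression is built from multiplicative_expression and so on.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical evaluator; one method per precedence level.
class Parser {
 public:
  explicit Parser(const std::string &s) : s_(s), pos_(0) {}
  long long parse() { return additive(); }

 private:
  // additive : multiplicative (('+' | '-') multiplicative)*
  long long additive() {
    long long v = multiplicative();
    while (peek() == '+' || peek() == '-') {
      const char op = s_[pos_++];
      const long long rhs = multiplicative();
      v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
  }
  // multiplicative : unary ('*' unary)*
  long long multiplicative() {
    long long v = unary();
    while (peek() == '*') {
      ++pos_;
      v *= unary();
    }
    return v;
  }
  // unary : ('+' | '-') unary | primary
  long long unary() {
    if (peek() == '+') { ++pos_; return unary(); }
    if (peek() == '-') { ++pos_; return -unary(); }
    return primary();
  }
  // primary : '(' additive ')' | digits
  long long primary() {
    if (peek() == '(') {
      ++pos_;
      const long long v = additive();
      if (peek() == ')') ++pos_;
      return v;
    }
    long long v = 0;
    while (pos_ < s_.size() &&
           std::isdigit(static_cast<unsigned char>(s_[pos_]))) {
      v = v * 10 + (s_[pos_++] - '0');
    }
    return v;
  }
  // Skips spaces and returns the next character, or '\0' at end of input.
  char peek() {
    while (pos_ < s_.size() && s_[pos_] == ' ') ++pos_;
    return pos_ < s_.size() ? s_[pos_] : '\0';
  }

  std::string s_;
  std::size_t pos_;
};
```

Because multiplicative() is only reachable through additive(), "1+2*3" groups
as 1+(2*3) with no precedence table. The alternative is a single flat
expression rule plus %left/%right declarations in yacc.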