Generating a simple hand-coded like recursive descent parser

Mr.E

Sep 8, 2006, 11:21:41 PM

I recently got my scanner working for my [first] compiler. My
compiler is for an existing BASIC language. I am seeing the
complexity of the language at the parsing level. There are
approximately 600 keyword or keyword combinations which include
compound words due to intrinsic functions.

Is there a parser generator that produces the equivalent of a
hand-coded recursive descent parser? I'm looking for a generator that
doesn't require an engine and doesn't use external libraries... just
plain old C.

I can watch and debug recursive descent code because I can understand
it. I can't imagine trying to debug a table-driven parser or having
to rewrite it in BASIC.

The reason for my request is that my compiler will be written in a
BASIC dialect instead of C. I would generate the parser in ANSI C
then rewrite it in BASIC. I'm using a BASIC compiler to bootstrap my
own. It seems to be a good idea to write the compiler in the language
it's going to compile, so that's what I'm doing.

Also, are there any algorithms for AST building? Everything I've
understood tells me that I really want to build an AST and do code
generation from it, rather than trying to generate code as I go along. I
thought I understood the process, but I'm not there yet.

Thank you,

W.

Hans-Peter Diettrich

Sep 9, 2006, 3:41:20 PM

Mr.E wrote:

> I recently got my scanner working for my [first] compiler. My
> compiler is for an existing BASIC language.

Can you tell us which one?

> I am seeing the
> complexity of the language at the parsing level. There are
> approximately 600 keyword or keyword combinations which include
> compound words due to intrinsic functions.

Most BASICs are LL(1) and quite easy to parse, even if they have many
different statements/productions.

> Is there a parser generator that produces the equivalent of a
> hand-coded recursive descent parser? I'm looking for a generator that
> doesn't require an engine and doesn't use external libraries... just
> plain old C.

If your BASIC also can be interpreted, the parser can not be that
complicated.

I've written some compilers and decompilers in BASIC, many years ago,
and found it very convenient to use the special built-in functions of my
various BASICs. That's a bit incompatible with the use of parser
generators, in particular with parser code produced for/in C.

If you want to write your compiler in BASIC, you can use more parser
generators, not only those for C. CoCo/R may do what you want, with output of
the recursive descent parser in plain old Pascal. But I think that only
expressions will deserve the assistance of a parser generator, until
you get a feeling for how things can be done; then you can proceed with
handcrafting the remaining parts for statements etc. yourself. You can
also start with a very small subset of the language, and extend it later
to the full language. Then you can play with various approaches to the
generation of code and internal data structures very soon, until you
find a model that works well. If you start with the full language instead,
you'll have many places in your code where every modification of your
model has to be reflected, which amounts to nothing but a giant waste of time.

> I can watch and debug recursive descent code because I can understand
> that. I cant imagine trying to debug a table driven parser or having
> to rewrite it in BASIC.

I can't either ;-)

>
> The reason for my request is that my compiler will be written in a
> BASIC dialect instead of C. I would generate the parser in ANSI C
> then rewrite it in BASIC. I'm using a BASIC compiler to bootstrap my
> own. It seems to be a good idea to write the compiler in the language
> it's going to compile, so that's what I'm doing.

IMO that's feasible, with your powerful BASIC.

>
> Also are there any algorithms for AST building. Everything I've
> understand tells me that I really want to build an AST and do code
> generation from it versus trying to generate code as I go along. I
> thought I understood the process but I'm not there yet.

In my recent C compiler project I didn't use any parser generator, and
built all data structures manually in the recursive descent parser. IMO
automatic construction of an AST will only defer the problem of
recognizing what has to be done with the parsed input. Since you want to
implement a recursive descent parser, you can collect all required
information during parsing, perhaps, but not necessarily, in a tree-like
structure, from which you can produce code almost immediately. Remember
that an interpreter also interprets the statements one by one, possibly
after translating (scanning) everything into byte code, while loading a
program.

Provided your BASIC allows for procedures, you'll have to think for some
time about a decent framework for procedures and their local variables,
so that you can patch the entry code of procedures easily, after
outputting the code for a procedure body. Or you may use a table
describing the required amount of local memory and other things (line
numbers for ONERROR etc.) that can be inspected at runtime. Most BASIC
dialects do not really compile very well into machine code, so you may
be better off with a virtual machine and a byte code emulator for the
procedure framework, with an escape to compiled machine code for the
evaluation of expressions etc., which really benefit from such compilation.
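
To make the "patch the entry code" idea concrete, here is a minimal sketch in C. It assumes an imaginary stack machine; the opcodes OP_ENTER, OP_PUSH and OP_RET and the helpers emit, begin_proc and end_proc are made up for illustration and do not come from any real BASIC compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for an imaginary stack machine. */
enum { OP_ENTER = 1, OP_PUSH = 2, OP_RET = 3 };

#define CODE_MAX 256
static int code[CODE_MAX];   /* output buffer for generated code */
static size_t code_len = 0;

/* Append one word to the code buffer and return its index. */
static size_t emit(int word) {
    assert(code_len < CODE_MAX);
    code[code_len] = word;
    return code_len++;
}

/* Emit the procedure prologue with a placeholder frame size.
   Returns the index of the placeholder so it can be patched later. */
static size_t begin_proc(void) {
    emit(OP_ENTER);
    return emit(0);          /* frame size is not known yet */
}

/* Called after the body has been emitted, when the number of
   local variables is finally known. */
static void end_proc(size_t frame_slot, int num_locals) {
    code[frame_slot] = num_locals;   /* back-patch the prologue */
    emit(OP_RET);
}
```

The parser would call begin_proc when it sees the procedure header, count locals while parsing the body, and call end_proc at the end. The table-based alternative mentioned above would store num_locals in a side table instead of patching the code.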

Feel free to contact me by e-mail, for a more concrete discussion of
your project.

DoDi

Jürgen Kahrs

Sep 9, 2006, 3:41:37 PM

Mr.E wrote:

> Is there a parser generator that produces the equivalent of a
> hand-coded recursive descent parser? I'm looking for a generator that
> doesn't require an engine and doesn't use external libraries... just
> plain old C.

CoCo/R leaves you the option of reading
the generated parser:

http://www.scifac.ru.ac.za/coco/

olive...@gmail.com

Sep 9, 2006, 10:52:42 PM

Mr.E wrote:
> There are approximately 600 keyword or keyword combinations which
> include compound words due to intrinsic functions.

What do you mean by keyword combinations?

> Is there a parser generator that produces the equivalent of a
> hand-coded recursive descent parser? I'm looking for a generator
> that doesn't require an engine and doesn't use external libraries...
> just plain old C.

I've seen CoCo/R mentioned... ANTLR also produces recursive descent
(LL(*)) parsers, but converting its output to BASIC may be about the
least fun thing you could do...

>
> I can watch and debug recursive descent code because I can understand
> that. I cant imagine trying to debug a table driven parser or having
> to rewrite it in BASIC.

You could always write one by hand -- BASIC's grammar isn't overly
complex, and it could be a good learning experience :D

> Also are there any algorithms for AST building. Everything I've
> understand tells me that I really want to build an AST and do code
> generation from it versus trying to generate code as I go along. I
> thought I understood the process but I'm not there yet.

Well, that one's up to you... you could make a completely abstract tree:

    struct ASTNode {
        int isLeaf;
        void *leaf;                    /* payload when isLeaf is nonzero */
        struct {
            int numChildren;
            struct ASTNode *children;  /* array of numChildren nodes */
        } branch;
    };

In this you can just store the whole parse tree. Alternatively, you can
store only the semantics of what is being parsed, in which case basically
all you are doing is building a tree that represents the important bits
of your grammar. E.g. (from http://en.wikipedia.org/wiki/Tiny_BASIC -- I
really don't know BASIC :D):

    statement ::= PRINT expr-list
                  IF expression relop expression THEN statement
                  GOTO expression
                  INPUT var-list
                  LET var = expression
                  GOSUB expression
                  RETURN
                  CLEAR
                  LIST
                  RUN
                  END

for which you *might* make a pair of types:

    enum statement_type {
        print_statement, if_statement, goto_statement, ..., end_statement
    };

    struct statement_s {
        enum statement_type type;   /* selects the active union member */
        union {
            struct {
                expression_list *expressions;
            } print_stat;
            struct {
                relop operator;
                expression *lhs, *rhs;
                struct statement_s *statement;
            } if_stat;
            struct {
                expression *target;
            } goto_stat;
            ...
        };
    };

Now when you parse a statement you create a statement struct for the
appropriate branch, and fill in the appropriate bits. But note we
aren't storing syntax info -- all we have in the node is the actual
information we *need* to reconstruct the meaning.
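
As a rough sketch of "create a statement struct and fill in the appropriate bits", the parser action for GOTO might look like the following. The types are cut down so the fragment compiles on its own: struct expression is a stand-in, the union is named u, and new_statement and make_goto are hypothetical helper names.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a real expression node (assumption). */
struct expression { int value; };

enum statement_type { goto_statement, return_statement };

struct statement_s {
    enum statement_type type;
    union {
        struct { struct expression *target; } goto_stat;
    } u;
};

/* Allocate a zeroed statement node of the given kind; NULL on failure. */
static struct statement_s *new_statement(enum statement_type type) {
    struct statement_s *s = calloc(1, sizeof *s);
    if (s != NULL)
        s->type = type;
    return s;
}

/* What the parser would call after recognising "GOTO <expression>". */
static struct statement_s *make_goto(struct expression *target) {
    assert(target != NULL);
    struct statement_s *s = new_statement(goto_statement);
    if (s != NULL)
        s->u.goto_stat.target = target;
    return s;
}
```

Later passes switch on the type field and only touch the matching union member.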

Apologies for any issues in the above code -- I tend to use languages where
subtyping is an option :D

And other people here can probably explain this somewhat better than I
have :(

--Oliver

Mr.E

Sep 10, 2006, 9:55:55 AM

I will look into it further. I have P.D. Terry's book on compiler
generators where I think CoCo/R is referenced. I'll have to pull that
one out of storage.

Thanks,

W.

Mr.E

Sep 10, 2006, 9:55:39 AM

Hans-Peter Diettrich wrote:
> Mr.E wrote:
>
> > I recently got my scanner working for my [first] compiler. My
> > compiler is for an existing BASIC language.
>
> Can you tell us which one?

FutureBasic -- it's a dialect of BASIC for the Macintosh.

> If your BASIC also can be interpreted, the parser can not be that
> complicated.

Maybe because it is my first compiler it just seems complicated.
I don't know how many books I've read on the subject and at the time
they made sense. Once I decided that I would write my own it started
to become a whole lot more challenging than the compilers I've read
about.

> I've written some compilers and decompilers in BASIC, many years ago,
> and found it very convenient to use the special built-in functions of my
> various BASICs. That's a bit incompatible with the use of parser
> generators, in detail for parser code produced for/in C.
>
> If you want to write your compiler in BASIC, you can use more parser
> generators, not only for C. CoCo/R may do what you want, with output of
> the recursive descent parser in plain old Pascal. But I think that only
> expressions will deserve the assistance of an parser generator, until
> you get a feeling for how things can be done, then you can proceed with
> handcrafting the remaining parts for statements etc. yourself. You also
> can start with a very small subset of the language, and extend it later
> to the full language. Then you can play with various approaches to the
> generation of code and internal data structures very soon, until you
> found a well working model. If you start with the full language instead,
> you'll have many places in your code, where every modification of your
> model has to be reflected, worth nothing but a giant waste of time.

This is the approach I am taking. The more I sketched how I want
to accomplish what I want to do, the more problems I found because the
language appears to be really complex at the parsing level. The user
gets an easy to use language while I pull out small chunks of hair
thinking of how to do all the work under the hood.

> > Also are there any algorithms for AST building. Everything I've
> > understand tells me that I really want to build an AST and do code
> > generation from it versus trying to generate code as I go along. I
> > thought I understood the process but I'm not there yet.
>
> In my recent C compiler project I didn't use any parser generator, and
> built all data structures manually in the recursive descent parser. IMO
> automatic construction of an AST will only defer the problem of
> recognizing what has to be done with the parsed input. Since you want to
> implement an recursive descent parser, you can collect all required
> information during parsing, perhaps, but not necessarily, in a tree-like
> structure, from which you can produce code almost immediately. Remember
> that an interpreter also interprets the statements one by one, possibly
> after translating (scanning) everything into byte code, while loading a
> program.

To add more information, it will be a cross-compiler for PPC and Intel on
Mac. I plan to keep the front end separated from the back end with an
AST. Keeping the front and back ends separated should allow for

    easier cross compiling
    easier inclusion of a built-in debugger
    register usage analysis
    code optimization

> Provided your BASIC allows for procedures, you'll have to think some
> time about a decent framework for procedures and their local variables,
> so that you can patch the entry code of procedures easily, after
> outputting the code for an procedure body. Or you may use a table,
> describing the required amount of local memory and other things (line
> numbers for ONERROR etc.), that can be inspected at runtime. Most BASIC
> dialects do not really compile very well into machine code, so you may
> be better off with a virtual machine and an byte code emulator for the
> procedure framework, with an escape to compiled machine code for the
> evaluation of expressions etc., which really profit from such a compilation.

The language is a fully procedural language: local functions, local and
global variables.

> Feel free to contact me by e-mail, for a more concrete discussion of
> your project.
>
> DoDi

I've never done anything like this before. I've been gleaning
from a number of books on compilers and finally felt I might
be ready to take the plunge into actually writing one.
Theory is one thing; actually writing one is something
completely different.

Thank you for the life preserver. Trust me, I will take it.

W.

Pascal Bourguignon

Sep 10, 2006, 9:54:55 AM

"Mr.E" <mr.wa...@verizon.net> writes:

> I recently got my scanner working for my [first] compiler. My
> compiler is for an existing BASIC language. I am seeing the
> complexity of the language at the parsing level. There are
> approximately 600 keyword or keyword combinations which include
> compound words due to intrinsic functions.
>
> Is there a parser generator that produces the equivalent of a
> hand-coded recursive descent parser? I'm looking for a generator that
> doesn't require an engine and doesn't use external libraries... just
> plain old C.
>
> I can watch and debug recursive descent code because I can understand
> that. I cant imagine trying to debug a table driven parser or having
> to rewrite it in BASIC.
>
> The reason for my request is that my compiler will be written in a
> BASIC dialect instead of C. I would generate the parser in ANSI C
> then rewrite it in BASIC. I'm using a BASIC compiler to bootstrap my
> own. It seems to be a good idea to write the compiler in the language
> it's going to compile, so that's what I'm doing.

Unless you're trapped on a planet light years away, with nothing else
than a BASIC, I see no reason to use that to bootstrap a compiler.

Even if you have to compile the compiler to BASIC for some strange
reason, I see no reason why you shouldn't use more powerful tools to
generate the BASIC code...

So, here is a simple Recursive Descent Parser generator, written in
Common Lisp, that can generate the parser in lisp, or in a
pseudo-basic. You could modify it to generate the code you want, for
your target basic system.

http://www.informatimago.com/develop/lisp/small-cl-pgms/rdp/

For the scanner, it uses a regexp library with a simplistic algorithm
(trying the regexp for each token in turn).

If speed were needed, a smarter algorithm using a fused DFA for the
scanner would be in order, and of course, a table-based parser too.

> Also are there any algorithms for AST building. Everything I've
> understand tells me that I really want to build an AST and do code
> generation from it versus trying to generate code as I go along. I
> thought I understood the process but I'm not there yet.

There's no specific algorithm to build the AST. It's merely built by
the actions associated with the grammar rules, and executed by the
parser when the rule is used.
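
For a hand-written recursive descent parser, the "action" is simply the code that runs once a rule has matched. A small sketch in C, for a toy grammar with only numbers, "+" and "-" (struct node, mk, parse_number and parse_sum are illustrative names, not part of the generator above):

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* A node is either a number leaf (op == 0) or a binary operator. */
struct node {
    char op;                 /* '+', '-', or 0 for a leaf */
    int value;               /* used only by leaves */
    struct node *lhs, *rhs;  /* used only by operators */
};

static struct node *mk(char op, int value, struct node *l, struct node *r) {
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->op = op; n->value = value; n->lhs = l; n->rhs = r;
    return n;
}

/* number --> digit+
   action: build a leaf node */
static struct node *parse_number(const char **p) {
    int v = 0;
    while (isdigit((unsigned char)**p))
        v = 10 * v + (*(*p)++ - '0');
    return mk(0, v, NULL, NULL);
}

/* sum --> number (("+" | "-") number)*
   action: wrap the tree built so far in a new operator node */
static struct node *parse_sum(const char **p) {
    struct node *left = parse_number(p);
    while (**p == '+' || **p == '-') {
        char op = *(*p)++;
        left = mk(op, 0, left, parse_number(p));
    }
    return left;
}
```

Each parse function returns the subtree for what it recognised, which plays the same role as $1, $2, ... in a generated parser.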

The reason why it's interesting to build this AST as an intermediate
data structure, is that apart from the most simple cases, some
semantic analysis, and some optimizations are needed, for which we
need a global view of the AST.

For example, to check that the declared type of the variables matches
the type of the operators with which the variables are used, or to do
type inference.
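
A sketch of such a check over a finished tree, in C. The promotion rules here (int + real gives real, strings only combine with strings) are an assumption chosen for the example and are not taken from any particular BASIC; the names struct expr and infer are also made up.

```c
#include <assert.h>
#include <stddef.h>

enum type { T_INT, T_REAL, T_STRING, T_ERROR };

struct expr {
    char op;                 /* '+' for addition, 0 for a leaf */
    enum type declared;      /* declared type, used only by leaves */
    struct expr *lhs, *rhs;
};

/* Infer the type of an expression bottom-up.
   Assumed rules: int + real promotes to real; strings only
   combine with strings; anything else is an error. */
static enum type infer(const struct expr *e) {
    assert(e != NULL);
    if (e->op == 0)
        return e->declared;
    enum type l = infer(e->lhs), r = infer(e->rhs);
    if (l == T_ERROR || r == T_ERROR)
        return T_ERROR;
    if (l == T_STRING || r == T_STRING)
        return (l == r) ? T_STRING : T_ERROR;
    return (l == T_REAL || r == T_REAL) ? T_REAL : T_INT;
}
```

This kind of pass is awkward to do while parsing, because the declaration of a variable may not have been seen yet when it is first used.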

The remaining phases can be implemented as progressive transformations
of the AST into less and less abstract trees until what remains is a
list of target processor instructions.

Here is an example of the grammar written for my simple recursive
descent parser.

;;; Example taken from: http://en.wikipedia.org/wiki/Recursive_descent_parser

(defgrammar example
    :terminals ((ident "[A-Za-z][A-Za-z0-9]*")
                ;; real must come first to match the longest first.
                (real "^\$[-+]\\?[0-9]\\+\\.[0-9]\\+\\([Ee][-+]\\?[0-9]\\+\$\\?\\)")
                (integer "[-+]\\?[0-9]\\+"))
    :start program
    :rules ((--> factor
                 (alt ident
                      number
                      (seq "(" expression ")" :action $2))
                 :action $1)
            (--> number (alt integer real) :action $1)
            (--> term
                 factor (rep (alt "*" "/") factor)
                 :action `(,$1 . ,$2))
            (--> expression
                 (opt (alt "+" "-"))
                 term
                 (rep (alt "+" "-") term :action `(,$1 ,$2))
                 :action `(+ ,(if $1 `(,$1 ,$2) $2) . ,$3))
            (--> condition
                 (alt (seq "odd" expression
                           :action `(oddp ,$2))
                      (seq expression
                           (alt "=" "#" "<" "<=" ">" ">=")
                           expression
                           :action `(,$2 ,$1 ,$3)))
                 :action $1)
            (--> statement
                 (opt (alt (seq ident ":=" expression
                                :action `(setf ,$1 ,$3))
                           (seq "call" ident
                                :action `(call ,$2))
                           (seq "begin" statement
                                (rep ";" statement
                                     :action $2)
                                "end"
                                :action `(,$2 . ,$3))
                           (seq "if" condition "then" statement
                                :action `(if ,$2 ,$4))
                           (seq "while" condition "do" statement
                                :action `(while ,$2 ,$4))))
                 :action $1)
            (--> block
                 (opt "const" ident "=" number
                      (rep "," ident "=" number
                           :action `(,$2 ,$4))
                      ";"
                      :action `((,$2 ,$4) . ,$5))
                 (opt "var" ident
                      (rep "," ident :action $2)
                      ";"
                      :action `(,$2 . ,$3))
                 (rep "procedure" ident ";" block ";"
                      :action `(procedure ,$2 ,$4))
                 statement
                 :action `(block ,$1 ,$2 ,$3 ,$4))
            (--> program
                 block "." :action $1)))

The forms following the :ACTION keyword build the AST nodes from the
nodes returned by the subnodes identified by $1, $2, etc...

And here is an example of the (Lisp) AST built by the generated parser:

(parse-example
"
const abc = 123,
pi=3.141592e+0;
var a,b,c;
procedure gcd;
begin
while a # b do
begin
if a<b then b:=b-a ;
if a>b then a:=a-b
end
end;
begin
a:=42;
b:=30.0;
call gcd
end.")

-->

(BLOCK
 (((IDENT "abc" 11) (INTEGER "123" 17))
  ((IDENT "pi" 32) (REAL "3.141592e+0" 35)))
 ((IDENT "a" 57) (IDENT "b" 59) (IDENT "c" 61))
 ((PROCEDURE (IDENT "gcd" 79)
   (BLOCK NIL NIL NIL
    ((WHILE (("#" "#" 112) (+ ((IDENT "a" 110))) (+ ((IDENT "b" 114))))
      ((IF (("<" "<" 151) (+ ((IDENT "a" 150))) (+ ((IDENT "b" 152))))
        (SETF (IDENT "b" 159)
         (+ ((IDENT "b" 162)) (("-" "-" 163) ((IDENT "a" 164))))))
       (IF ((">" ">" 186) (+ ((IDENT "a" 185))) (+ ((IDENT "b" 187))))
        (SETF (IDENT "a" 194)
         (+ ((IDENT "a" 197)) (("-" "-" 198) ((IDENT "b" 199))))))))))))
 ((SETF (IDENT "a" 235) (+ ((INTEGER "42" 238))))
  (SETF (IDENT "b" 246) (+ ((REAL "30.0" 249)))) (CALL (IDENT "gcd" 264))))

(The integers in third position in the sublists are the positions in
the source of the corresponding token.)

--
__Pascal Bourguignon__ http://www.informatimago.com/

Mr.E

Sep 10, 2006, 1:31:39 PM

olive...@gmail.com wrote:
> Mr.E wrote:
> > There are approximately 600 keyword or keyword combinations which
> > include compound words due to intrinsic functions.
>
> What do you mean by keyword combinations?

Examples off the top of my head

keyword or compound phrase, usage
"compile", directive on what compiler options will be used
"compile long if", conditional compile block statement
"compile xelse", what to do if the conditional compile expression fails
"compile end if", end of conditionally compiled block
"clear", reinitialize global and main scoped variables for the entire program
"clear local", initialize variables belonging to a function before use
"clear local mode", same as "clear local" but user global variables are inaccessible (not visible to the function)
"local", indicates start of local variable scope
"local mode", start of local scope; the procedure cannot see global variables
"def fn <function name> [(parameter list)]", prototype a function
"def fn <function name> = expression", simple function
"def fn <function name> USING functionPointer", virtual function prototype

> > Is there a parser generator that produces the equivalent of a
> > hand-coded recursive descent parser? I'm looking for a generator
> > that doesn't require an engine and doesn't use external libraries...
> > just plain old C.
>
> I've seen CoCo/R mentioned... ANTLR also produces recursive descent
> (LL(*)) parsers, but converting its output to basic may be about the
> least fun thing you could do...

I will check out CoCo/R.

> > I can watch and debug recursive descent code because I can understand
> > that. I cant imagine trying to debug a table driven parser or having
> > to rewrite it in BASIC.
>
> You could always write one by hand -- BASIC's grammar isn't overly
> complex, and it could be a good learning experience :D

I thought it would be character building, and I am finding it
humbling :-)

I've been trying to follow the AST of the LCC compiler (I own Fraser &
Hanson's book). The idea behind creating DAGs is good, but maybe
problematic for me. I'm not willing to use code or ideas I don't fully
comprehend. The first time it breaks, or something near it breaks, is
when you're in trouble.

> Apologies for any issues in the above code -- I tend to use languages where
> subtyping is an option :D

No need to apologize. You explained it the way you know how with an
example. It's up to me now to take it in.

Thank you for your assistance,

W.

Mr.E

Sep 10, 2006, 5:18:49 PM

Pascal Bourguignon wrote:

> Unless you're trapped on a planet light years away, with nothing else
> than a BASIC, I see no reason to use that to bootstrap a compiler.
>
> Even if you have to compile the compiler to BASIC for some strange
> reason, I see no reason why you shouldn't use more powerful tools to
> generate the BASIC code...

From what I've read, many compilers are grown and extended by using
their own language; I like that idea.

> So, here is a simple Recursive Descent Parser generator, written in
> Common Lisp, that can generate the parser in lisp, or in a
> pseudo-basic. You could modify it to generate the code you want, for
> your target basic system.
>
> http://www.informatimago.com/develop/lisp/small-cl-pgms/rdp/
>
> For the scanner, it uses a regexp library with a simplistic algorithm
> (trying the regexp for each token in turn.
>
> If speed was needed, a smarter algorithm using a fusionned DFA for the
> scanner would be in order, and of course, a table based parser too.

I will check it out.

> > Also are there any algorithms for AST building. Everything I've
> > understand tells me that I really want to build an AST and do code
> > generation from it versus trying to generate code as I go along. I
> > thought I understood the process but I'm not there yet.
>
> There's no specific algorith to build the AST. It's merely built by
> the actions associated to the grammar rules, and executed by the
> parser when the rule is used.

Oh darn, an algorithm would give a standard to work with and compare
to.

> The reason why it's interesting to build this AST as an intermediate
> data structure, is that apart from the most simple cases, some
> semantic analysis, and some optimizations are needed, for which we
> need a global view of the AST.
>
> For example, to check that the declared type of the variables matches
> the type of the operators with which the variables are used, or to do
> type inference.
>
> The remaining phases can be implemented as progressive transformations
> of the AST into less and less abstract trees until what remains is a
> list of target processor instructions.
>
>
>
>
> Here is an example of the grammar written for my simple recursive
> descent parser.
>

I will glean from your example all that I can. I appreciate it.

Thank you,

W.

Pascal Bourguignon

Sep 10, 2006, 11:43:38 PM

"Mr.E" <mr.wa...@verizon.net> writes:

> olive...@gmail.com wrote:
>> Mr.E wrote:
>> > There are approximately 600 keyword or keyword combinations which
>> > include compound words due to intrinsic functions.
>>
>> What do you mean by keyword combinations?
>
> Examples off the top of my head
>
> keyword or compound phase, usage
> "compile", directive on what compiler options will be used
> "compile long if", conditional compile block statement
> "compile xelse", what to do if the conditional compile expression fails
> "compile end if", end of conditially compiled block
> "clear", reinitialize global and main scoped variables for entire
> program
> "clear local", initialize variables belonging to function before use
> "clear local mode" , same as "clear local" but user global variable are
> inaccessible ( not visible to the function)
> "local", indicates start of local variables scope
> "local mode", start of local scope, procedure can not see global
> variables
> "def fn <function name> [( paramater list)]", prototype a function
> "def fn <function name> = expression", simple function
> "def fn <function name> USING functionPointer", virtual function
> prototype

The main idea of the recursive descent parser, is that you can select
the production rule from the first terminal.

With these rules:

start --> c1|c2|c3|c4|k1|k2|k3|l1|l2|d1|d2|d3

c1 --> "compile"
c2 --> "compile" "long" "if"
c3 --> "compile" "xelse"
c4 --> "compile" "end" "if"
k1 --> "clear"
k2 --> "clear" "local"
k3 --> "clear" "local" "mode"
l1 --> "local"
l2 --> "local" "mode"
d1 --> "def" "fn" <function name> [ "(" <parameter list> ")" ]
d2 --> "def" "fn" <function name> "=" <expression>
d3 --> "def" "fn" <function name> "USING" <functionPointer>

From the parse-start function and the first terminal read (let's say
it's "compile"), you cannot know which rule to apply, which function to
call.

So you have to transform the grammar to make sure that for each non
terminal the set of the first symbols derivable from that non terminal
is disjoint from the set for the other non terminals (at least, for
the other non terminals derivable from the same places).

c   --> "compile" (c1|c2|c3|c4)
c1  -->
c2  --> "long" "if"
c3  --> "xelse"
c4  --> "end" "if"

k   --> "clear" (k1|kl)
k1  -->
kl  --> "local" (kl1|kl2)
kl1 -->
kl2 --> "mode"

l   --> "local" (l1|l2)
l1  -->
l2  --> "mode"

d   --> "def" "fn" <function name> (d0|d1|d2|d3)
d0  -->
d1  --> "(" <parameter list> ")"
d2  --> "=" <expression>
d3  --> "USING" <functionPointer>

Then, for each production rule you can write a function that will know
immediately from the current terminal symbol read (token) which
function to call (what non-terminal production rule to invoke).

parse-start:
    if token="compile" then parse-c
    else if token="def" then parse-d
    ...

parse-c:
    accept "compile"
    if token="long" then parse-c2
    else if token="xelse" then parse-c3
    else if token="end" then parse-c4
    else parse-c1

parse-d:
    accept "def"
    accept "fn"
    parse-function-name
    if token="(" then parse-d1
    else if token="=" then parse-d2
    else if token="USING" then parse-d3
    else parse-d0

parse-d1:
    accept "("
    parse-parameter-list
    accept ")"

...

But notice how this creates "artificial" non-terminals and their
corresponding procedures.
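
For comparison, here is roughly how the left-factored "clear" and "local" rules might look in C. This is only a sketch: the token stream is simplified to an array of words ending in NULL, and the enum stmt result codes and the helper at are invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token stream: an array of words ending with NULL. */
static const char **tok;

/* True if the current token is the given word. */
static int at(const char *word) {
    return *tok != NULL && strcmp(*tok, word) == 0;
}

enum stmt { S_NONE, S_CLEAR, S_CLEAR_LOCAL, S_CLEAR_LOCAL_MODE,
            S_LOCAL, S_LOCAL_MODE };

/* kl --> "local" (kl1 | kl2);  kl1 --> (empty);  kl2 --> "mode" */
static enum stmt parse_kl(void) {
    tok++;                                   /* accept "local" */
    if (at("mode")) { tok++; return S_CLEAR_LOCAL_MODE; }
    return S_CLEAR_LOCAL;
}

/* k --> "clear" (k1 | kl);  k1 --> (empty) */
static enum stmt parse_k(void) {
    tok++;                                   /* accept "clear" */
    if (at("local")) return parse_kl();
    return S_CLEAR;
}

/* l --> "local" (l1 | l2);  l1 --> (empty);  l2 --> "mode" */
static enum stmt parse_l(void) {
    tok++;                                   /* accept "local" */
    if (at("mode")) { tok++; return S_LOCAL_MODE; }
    return S_LOCAL;
}

/* One token of lookahead is enough to pick the rule. */
static enum stmt parse_start(const char **tokens) {
    assert(tokens != NULL);
    tok = tokens;
    if (at("clear")) return parse_k();
    if (at("local")) return parse_l();
    return S_NONE;
}
```

The empty alternatives (k1, kl1, l1) turn into the fall-through return at the end of each function, so the artificial non-terminals mostly disappear when the parser is written by hand.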

My example Recursive Descent Parser Generator doesn't normalize the
grammar in such a way, so you'd have to make sure to write the grammar
so that the first sets of the alternatives of each non-terminal are disjoint.

It would be better to use a parser generator that did this grammar
normalization (there is a simple algorithm to do this normalization).

Then the procedures generated won't match 1-1 the production rules you
write.

Therefore I wouldn't bother with expecting a readable generated
parser. Table driven parsers are well known and work well and
efficiently.

Tommy Thorn

Sep 10, 2006, 11:44:00 PM

Mr.E wrote:
> From what I've read, many compilers are grown and extended by using
> their own language, I like that idea.

Using the right language will teach you concepts that make doing this
so much easier. At the very minimum you need product (~ "struct") and
sum (~ "union") types. In (classic) BASIC you'd have to simulate those,
making for a very unnatural and messy implementation.

Consider starting with a much simpler example, say an expression
_interpreter_ to handle this language

e := e + e | e * e | ( e ) | k | v

(where * binds stronger than +, k is an integer constant, and v a
variable of some value).

If you manage this in BASIC, then next make a compiler for it.

This exercise will likely teach you much that will be useful for a full
compiler for BASIC.

(building ASTs)

> Oh darn, an algorithm would give a standard to work with and compare
> to.

There isn't any "algorithm" to be had. Building AST nodes is trivial, or
if it isn't, you can bet writing a compiler won't be.

Tommy
PS: Here's a slightly compressed solution in C for the first half.

#include <ctype.h>
#include <stdio.h>

const char *s = "1+x*3+4*(5+y)";                      /* input cursor */
int pExp(void), env[256] = { ['x'] = 2, ['y'] = 3 };  /* variable values */

/* factor: "(" e ")" | integer constant | variable */
int pFactor(void) {
    int v = 0;
    if (*s == '(')
        ++s, v = pExp(), ++s;                 /* skip "(" and ")" */
    else if (isdigit((unsigned char)*s))
        while (isdigit((unsigned char)*s))
            v = 10*v + *s++ - '0';
    else if (isalpha((unsigned char)*s))
        v = env[(unsigned char)*s++];
    return v;
}

/* term: factors joined by "*" */
int pTerm(void) {
    int v = pFactor();
    while (*s == '*') ++s, v *= pFactor();
    return v;
}

/* expression: terms joined by "+" */
int pExp(void) {
    int v = pTerm();
    while (*s == '+') ++s, v += pTerm();
    return v;
}

int main(void) { printf("%d\n", pExp()); return 0; }  /* prints 39 */
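
For the second half of the exercise, one possible shape is to keep the same three functions but emit instructions instead of computing values. The sketch below assumes a made-up stack machine with PUSH, LOAD, MUL and ADD instructions written out as text; the names compile, emit, cExp, cTerm and cFactor are illustrative only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static const char *src;         /* input cursor */
static char out[256];           /* generated code, as text */

/* Append one instruction to the output buffer. */
static void emit(const char *text) {
    assert(strlen(out) + strlen(text) < sizeof out);
    strcat(out, text);
}

static void cExp(void);

static void cFactor(void) {
    char buf[32];
    if (*src == '(') {
        ++src; cExp(); ++src;               /* skip "(" and ")" */
    } else if (isdigit((unsigned char)*src)) {
        int k = 0;
        while (isdigit((unsigned char)*src))
            k = 10 * k + (*src++ - '0');
        snprintf(buf, sizeof buf, "PUSH %d;", k);
        emit(buf);
    } else if (isalpha((unsigned char)*src)) {
        snprintf(buf, sizeof buf, "LOAD %c;", *src++);
        emit(buf);
    }
}

static void cTerm(void) {
    cFactor();
    while (*src == '*') { ++src; cFactor(); emit("MUL;"); }
}

static void cExp(void) {
    cTerm();
    while (*src == '+') { ++src; cTerm(); emit("ADD;"); }
}

/* Compile an expression and return the generated code as a string. */
static const char *compile(const char *text) {
    src = text;
    out[0] = '\0';
    cExp();
    return out;
}
```

Under these assumptions, compile("1+x*3") yields "PUSH 1;LOAD x;PUSH 3;MUL;ADD;". The operator is emitted after both operands, so the output is in postfix order and needs no AST for a language this small.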

Arargh...@arargh.com

Sep 11, 2006, 8:39:39 AM

On 10 Sep 2006 23:44:00 -0400, Tommy Thorn <tommy...@gmail.com>
wrote:

>Mr.E wrote:
>> From what I've read, many compilers are grown and extended by using
>> their own language, I like that idea.
>
>Using the right language will teach you concepts that makes doing this
>so much easier. At the very minimum you need product (~ "struct") and
>sum (~ "union") types. In (classic) BASIC you'd have to simulate those
>making it a very unnatural and messy implementation.

You don't really need a "union" type. It would just make some things
a little easier. I didn't need one for BCET.

<snip>

>If you manage this in BASIC, then next make a compiler for it.

I did. BCET is written mostly in Basic. Some routines are written in
Assembler, mostly for speed (they were originally developed in Basic).

>This exercise will likely teach you much that will be useful for a full
>compiler for BASIC.

It can be useful. As part of constant expression reduction, I had to
scan the expression tree as if I were interpreting it. But that was
added later. The original compiler would actually generate
instructions to add 1 and 1 for a statement like:
LET A = 1 + 1
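
A constant-reduction pass of that kind can be sketched in C as below. This is not BCET's code, just an illustration under simple assumptions: nodes are either '+' or '*' operators, constants ('k') or variables ('v'), and struct expr and fold are made-up names.

```c
#include <assert.h>
#include <stddef.h>

struct expr {
    char op;                 /* '+', '*', 'k' (constant) or 'v' (variable) */
    int value;               /* constant value when op == 'k' */
    struct expr *lhs, *rhs;
};

/* Fold constant subtrees in place, visiting children first. */
static void fold(struct expr *e) {
    assert(e != NULL);
    if (e->op != '+' && e->op != '*')
        return;                              /* leaves stay as they are */
    fold(e->lhs);
    fold(e->rhs);
    if (e->lhs->op == 'k' && e->rhs->op == 'k') {
        int l = e->lhs->value, r = e->rhs->value;
        e->value = (e->op == '+') ? l + r : l * r;
        e->op = 'k';                         /* node becomes a constant */
        e->lhs = e->rhs = NULL;
    }
}
```

With this, the tree for 1 + 1 collapses to the single constant 2 before any instructions are generated, while a tree such as A + 2 * 3 keeps its '+' node and only the right-hand side is reduced to 6.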

<snip>
--
ArarghMail609 at [drop the 'http://www.' from ->] http://www.arargh.com
BCET Basic Compiler Page: http://www.arargh.com/basic/index.html