I have the game developer issue on 'adding languages to your games' but it
really doesn't say much about the guts of how to do it.
I really don't want to dive into 'how a compiler works' and such, but if I
have to I will. What are my options? Where can I look for help? What is the
best place to start? I would rather not make a huge complex scripting
language my first time around, so I'm looking for some help on making a
simple language first. And then I will build on that.
Any help or ideas?
Thanks,
jrm
_Constructing Language Processors for Little Languages_ by Randy M. Kaplan (John
Wiley & Sons 1994)
As for tools, flex and bison.
To learn flex and bison, _lex & yacc_ by John R. Levine, Tony Mason & Doug Brown
(O'Reilly & Associates, 1995).
Good luck.
--
Jason Shankel,
Maxis, Inc.
s h a n k e l
at
p o b o x . c o m
Guess what I *just* ordered? =] Thanks man.
The book sounds like just what I need.
>As for tools, flex and bison.
>To learn flex and bison, _lex & yacc_ by John R. Levine, Tony Mason & Doug Brown
>(O'Reilly & Associates, 1995).
What is "flex and bison"? Just curious.
Thanks,
John Mancine
> I'm very interested in setting up a small scripting language for controlling
> some of my characters in my game. The problem is that I'm not really sure
> where to start.
>
> I have the game developer issue on 'adding languages to your games' but it
> really doesn't say much about the guts of how to do it.
>
> I really don't want to dive into 'how a compiler works' and such, but if I
> have to I will. What are my options? Where can I look for help? What is the
> best place to start? I would rather not make a huge complex scripting
> language my first time around, so I'm looking for some help on making a
> simple language first. And then I will build on that.
You could try Python (http://www.python.org). It comes with full source.
Michael K
My advice... save yourself a lot of pain and use Lua. Lua is a small,
powerful, efficient scripting language. It was written specifically as
an extension language and it's very good. I haven't seen any language (
including Python ) which is a better choice for embedding.
For embedding in general, you need to answer some basic questions:
How do I get data from the embedded language into C?
How do I get data from C into the embedded language?
How do I call the embedded language from C?
How do I call C from the embedded language?
Following is some sample code which answers all of the above.
#include <windows.h>  // for MessageBox
#include <lua.h>

// this function creates a new function in Lua, calls the
// function with 2 numbers from C, and gets the result from
// Lua back into C. It answers the first 3 questions:
void foo( void )
{
    lua_beginblock();

    // create a new function in Lua. quoted stuff is Lua code
    lua_dostring( "function add_nums( x, y ) return x + y end" );

    // push 2 numbers onto the lua call stack, call the Lua func.
    lua_pushnumber( 5 );
    lua_pushnumber( 9 );
    lua_callfunction( lua_getglobal( "add_nums" ) );

    // get the result back from Lua ( result == 14 )
    int result = ( int ) lua_getnumber( lua_lua2C( 1 ) );

    lua_endblock();
}

// To answer the 4th question, here's code which extends Lua
// with a new C function. In this case, I've added the ability
// to call the WinAPI MessageBox from Lua.
void DoMessageBox( void )
{
    char * message = lua_getstring( lua_lua2C( 1 ) );
    // MessageBox takes an owner window, the text, a caption, and flags
    MessageBox( NULL, ( LPCSTR ) message, NULL, MB_OK );
}

// then, to add the "MessageBox" function to lua, you'd call:
lua_register( "MessageBox", & DoMessageBox );

// and later, in Lua, you could call MessageBox with:
MessageBox( "some message" )
Lua is a truly wonderful language - clean, small, efficient, and free
for commercial use. It's very solid code ( current version 3.1 ), used
in many industrial applications. It's also well supported.
One of the things I like best about it is that I can't think of any way
to improve it. If I had decided to write a scripting language for
myself, I doubt mine would be as good. For the record, the original Lua
code base was generated with flex/bison or lex/yacc, and evolved from
there. So, if you're planning on going the flex/bison route, save
yourself 3 years of refinement and use Lua.
Regards,
ashley
PS: Here's the official blurb on Lua:
-----------------------------
* What is Lua?
------------
Lua is a programming language originally designed for extending applications,
but also frequently used as a general-purpose, stand-alone language.
Lua combines simple procedural syntax (similar to Pascal) with powerful
data description constructs based on associative arrays and extensible
semantics. Lua is dynamically typed, interpreted from bytecodes, and has
automatic memory management, making it ideal for configuration, scripting,
and rapid prototyping.
Lua is implemented as a small library of C functions, written in ANSI C,
and compiles unmodified in all known platforms. The implementation goals
are simplicity, efficiency, portability, and low embedding cost.
Lua has been awarded the first prize (technological category) in the Second
Compaq Award for Research and Development in Computer Science. This award
is a joint venture of Compaq Computer in Brazil, the Brazilian Ministry of
Science and Technology, and the Brazilian Academy of Sciences.
* New in version 3.1
------------------
+ NEW FEATURE: anonymous functions with closures (via "upvalues").
+ new syntax:
- local variables in chunks.
- better scope control with DO block END.
- constructors can now be also written: { record-part; list-part }.
- more general syntax for function calls and lvalues, e.g.:
f(x).y=1
o:f(x,y):g(z)
f"string" is sugar for f("string")
+ strings may now contain arbitrary binary data (e.g., embedded zeros).
+ major code re-organization and clean-up; reduced module interdependencies.
+ no arbitrary limits on the total number of constants and globals.
+ support for multiple global contexts.
+ better syntax error messages.
+ new traversal functions "foreach" and "foreachvar".
and more...
* Availability
------------
Lua is freely available for both academic and commercial purposes and
can be downloaded from the sites below. The current version is 3.1.
Home page: http://www.tecgraf.puc-rio.br/lua/
http://csg.uwaterloo.ca/~lhf/lua/
In Brazil: ftp://ftp.tecgraf.puc-rio.br/pub/lua/lua.tar.gz
In Canada: ftp://csg.uwaterloo.ca/pub/lhf/lua/lua.tar.gz
In Germany: ftp://ftp.uni-trier.de/pub/languages/lua/lua.tar.gz
In Greece: ftp://ftp.ntua.gr/pub/lang/lua/lua.tar.gz
* Contacting the authors
-----------------------
Lua has been developed by TeCGraf, the Computer Graphics Technology Group
of PUC-Rio (the Pontifical Catholic University of Rio de Janeiro in Brazil).
TeCGraf is a laboratory of the Department of Computer Science.
Dozens of industrial products developed by TeCGraf use Lua.
Send your comments, bug reports and anything else to l...@tecgraf.puc-rio.br.
For reporting bugs, try also the mailing list: lu...@tecgraf.puc-rio.br
Flex and Bison are two of the common implementations of Lex and Yacc,
respectively (two tools that assist in compiler development; Lex deals
with scanning and Yacc with parsing). Both handle jobs you could just
as well do yourself, but they can make your life more convenient, at
least in the scanning and parsing department. They won't help with
backend code generation though; that's going to be your bag
regardless. But for scripting systems, a virtual stack machine often
suffices as a target language, and VSMs are easy to generate code for
from a parse tree... making Yacc quite handy. If your language is
tiny enough, you might not need Yacc, as writing a recursive descent
parser manually is quite simple (just usually not as efficient as
Yacc's results).
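To make the "virtual stack machine" idea a little more concrete, here is a
minimal sketch in C. The opcode names, vm_run and vm_demo are invented for
this illustration and do not come from any particular engine, and there are
no overflow or divide-by-zero checks. The demo program is what a post-order
walk of the parse tree for 2 + 3 * 4 might emit.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy virtual stack machine. */
enum { OP_PUSH, OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_HALT };

/* Run a program and return the value left on top of the stack.
   Sketch only: no stack overflow or divide-by-zero checks. */
int vm_run(const int *code)
{
    int stack[64];
    size_t sp = 0;              /* number of values on the stack */
    size_t pc = 0;              /* index of the next instruction */

    for (;;) {
        int op = code[pc++];
        int a, b;

        switch (op) {
        case OP_PUSH:
            stack[sp++] = code[pc++];   /* operand follows the opcode */
            break;
        case OP_ADD: case OP_SUB: case OP_MUL: case OP_DIV:
            b = stack[--sp];            /* right operand is on top */
            a = stack[--sp];
            stack[sp++] = op == OP_ADD ? a + b
                        : op == OP_SUB ? a - b
                        : op == OP_MUL ? a * b
                        :                a / b;
            break;
        case OP_HALT:
        default:
            return sp ? stack[sp - 1] : 0;
        }
    }
}

/* 2 + 3 * 4, in the order a post-order tree walk would emit it. */
const int vm_demo[] = {
    OP_PUSH, 2, OP_PUSH, 3, OP_PUSH, 4, OP_MUL, OP_ADD, OP_HALT
};
```

Generating code for a machine like this from a parse tree is mostly a matter
of emitting the children of a node first and the node's own opcode last.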
Once you do get a simple "little" script system up and running, play
around with some more complex grammars and token arrangements. It's
amazing what you can add to your languages sometimes with minimal
effort.
--
Chris Hargrove
Programmer, 3DRealms Entertainment
chr...@3drealms.com
http://www.3drealms.com
While this is probably good advice in many cases, using any
prepackaged language (Lua or otherwise) isn't the best solution for
many games. Games often have data structures and concepts that have
no business existing in general programming languages, and hence could
benefit from a language custom written for their needs. A prime
example is UnrealScript, which actually integrates the concept of
current actor state into the language grammar. That's something that
would be inappropriate in a normal language, but is ideal for Unreal's
environment.
hmm... ok. Thanks for the help.
I just re-read the article in game developer, and I now see that these 2
things were mentioned in the end of the article.
So is "yacc" a way to handle the actual 'syntax' etc. of the little
language being made? I'm sorry if I sound like a complete newbie... but in
this area I am. =]
It just helps for me to re-iterate what I *think* the concepts are.
Thanks,
John Mancine
>>including Python ) which is a better choice for embedding.
>>[snip]
>
>While this is probably good advice in many cases, using any
>prepackaged language (Lua or otherwise) isn't the best solution for
>many games. Games often have data structures and concepts that have
>no business existing in general programming languages, and hence could
>benefit from a language custom written for their needs. A prime
>example is UnrealScript, which actually integrates the concept of
>current actor state into the language grammar. That's something that
>would be inappropriate in a normal language, but is ideal for Unreal's
>environment.
Well, I would like to access game specific information. So does that count
Lua out? I.e., I too would like to be able to tell the actor state within the
script.... BUT for now if I can just make a simple 'library' of game
function calls that can be accessed through a script... with some minor
conditions and such, I would be happy.
What sort of scripting can I aim for my first time around? Being able to
handle when a THING gets a message like "touched" etc. (from the game dev.
article) and being able to react individually through a script would be
wonderful.
Thanks,
John Mancine
>I'm very interested in setting up a small scripting language for controlling
>some of my characters in my game. The problem is that I'm not really sure
>where to start.
If you're using Windows 95/NT, you should look into Active Scripting.
That'll allow you to concentrate on the resulting operations, as the
scanning and parsing is done by pre-written modules. The neatest thing
is that the parsing modules are replaceable, so you can support whatever
language the user is familiar with. JavaScript and VBScript are
currently available from MS, and Perl is available from a 3rd party.
There's an article about it at:
http://www.microsoft.com/mind/0297/activescripting.htm
---
John Hattan High UberPopeness -The First Church of Shatnerology
The Code Zone Sweet Software for a Saturnine World
hat...@fastlane.net http://www.fastlane.net/~hattan/
I found lex very useful, although what I used it for was not really a
language, more a set of AI parameters read in from a text file.
Justin Heyes-Jones, just...@hotmail.com.
Yes, Yacc deals with syntax ("syntax analysis" and "parsing" are
basically synonyms). I'm not sure if you're familiar with language
grammars yet (if not, you will be :) but I'll give you a quick
example. Say you want to whip up a quick calculator "language" that
handles numbers, and the four add, subtract, multiply and divide
operators. A language grammar (using "expression" as the starting
nonterminal symbol) might look like this:
expression -> term | term + expression | term - expression
term -> factor | factor * term | factor / term
factor -> number | '(' expression ')'
number -> [0-9]+ /* regular expression shorthand, you get the idea */
This grammar puts multiply and divide at a higher precedence than add
and subtract, like you would expect. This grammar isn't predictive,
in that with both expression and term, there are right-hand sides to
their rules that start with the same symbol. Depending on the
implementation of the parser, that may or may not be acceptable;
fortunately, if it's not, there are easy ways to change the grammar
into a usable form (a grammar like this one can be made predictive by
left-factoring, i.e. splitting the shared prefix off into a new rule).
But I digress...
Yacc's constructs resemble the above grammar form (known as BNF;
actually the above isn't exact BNF but you get the idea). You write
your language grammar in a way similar to this, combining each
production rule with C code that you want to throw in when one of
these productions is used. Yacc then generates a parser utilizing
your grammar and whatever production code you provide.
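For comparison with what Yacc would generate, here is a rough hand-written
sketch in C of a parser for the calculator grammar above. It is only an
illustration under a few assumptions: the function names and the calc()
entry point are made up, it evaluates as it parses instead of building a
parse tree, it does no error reporting, and it uses loops in place of the
right-recursive rules so that "-" and "/" associate to the left (read
literally, the right-recursive rules would treat 8 - 3 - 2 as 8 - (3 - 2)).

```c
#include <ctype.h>

/* `p` points at the next unread character of the input. */
static const char *p;

static int expression(void);

static void skip_spaces(void)
{
    while (isspace((unsigned char)*p))
        p++;
}

/* number -> [0-9]+ */
static int number(void)
{
    int value = 0;
    while (isdigit((unsigned char)*p))
        value = value * 10 + (*p++ - '0');
    return value;
}

/* factor -> number | '(' expression ')' */
static int factor(void)
{
    int value;
    skip_spaces();
    if (*p == '(') {
        p++;                        /* consume '(' */
        value = expression();
        skip_spaces();
        if (*p == ')')
            p++;                    /* consume ')' */
        return value;
    }
    return number();
}

/* term -> factor { ('*' | '/') factor }   (no divide-by-zero check) */
static int term(void)
{
    int value = factor();
    for (;;) {
        skip_spaces();
        if (*p == '*')      { p++; value *= factor(); }
        else if (*p == '/') { p++; value /= factor(); }
        else return value;
    }
}

/* expression -> term { ('+' | '-') term } */
static int expression(void)
{
    int value = term();
    for (;;) {
        skip_spaces();
        if (*p == '+')      { p++; value += term(); }
        else if (*p == '-') { p++; value -= term(); }
        else return value;
    }
}

/* Evaluate a whole string such as "2 + 3 * 4". */
int calc(const char *text)
{
    p = text;
    return expression();
}
```

Each function corresponds to one nonterminal, which is why multiply and
divide bind tighter: term() is called from inside expression(), never the
other way round except through parentheses.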
Lua wouldn't be a very good extension language if you couldn't access
data in the host program. If you saw my earlier message, I included code
which demonstrates:
accessing C data from Lua,
accessing Lua data from C,
calling Lua functions from C,
calling C functions from Lua
My examples took around a dozen lines of code to do all of it. It's very
easy to do. With no prior knowledge you can get it working in a few
minutes. It's certainly easier than writing your own language, even with
flex and bison.
You asked about tracking actor state in the scripts. Here's how I do it
in my game:
Lua has a built-in data type called "tables" which is an associative
array. Tables can be indexed with any kind of value, and can store any
kind of value ( string, number, reference to other table, function
pointer, void* from C ), so they are very flexible.
Here's some Lua code showing table syntax ( -- are comments ):
bob = {}                        -- create an empty table named "bob"
bob.name = "Bob"                -- a string value
bob.age = 42                    -- a numeric value
joe = {}
joe.name = "Joe"
joe.best_friend = bob           -- reference to another table
joe.age = 7
self = bob
print( self.name )              -- prints "Bob"
self = joe
print( self.name )              -- prints "Joe"
print( self.best_friend.name )  -- prints "Bob"
In my game, each actor has its own table. Before I activate a script for
an actor, my C code sets the variable "self" in Lua to equal that actor's
table. Then, each actor can read its own data by referencing the self
table.
I've also added functions to Lua so that actors can access each other's
tables. Each actor has an ID value, and I've added API functions to
return a table given an ID. So, in Lua, I can do things like this:
actor_id = GetClosestActorID()
actor = GetTable( actor_id )
if self.enemy == actor.name then
    Attack( actor_id )
end
If you want to use your own data structures instead of tables, you can
easily provide direct access to your game's data structures from Lua.
The "user data" type in Lua is a void* in disguise and was provided
specifically for that purpose.
Lua is an extension language, so it was specifically designed for the
types of applications we are discussing. It provides all of the basic
procedural constructs, and a few powerful data types, and you build on
top of it.
To give you a better idea of what it does for you, here are the Lua
reserved words:
and do else elseif end function if local
nil not or repeat return then until while
I have good VC++ 5 project files for the latest version of Lua. They
compile retail and debug builds of the Lua libraries, a stand-alone
console version of Lua, and a Lua compiler ( if you want to pre-compile
your Lua code into bytecodes ). Email me if you'd like them.
Regards,
ashley
Gosh, how to make a simple topic tough, hmmm?
To set up a rough and ready scanner+parser you do something like the
following....
(This is _really_ rough and ready... no sophisticated stuff today)
{routine}
--Set up a small string array (20 strings is more than enough)
--Input a line
:::::::::::::Scanning
--Select the first element of the string array
--Repeat the following
----Scan through the line, adding each character to the current element of
the string array
---- When you encounter a space, skip it, and select the next element of the
array to start sending characters to.
--Keep doing this until you reach the end of the line.
:::::::::::::::parsing
--Now you have a string array containing your command...
--Compare the first element of the string to your list of commands
-- If you find it, call a routine or class or whatever that executes the
command, and pass on the rest of the string-array as data
--if you don't find it, generate an error, and stop
--now do the next line.
--Command execution routine accepts the data from the string array, and uses
the elements following the command for data.
If you're looking for numeric data, do a string to numeric conversion.
If there is more or less data than expected: generate an error.
Once this is done, proceed as if you'd just made a 'normal function call'
from inside the programme.
{/routine}
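The routine above could be sketched in C roughly as follows. This is only
one possible reading of it: the command names ("move", "reset"), the
player_x/player_y variables and the function names are all invented for the
example, and strtok() stands in for the character-by-character splitting.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 20            /* "20 strings is more than enough" */

/* Split `line` in place on whitespace; returns the number of words. */
int split_words(char *line, char *words[], int max_words)
{
    int count = 0;
    char *word = strtok(line, " \t\r\n");
    while (word != NULL && count < max_words) {
        words[count++] = word;
        word = strtok(NULL, " \t\r\n");
    }
    return count;
}

/* Hypothetical game state and commands, for illustration only. */
int player_x, player_y;

static int cmd_move(int argc, char *argv[])
{
    if (argc != 2)
        return -1;              /* more or less data than expected */
    player_x += atoi(argv[0]);  /* string to numeric conversion */
    player_y += atoi(argv[1]);
    return 0;
}

static int cmd_reset(int argc, char *argv[])
{
    (void)argv;
    if (argc != 0)
        return -1;
    player_x = player_y = 0;
    return 0;
}

struct command {
    const char *name;
    int (*run)(int argc, char *argv[]);
};

static const struct command commands[] = {
    { "move",  cmd_move  },
    { "reset", cmd_reset },
};

/* Execute one line; returns 0 on success, -1 on any error. */
int run_line(const char *text)
{
    char line[256];
    char *words[MAX_WORDS];
    int count;
    size_t i;

    strncpy(line, text, sizeof line - 1);   /* strtok needs a writable copy */
    line[sizeof line - 1] = '\0';
    count = split_words(line, words, MAX_WORDS);

    if (count == 0)
        return 0;                           /* blank line: nothing to do */
    for (i = 0; i < sizeof commands / sizeof commands[0]; i++)
        if (strcmp(words[0], commands[i].name) == 0)
            return commands[i].run(count - 1, words + 1);
    return -1;                              /* unknown command */
}
```

With this in place, run_line("move 3 4") adds 3 and 4 to the position, while
run_line("move 3") and run_line("fly 1 2") both report an error.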
Now that wasn't too hard, was it?
It's certainly not impressive, and it won't win any prizes for elegance, but
in 90% of the cases it gets the job done just fine. :-)
There is a lot of room for refinement here, I know, but it illustrates a
number of the basic principles nicely.
A refinement might be : if you want to implement structures, you might want
to call the parsing routine recursively.
{LOGO}
The "children's" computer language LOGO has some very sophisticated scanning
and parsing tools, which are very easy to learn.
(Notably: list handling and the TO command). If you want to learn a lot
about scanning and parsing quickly, see if you can get your hands on a full
version of this language.
If a 6-year-old can do it, so can you, after all! ;-)
If I wanted to summarise the above in Logo:
/primitives and reserveds in CAPS/
TO Handle :Line
:newline= BUTFIRST :Line
GO FIRST :line
/this handles a command /
/(fill in command here)/:
:Variable1=ITEM 1 :newline
:Variable2=ITEM 2 :newline
/ skip the rest of the line, other people might need it/
/do something with those variables/
GO "finish
/heres an example of a simple structure element, note the recursion/
dosomethinganumberoftimes:
:numberoftimes=FIRST :newline
REPEAT :numberoftimes [Handle BUTFIRST :newline]
GO "finish
finish:
END
{/ LOGO}
BTW I used a LOGO example instead of C because parsing is one of LOGO's
specialties, so don't start thinking I'm a 6-year-old ;-)
I hope this helps you some! :-)
Kind regards,
Kim Bruning.
John Mancine wrote in message
<35b5b...@news.amaesd.k12.mi.us>...
>>Once you do get a simple "little" script system up and running, play
>>around with some more complex grammars and token arrangements. It's
>>amazing what you can add to your languages sometimes with minimal
>>effort.
>
>
>hmm... ok. Thanks for the help.
>I just re-read the article in game developer, and I now see that these 2
>things were mentioned in the end of the article.
>
>So is "yacc" a way to handle the actual 'syntax' etc. of the little
>language being made? I'm sorry if I sound like a complete newbie... but in
>this area I am. =]
>It just helps for me to re-iterate what I *think* the concepts are.
>
>Thanks,
>John Mancine
This is not scanning, this is only whitespace separation. Scanning
involves tokenization, which this has none of.
>:::::::::::::::parsing
>--Now you have a string array containing your command...
>--Compare the first element of the string to your list of commands
>-- If you find it, call a routine or class or whatever that executes the
>command, and pass on the rest of the string-array as data
>--if you don't find it, generate an error, and stop
>--now do the next line.
This is not parsing either; quite frankly it doesn't even come close
to it.
What you're describing is something similar to writing a Quake-style
console, not writing a compiler. The two have no correlation.
<snip scanner>
>
>This is not scanning, this is only whitespace separation. Scanning
>involves tokenization, which this has none of.
No one ever said you can't use strings as tokens. :-)
<Snip half the parser >
>This is not parsing either; quite frankly it doesn't even come close
>to it.
It's a parser alright, it's just _very_ simple. (You can't say I didn't
warn you :-p ;-) )
>What you're describing is something similar to writing a Quake-style
>console, not writing a compiler. The two have no correlation.
You can use the code to write a compiler just as easily as you can write an
interpreter.
In the case of a compiler you write the code to be executed into a file,
while for an interpreter you execute it on the spot.
Since I've left that part open, the code is just as relevant for compiling
as it is for interpreting.
If you think you can do better, please post your parser to this NG too. I'm
always open to learning something new from an expert.
Kind regards,
Kim Bruning
> You can use the code to write a compiler just as easily as you can write an
> interpreter.
> In the case of a compiler you write the code to be executed into a file,
> while for an interpreter you execute it on the spot.
> Since I've left that part open, the code is just as relevant for compiling
> as it is for interpreting.
>
> If you think you can do better, please post your parser to this NG too. I'm
> always open to learning something new from an expert.
>
The scheme you described is, as pointed out, more akin to a
Quake-console than to a language. Not that there's anything wrong with
that. One of my favorite gadgets is a debugging console which does
exactly what you said (breaks a space-delimited string into a list of
strings and invokes a callback based on the first item in the list).
But this is not how you typically implement a language. This is more
like a command-shell interface.
A true language has a grammar, and that's the difficult bit. The part
that you left open is the part which assembles a list of tokens into a
grammatical structure.
For example, let's say we wanted to parse the phrase "if ((x>0) || (y>0
&& z>0))" using a scheme such as yours. Well, first of all we'd get
"if", which is no big deal, but then we'd get "((x>0" and "||" and
"(y>0". Okay, but if it was restated as "if( (x>0) || ( y > 0 && z> 0)
)" then we'd get a completely different set of tokens ( "if(" followed
by "(x>0)", "||", "(", "y", etc.)
So we would have two grammatically identical phrases which would have to
be parsed in two completely different ways. When you think of all the
possibilities, you see that the combinations can explode out of
control.
But let's say you are tokenizing properly. You get the ">" token. What
does your callback do? Well, clearly it has to keep a state indicating
the last variable and expecting a variable or a value for the next
statement. You get the "||" token. Well, you have to know whether
you've just received a complete truth statement and you have to set a
state to expect a new truth statement. You get ")", now you have to see
whether you're terminating a truth statement and whether it's a compound
truth statement or not. It goes on like this.
THAT's the meat of parsing and that's the part that you've left out. It
can be a very snarly problem. Luckily, there exist tools to automate
this process and allow us to express grammars in easily readable forms.
You should check out lex & yacc (or their freeware cousins flex and
bison).
Also, if you're really interested in learning from the experts, you
should read "Compilers: Principles, Techniques and Tools" (the Dragon
Book) by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman
(Addison-Wesley, 1986).
--
Jason Shankel,
Maxis, Inc.
email provider: p o b o x . c o m
user id: s h a n k e l
Good luck with that one, spambots.
You do have strings in your tokens... but tokens also have some form
of _meaning_, like "identifier", "keyword", "floating point constant",
etc. These meanings are determined during scanning, not during
parsing, and they form the definition of the token so the parser knows
what to do with it in the parse tree. The string the token is
composed of is only the "lexeme" of the token, not the entire token
itself.
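As an illustration of the kind/lexeme distinction, here is a small
hand-written scanner sketch in C. The token kinds, the two keywords and the
function names are assumptions made up for this example (a lex-generated
scanner would be driven by a DFA instead), and over-long lexemes are simply
split rather than rejected. Note that it does not depend on whitespace to
find token boundaries.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds for a tiny C-like language. */
enum token_kind { TOK_KEYWORD, TOK_IDENT, TOK_NUMBER, TOK_OP, TOK_END };

struct token {
    enum token_kind kind;   /* the "meaning" of the token */
    char lexeme[32];        /* the characters it was made of */
};

/* Read one token starting at *src and advance *src past it. */
struct token next_token(const char **src)
{
    struct token tok;
    const char *s = *src;
    size_t n = 0;

    while (isspace((unsigned char)*s))
        s++;

    if (*s == '\0') {
        tok.kind = TOK_END;
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        while ((isalnum((unsigned char)*s) || *s == '_')
               && n < sizeof tok.lexeme - 1)
            tok.lexeme[n++] = *s++;
        tok.lexeme[n] = '\0';
        tok.kind = (strcmp(tok.lexeme, "if") == 0 ||
                    strcmp(tok.lexeme, "while") == 0) ? TOK_KEYWORD
                                                      : TOK_IDENT;
    } else if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)*s) && n < sizeof tok.lexeme - 1)
            tok.lexeme[n++] = *s++;
        tok.kind = TOK_NUMBER;
    } else {
        /* Operators and punctuation: "||" and "&&" take two characters. */
        if ((s[0] == '|' && s[1] == '|') || (s[0] == '&' && s[1] == '&'))
            tok.lexeme[n++] = *s++;
        tok.lexeme[n++] = *s++;
        tok.kind = TOK_OP;
    }
    tok.lexeme[n] = '\0';
    *src = s;
    return tok;
}

/* Return 1 if two source strings scan to the same token sequence. */
int same_tokens(const char *a, const char *b)
{
    for (;;) {
        struct token ta = next_token(&a);
        struct token tb = next_token(&b);
        if (ta.kind != tb.kind || strcmp(ta.lexeme, tb.lexeme) != 0)
            return 0;
        if (ta.kind == TOK_END)
            return 1;
    }
}
```

Fed the two differently spaced "if" phrases from earlier in the thread,
same_tokens() reports that they produce identical token streams, which is
what lets the parser treat them as the same phrase.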
><Snip half the parser >
>It's a parser alright, it's just _very_ simple. (You can't say I didn't
>warn you :-p ;-) )
No, it's not a parser. Parsers are inherently hierarchical and
decompose a token stream into simpler nonterminals, forming a parse
tree. It is virtually impossible to write a useful compiler that is
entirely linear in the way you described. All you mentioned the
parser doing is checking your whitespace-separated strings against a
series of commands and matching them. That's still scanning. All
you're doing is retrieving some linear token separation and giving
some meaning to the tokens. You're not organizing them according to a
language structure or in any grammatical fashion whatsoever... in
other words, it's not parsing.
Not to mention the fact that the scanning you're doing is inefficient
as well. Breaking a string up into whitespace-delimited fragments
then matching those fragments to a command list (can we say way too
many calls to "strcmp" anyone?) takes far more time than a normal
scanner, which uses a DFA. And a DFA doesn't require whitespace or
any fixed strings to match a token; you can just as easily build one
via regular expressions (actually my internal scanner goes from
regular expressions to an NFA and then converts to a DFA, but the end
result is the same; probably close to the way lex works internally).
>If you think you can do better, please post your parser to this NG too. I'm
>always open to learning something new from an expert.
I can post a parser anytime, but you probably wouldn't know what to do
with it. To learn the difference between a scanner and parser, I
highly recommend reading a book on compilers (such as the "dragon
book" by Aho et al., mentioned in another reply I believe), and pick
up a copy of lex & yacc.
I agree. Python is a wonderful language to use for game scripting. There are
comprehensive docs and integrating it with your code is relatively painless.
It's a great language for scripting environments and supports shared libraries,
so it makes your game very expandable...
Michael McIntosh
He meant that the complexity of the languages is different by several
orders of magnitude. In your console command language, you have basically no
hierarchical structure at all, you have commands like:
new game 2
map baha.map
which can easily be scanned and executed. But in a language, you have
syntactical hierarchies like:
integer a;
while((a*c - 4) != 2 && c > 2) a := 5+(7*2*my_function(a*b + sin(x)));
which are not possible to parse in a trivial linear way, although they are
perfectly parseable in a linear fashion using a stack based approach:
Basically you push tokens onto a stack, and check if you can collapse
anything on the stack into bigger blocks - like you see that you have a
variable, you collapse that into an expression. Then you see you have two
expressions with a * between them, then you can collapse all that into a new
expression etc. That approach is what YACC uses for example, and is called
LR parsing, or bottom-up parsing.
The other approach, top-down parsing, starts from the syntactical
specification for the entire file, and starts breaking things down into
smaller bits. You can write a so-called recursive descent parser using that
technique - basically for each syntactical element in your language you
write a function that is called when another element expects the construct
the function describes. Let me quote some code describing the "if" construct
in an old recursive descent parser I wrote:
// if_statement : IF '(' bool_expr ')' statement
void
if_statement(int s)
{
    int out;
    nexttok();
    if(lookahead != '(') {
        prserr("no ( after if");
        exit(0);
    }
    nexttok();
    bool_expr();
    if(lookahead != ')') {
        prserr("unbalanced parentheses");
        exit(0);
    }
    nexttok();
    //EMIT conditional branch etc
    out = new_label(); // allocate a new label
    emit("gofalse %d", 3, out);
    statement(s);
    emit("label %d", 0, out);
}
lookahead always holds the next token from the lexical analysis.
For the if construct, we see that we expect the token 'if' (that is detected
by another construct, and that's why the if_statement function was called at
all). Then we expect an opening parenthesis, a boolean expression, a closing
parenthesis, and a statement. In the code, the parenthesis is matched, then
bool_expr() is called recursively, which is just like if_statement but for
expressions. If that doesn't fail, we look for the ending ')'. Then it's
time to emit some code (pseudo-code here, but might as well be assembler).
We allocate a label and emit a conditional branch, which will jump over the
statement if the bool_expr turned out to be false. Then we call statement()
which will parse a statement, and finally we emit the label itself so the
branch has somewhere to land.
The grammar is explicitly given by the functions you write in a
recursive-descent parser. However things can get pretty heavy after a while,
and that's why compiler-compilers exist, like yacc - they let you describe
the grammar a bit like I described the if statement above, and they generate
the parsing C-code themselves, inserting the wanted actions wherever the
syntactical elements complete (sort of like my assembler output in the
above).
>If you think you can do better, please post your parser to this NG too. I'm
>always open to learning something new from an expert.
I could post the entire source for the language I pasted some stuff from
above if you're interested, including the run-time interpreter, but it's a
couple of thousand lines so I might better put it on a website somewhere. On
the other hand, there are thousands of other free languages, compilers and
interpreters on the net to use. Without knowledge of compilers you would
probably find them annoying to read though.
Someone else recommended the so-called "Dragon" book about compilers. That
book is indeed one of the most common books, but in my opinion, it's not
very good. The best recommendation, of course, is a compiler course at school.
/Bjorn Wesen
>The scheme you described is, as pointed out, more akin to a
>Quake-console than to a language. Not that there's anything wrong with
>that. One of my favorite gadgets is a debugging console which does
>exactly what you said (breaks a space-delimited string into a list of
>strings and invokes a callback based on the first item in the list).
>
>But this is not how you typically implement a language. This is more
>like a command-shell interface.
>
>A true language has a grammar, and that's the difficult bit. The part
>that you left open is the part which assembles a list of tokens into a
>grammatical structure.
I would agree with your analysis up to a point, but there _are_
full-blown languages that use methods similar to the one posted. The
obvious one that springs to mind is Forth, whose compiler/interpreter
code can be summarised as:
while we have input:
    extract token (delimited by whitespace)
    if it's a known word:
        execute associated code
    else if it's a number:
        push number on stack
    else:
        print error message
        break out of loop
Lisp-like languages can be handled in a similar way - for example
x = 2+(3*4) in Lisp becomes (set x (plus 2 (times 3 4)))
Complex parsing is really only necessary in languages that use infix
notation - I have written lots of little languages that don't require
it.
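A rough C rendering of the Forth-style loop summarised above might look like
this. It is a sketch under assumptions, not how any real Forth is built: the
three built-in words and all the function names are invented for the
example, there is no way to define new words, and the stack has no
underflow or overflow checks.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int stack[64];
static int depth;

static void push(int v) { stack[depth++] = v; }
static int  pop(void)   { return stack[--depth]; }

/* A few hypothetical built-in words. */
static void word_add(void) { int b = pop(), a = pop(); push(a + b); }
static void word_mul(void) { int b = pop(), a = pop(); push(a * b); }
static void word_dup(void) { int a = pop(); push(a); push(a); }

struct word {
    const char *name;
    void (*code)(void);
};

static const struct word dictionary[] = {
    { "+", word_add }, { "*", word_mul }, { "dup", word_dup },
};

/* Interpret `source`; returns 0 on success, -1 on an unknown word. */
int interpret(const char *source)
{
    char buffer[256];
    char *token;

    strncpy(buffer, source, sizeof buffer - 1);
    buffer[sizeof buffer - 1] = '\0';

    for (token = strtok(buffer, " \t\n"); token != NULL;
         token = strtok(NULL, " \t\n")) {
        size_t i, n = sizeof dictionary / sizeof dictionary[0];
        char *end;
        long value;

        for (i = 0; i < n; i++)
            if (strcmp(token, dictionary[i].name) == 0)
                break;
        if (i < n) {
            dictionary[i].code();       /* known word: execute it */
            continue;
        }
        value = strtol(token, &end, 10);
        if (end != token && *end == '\0') {
            push((int)value);           /* number: push it on the stack */
            continue;
        }
        fprintf(stderr, "unknown word: %s\n", token);
        return -1;                      /* error: break out of the loop */
    }
    return 0;
}
```

Because arguments travel on the stack, interpret("3 4 * 2 +") needs no
grammar at all: each word just pops what it needs and pushes its result.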
Dave K
---------------------------------------------------------------
Everything Is Deeply Intertwingled. (Ted Nelson, Computer Lib)
dkirby@ <-figure this out, spambots!-> Dave.Kirby@
bigfoot. My opinions are my own, psygnosis.
com but I'm willing to share. co.uk
Dave Kirby wrote:
> grammatical structure.
>
> I would agree with your analysis up to a point, but there _are_
> full-blown languages that use methods similar to the one posted. The
> obvious one that springs to mind is forth, whose compiler/interpreter
> code can be summarised as:
This is fair. A stack-based scripting language (like Forth) does not need a
complex grammar parser, but only because the implicit grammar is so simple.
The original post, however, did not describe a stack language. It described
a system where you take the first token and invoke a callback, sending all
the other tokens to the callback.
This is not a completely useless notion. For Quake-style consoles, it works
great. But it doesn't qualify as a stack language because the callbacks
are supposed to know for themselves how to parse the incoming stream. In a
stack language, parameters are communicated via the stack so execution units
don't have to do any parsing for themselves.
--
Jason Shankel,
Maxis, Inc.
s h a n k e l
at
>The original post, however, did not describe a stack language. It described
>a system where you take the first token and invoke a callback, sending all
>the other tokens to the callback.
True, that's why I also added a LOGO source, to clarify && expand a little
on the concept. It included a stack, as well as a (simple) callback
example.
The parser is indeed a stripped-down version for a FORTH-like language.
Have another look! :-)
I hope this clears up a couple of things,