How they do it? (BASIC interpreters)

Tuan Huynh

unread,

Sep 24, 1993, 5:21:52 AM9/24/93

to

Just wondering. How did 8-bit computers manage to have a fully running
Basic interpreter in as little as 10 Kb? More specifically, how did they
handle memory management of dynamic structures like strings? How about
parsing and interpretation?

--

_____________________________________________________________________________
Tuan Huynh McGill University, Class of '93
ice...@eskimo.com StarRunner - Space Junkie

George Burgyan

unread,

Sep 24, 1993, 12:53:32 PM9/24/93

to

In a previous article, ice...@eskimo.com (Tuan Huynh) says:

>Just wondering. How did 8-bit computers manage to have a fully running
>Basic interpreter in as little as 10 Kb? More specifically, how did they
>handle memory management of dynamic structures like strings? How about
>parsing and interpretation?

Well, BASIC is a relatively small language, and was designed to be small.
I know that the Commodore 64 (I didn't use that machine for about 9 years,
so the following man be just a bit wrong :) The program was stored in a
tokenized fasion starting at 2048 (?) and the variables were stored
starting at 32767 and growing down towards the program. If the variables
actually got down to the program, it would do a garbage collect. The
pointers to the variables I believe were stored following the program and
growing up in memory. Also the interpreter was stored in 1 8Kx8 ROM
starting at 32768.

|@@| <-- 32767
|@@| @ = Strings and arrays
|@@| % = Pointers to the strings, arrays, and integers
| | # = Program code
|%%| = Free space
|##|
|##| <-- 2048

GB! <-- Suprised that he remembers as much as he does. :)
--
/~~~~ bc...@po.cwru.edu The proof that this messgage is correct
/ ___ __ __ __ ___ __ is too long for this sig
/ / /__) / ) /__) / _ /__) .-------Vector Software--------.
/____/_(____(__/_/ \_/__/_(____ Burgyan | DOS, Windows Custom Software |

Thorbjoern Ravn Andersen

unread,

Sep 24, 1993, 1:40:26 PM9/24/93

to

bc...@cleveland.Freenet.Edu (George Burgyan) writes:

>In a previous article, ice...@eskimo.com (Tuan Huynh) says:

>>Just wondering. How did 8-bit computers manage to have a fully running
>>Basic interpreter in as little as 10 Kb? More specifically, how did they
>>handle memory management of dynamic structures like strings? How about
>>parsing and interpretation?

>Well, BASIC is a relatively small language, and was designed to be small.
>I know that the Commodore 64 (I didn't use that machine for about 9 years,
>so the following man be just a bit wrong :) The program was stored in a
>tokenized fasion starting at 2048 (?) and the variables were stored
>starting at 32767 and growing down towards the program. If the variables
>actually got down to the program, it would do a garbage collect. The
>pointers to the variables I believe were stored following the program and
>growing up in memory. Also the interpreter was stored in 1 8Kx8 ROM
>starting at 32768.

Just as amazing was the 8-bit versions of Turbo Pascal. With 56 Kb free
they managed to have a full compiler and editor, the source code for a 500
line program *and* the compiled code in memory with room to spare.

Somehow I am not that impressed with DOS software which bangs it head
agains the 640 Kb limit (when using the trusty PS/2-30).

As a curiosum -- the way you debugged your program was by running *the
compiler* under an external debugger and tricking your program into letting
the ^C keypress-check generate a debug session instead.

Ah -- those were the days. "But, will the kids today believe it?"

Regards,
--
Thorbjørn Ravn Andersen ...Our three major weapons are "who", "ps -aux",
ra...@imada.ou.dk and "kill -9" ... and an almost fanatical devotion
to "fastboot".

Ian Collier

unread,

Sep 26, 1993, 8:41:10 AM9/26/93

to

In article <27v8mc$6...@usenet.INS.CWRU.Edu>, bc...@cleveland.Freenet.Edu (George Burgyan) wrote:
>In a previous article, ice...@eskimo.com (Tuan Huynh) says:
>>Just wondering. How did 8-bit computers manage to have a fully running
>>Basic interpreter in as little as 10 Kb? More specifically, how did they
>>handle memory management of dynamic structures like strings? How about
>>parsing and interpretation?

>Well, BASIC is a relatively small language, and was designed to be small.

[description of C64 omitted]

I know quite a lot about the Sinclair ZX Spectrum (Z80-based computer)
thanks to the book "The Complete Spectrum ROM Disassembly" by Logan/O'Hara
(I wonder whether it's still available!). Its interpreter fits into about
15K of ROM, leaving 48K of address space for the RAM (its predecessor, the
ZX81, managed a BASIC interpreter in only 8K, and if I remember correctly
the first Sinclair computer, the ZX80, had only 4K of ROM initially. The
Spectrum's successors, the 128K, +2 and +3, had a simple memory manager
which paged in banks of 16K into the area C000H-FFFFH upon instructions
from I/O ports. The +3 has three extra 16K ROMs - one for DOS, one for
the "editor", and one for extra bits of BASIC which are not in the original
Spectrum. But I digress...).

Memory map of the spectrum:

The system variables area contains information used by the interpreter
itself, and includes things like PROG (the address of the program),
RAMTOP (the highest address used by BASIC) and UDG (the address of the
user defined graphics) listed above. The "stack" area between SP (the
Z80 stack pointer) and RAMTOP contains not only the machine stack but
also the program stack; whenever a GOSUB instruction is executed, the
line and statement number of the instruction is pushed on to the machine
stack where it stays until a RETURN instruction is executed.

A more detailed map of the area between PROG and SP follows.

STKBOT STKEND SP
80H | Calculator stack | free |

(this is from memory; the end markers 0DH and 80H may be slightly incorrect)

I can now tell you about memory management. All BASIC's variables are
kept in the area between VARS and ELINE. Each one starts with a byte
which indicates its name and type (or the first character of a name, if it
is a numeric variable) and is followed by the rest of its details. They are
stored in the order in which they were created. When a variable is named in
the program, it is searched for sequentially by examining and skipping past
each variable in the variables area.

The ROM contains two routines, MAKE_ROOM and RECLAIM. MAKE_ROOM takes
as parameters an address lying below STKEND and a number. It moves all
data between the given address and STKEND forward and adjusts all the
system variables mentioned in the above map (plus a few more) accordingly,
thus creating a gap of the given size in the memory map. So, if you
want to increase the length of a string or add a line to the program,
the interpreter calls MAKE_ROOM to make room for it. RECLAIM is the
opposite; it moves down all data above the given address and adjusts
the system variables in order to close a gap which was created by, for
example, deleting a line in the program or making a string shorter.

Arithmetic (including string arithmetic) is done on the calculator stack.
"1+2*3-4" is interpreted by stacking "1", "2" and "3", then multiplying,
adding, stacking "4" and subtracting. Each number is stored in five bytes
either as a floating point number or as an integer with sign (and two spare
bytes). A string is stored on the stack as an address and a length (and
one spare byte). The string itself is stored in the workspace, unless it
already exists (for instance a literal string in the program).

The program is stored tokenised (which is exactly the way you type it in)
with a few extra controls (line lengths, etc). The interpreter works by
fetching the first word in each statement (e.g. IF) and looking it up in
a table. The table will say, for instance, "collect a number, check that
THEN is present, then jump to address X". The routine at address X will
test the number which has just been collected, and either jump back to the
interpreter (if it was true) or skip to the next program line (if it was
false).

The interpreter is exeuted twice on each line. At the time when you type
the line in, it is run in "syntax mode", where it does a trial run of the
line and inserts a hidden control after each literal number. If this
trial run gives a syntax error, then the line is returned to the user
with a syntax marker indicating the position of the error. Otherwise it
can be added to the program (if it had a line number) or executed. When
the line comes to be executed, the interpreter is run in "runtime" mode,
where it actually does what the line says. A syntax error is unlikely to
occur at run time.

Ian Collier
Ian.C...@prg.ox.ac.uk | i...@ecs.ox.ac.uk

Gregory N. Bond

unread,

Sep 27, 1993, 12:33:08 AM9/27/93

to

In article <CDupp...@eskimo.com> ice...@eskimo.com (Tuan Huynh) writes:

Just wondering. How did 8-bit computers manage to have a fully running
Basic interpreter in as little as 10 Kb?

10k? Luxury!!!

The original TRS-80 had 4k ROM (with integer-only BASIC interpreter)
and 4k RAM. And the RAM expansion (8 16k x 1 chips) was several
hundred dollars.

Greg, Old Fart.
--
Gregory Bond <g...@bby.com.au> Burdett Buckeridge & Young Ltd Melbourne Australia
I will not do it as a hack I will not do it for my friends
I will not do it on a Mac I will not write for Uncle Sam
I will not do it on weekends I won't do ADA, Sam-I-Am

Michael F Gordon

unread,

Sep 27, 1993, 4:16:47 AM9/27/93

to

In <1993Sep26....@troll.comlab.ox.ac.uk> i...@comlab.ox.ac.uk (Ian Collier) writes:
>In article <27v8mc$6...@usenet.INS.CWRU.Edu>, bc...@cleveland.Freenet.Edu (George Burgyan) wrote:
>>In a previous article, ice...@eskimo.com (Tuan Huynh) says:
>>>Just wondering. How did 8-bit computers manage to have a fully running
>>>Basic interpreter in as little as 10 Kb? More specifically, how did they
>>>handle memory management of dynamic structures like strings? How about
>>>parsing and interpretation?

>>Well, BASIC is a relatively small language, and was designed to be small.
>[description of C64 omitted]

>I know quite a lot about the Sinclair ZX Spectrum (Z80-based computer)
>thanks to the book "The Complete Spectrum ROM Disassembly" by Logan/O'Hara

[ details of Spectrum BASIC internals deleted ]

If they'd had the space to add things like named subroutines, a few
more looping constructs etc. it would have held it's own against most
of the more modern BASICs. My favourite feature of Spectrum BASIC
was the ability to evaluate strings - i.e. you could write something
like

10 LET X=0.8
20 LET Y=1.6
30 LET A$="SIN(X)+Y/(2*COS(X))"
40 LET Z=EVAL A$

and Z would end up with the correct value. It made writing programs
that read a function and plotted a graph of it trivial.

--
Michael Gordon - m...@ee.ed.ac.uk
Computing Officer, EE Dept, Edinburgh University
It's not an optical illusion, it just looks like one

Leonard Erickson

unread,

Sep 27, 1993, 3:59:28 AM9/27/93

to

ice...@eskimo.com (Tuan Huynh) writes:

>Just wondering. How did 8-bit computers manage to have a fully running
>Basic interpreter in as little as 10 Kb? More specifically, how did they
>handle memory management of dynamic structures like strings? How about
>parsing and interpretation?

Dig up some old copies (early to mid 70s) of Dr. Dobbs. They published
*source* code to Tiny BASICs that ran in as little as *one* K.

Most of the BASICs on small micros handle strings by dedicating a portion
of the free RAM to string storage, and keeping a set of pointers. Usually
the string had the size and sttarting address stored with the rest of the
variables, and the contents stored in "string space". There were pointers
to the start of string space and the first unused address in string space.

In Microsoft BASICs, when a new value was assigned to a string, it was
stored in the "free" section of string space and the pointer was moved
up. When there wasn't enough room for the new value, a "garbage collection"
routine was called to unfragment string space. This could take a *long*
time (20 minute to pack 32k of heavily used string space on a 2MHZ Z-80).

Programmers used to call the FRE(A$) function (which reported the free
string space) as a means of *forcing* garbage collection to occur when
*they* wanted it to.

--
Leonard Erickson leo...@qiclab.scn.rain.com
FIDO: 1:105/51 Leonard....@f51.n105.z1.fidonet.org (preferred)

Torben AEgidius Mogensen

unread,

Sep 27, 1993, 10:36:27 AM9/27/93

to

ice...@eskimo.com (Tuan Huynh) writes:

>Just wondering. How did 8-bit computers manage to have a fully running
>Basic interpreter in as little as 10 Kb? More specifically, how did they
>handle memory management of dynamic structures like strings? How about
>parsing and interpretation?

Parsing:

Basically key-word based: you recognize a statement by it's initial
keyword. Many interpreters convert keywords to one-byte tokens when
you enter a program line.Structures that extended beyond a single
statement were not parsed as such, but there was a dynamic count of
the nesting of FOR-NEXT constructs etc. (see below). Some
interpreters would convert an occurence of a keyword to a token,
even if it was inside a variable name, e.g. parsing "DITTO" as "DI"
and the key-word "TO". Some interpreters ran through the program
text when you typed "RUN" and computed the addresses of variables,
the destination of GOTOs and paired the NEXTs with FORs etc. This
allowed for faster execution.

Interpretation:

Also key-word based. Program lines were held in a linked list to
speed up GOTO's. Some interpreters even cached the addresses of
program lines after the first GOTO to them. On a FOR statemet, the
address of the statement would be pushed onto a FOR-stack. This
would be examined when a NEXT was reached, which would then examine
the FOR statement to figure out which variable to increment and by
how much (and when to stop). Sometimes more information was pushed
at the FOR statement to speed up the processing. In many BASICs, the
interpreter would not test the stop condition of a FOR-loop when
entering it, as it had no way of knowing which NEXT statement would
match.

Strings:

Microsoft Basic on Commodore computers had a string space after the
program. When you executed a string-expression, the new string would
be build at the end of this space, and at assignment to string
variable, a pointer contained in the string variable would be
updated to the new address (and the length would be updated too).
When the string area was full, a garbage collection would be done.
This would move the "live" strings to one end of the string area,
allowing the rest to be reused. Other Basics required you to
declare the maximum size of each string (or else default to some
specific value). Thus no GC was required.

Microsoft Basic was contained in an 8KB ROM which also included the
miniscule OS (keyboard, screen and tape handling).

Torben Mogensen (tor...@diku.dk)

Paul Johnson

unread,

Sep 28, 1993, 4:24:03 AM9/28/93

to

Torben AEgidius Mogensen (tor...@diku.dk) wrote:
> ice...@eskimo.com (Tuan Huynh) writes:

> >Just wondering. How did 8-bit computers manage to have a fully running
> >Basic interpreter in as little as 10 Kb? More specifically, how did they
> >handle memory management of dynamic structures like strings? How about
> >parsing and interpretation?

[Much good stuff in reply]

I cut my teeth on Commodore Basic, then on a PET 3032 (Personal
Electronic Transactor, in case you wondered). A massive 32K of RAM
and a non-ASCII character set used on the screen (see PETSCII in the
Jargon File). It fitted the interpreter into 8K of ROM.

This beast used a 1MHz 6502. The 6502 had two special "pages" (256
byte chunks) of memory. Page zero could be accessed by special
instructions, including an indirect-index used for array accesses and
table lookups. Page 1 was the stack (yes, 256 bytes. Whats
recursion?)

About 8 bytes of this valuable zero page space, starting at $70 as I
recall, were dedicated to a machine code routine which got the next
character in the program. The ROM basic installed this as part of
bootup. Why? Well you could modify this routine to intercept the
next character, pass it to your special routine, and thus introduce
new keywords into Basic.

Then there was the BBC micro. Wow. Now there was a machine. 16K OS
plus 16K Basic, and a paged ROM scheme so that you could plug in other
languages, word processors or whatever. The Basic was written for
speed, and was blindingly fast. (Having a 2MHz 6502 helped of
course). It could do a full screen Mandelbrot in under 20 minutes!
At 320x256 pixels no less!

Being so fast, it would not afford time to do garbage collection on
strings. This also avoided the double-indirection cost associated
with copying collectors. So you learned very soon that you had to
create strings as big as they were going to get.

I worked for 6 months at a place called Five Ways Software (do they
still exist?) in Birmingham, trying to stuff educational programs into
BBC micros. The 32K limit was really restrictive. We used to spend
hours trying to bum a few extra bytes out of the code. We knew the
procedure and subroutine stack protocols, and when to use them. We
knew the memory map and the storage schemes for integer variables.
Ahh, those were the days (NOT!).

Paul.
--
Paul Johnson (p...@gec-mrc.co.uk). | Tel: +44 245 473331 ext 3245
--------------------------------------------+----------------------------------
Yes, I know my Reply-to address is reversed.| GEC-Marconi Research is not
Take it up with my sysadmin. | responsible for my opinions

Ian Collier

unread,

Oct 3, 1993, 10:20:29 AM10/3/93

to

This is of little consequence, but what the heck! :-)

In article <1993Sep27.1...@odin.diku.dk>, tor...@diku.dk (Torben AEgidius Mogensen) wrote:
>Interpretation:

> On a FOR statemet, the
> address of the statement would be pushed onto a FOR-stack. This
> would be examined when a NEXT was reached, which would then examine
> the FOR statement to figure out which variable to increment and by
> how much (and when to stop). Sometimes more information was pushed
> at the FOR statement to speed up the processing. In many BASICs, the
> interpreter would not test the stop condition of a FOR-loop when
> entering it, as it had no way of knowing which NEXT statement would
> match.

Spectrum BASIC stores the line and statement number of the FOR instruction
as well as the increment and limit values within the control variable, and
it insists on having the control variable named on the NEXT instruction.
Thus it doesn't need a FOR stack, and it can easily find out which NEXT
to jump to if the loop has to be executed zero times.

> Other Basics required you to
> declare the maximum size of each string (or else default to some
> specific value). Thus no GC was required.

As I've mentioned, Spectrum BASIC does implicit GC every time something
is altered, so no explicit GC is needed. Spectrum BASIC was written for
compactness, not speed!

Ian Collier
Ian.C...@prg.ox.ac.uk | i...@ecs.ox.ac.uk