Integer Basic Tokenization

Matthew Russotto

Mar 16, 2001, 11:59:39 AM

In article <98t1cf$r69$1...@news5.svr.pol.co.uk>,
Thug <th...@optusnet.com.au> wrote:

>But then, wasn't Woz himself responsible for INTEGER BASIC? If so, there was
>probably some other fantastic optimisation he was able to perform by doing
>things this strange way, saving 5 clock cycles somewhere or something.
><grin>
>
>
>Michael

Yep, it was Woz. And see my previous post -- I think that a clever
optimization is exactly what this was.

--
Matthew T. Russotto russ...@pond.com
"Extremism in defense of liberty is no vice, and moderation in pursuit
of justice is no virtue."

Matthew Russotto

Mar 16, 2001, 11:51:39 AM

In article <98sef0$5tq$1...@merope.saaf.se>, Paul Schlyter <pau...@saaf.se> wrote:
>
>Correct: Integer Basic encodes all integer constants as 16-bit
>binaries, prefixed with a byte B0-B9. Interpretation gets quicker if
>integer constants need not be converted from ASCII each and every
>time they are encountered in the code.

Which is why in Applesoft, it was significantly faster to use
variables instead of constants.

Matthew Russotto

Mar 16, 2001, 11:56:57 AM

In article <3AB1D3FB...@swbell.net>,
Rubywand <ruby...@swbell.net> wrote:
>
> Looks like your idea about the $Bx code is right. My guess is that it's a
>way to speed up references by number. A code scan for a value can avoid doing a
>hex-->decimal conversion for all entries except those with the right first
>digit.

$Bx is the Apple ASCII value for a digit ('0'-'9' with the high bit set).
Allowing all the $Bx codes probably shortens the tokenization code by a few
instructions, because you can just dump the first digit of the number as the
token.
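
A minimal C sketch of that shortcut on the tokenizing side, written against the hex dumps posted in this thread rather than the actual ROM code. The name encode_int_constant is hypothetical, and the 32767 limit assumes constants must fit a signed 16-bit value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the Integer Basic ROM): tokenize a decimal
 * constant the way the dumps in this thread suggest.
 *   out[0]    = first digit as Apple ASCII (high bit set), i.e. $B0-$B9
 *   out[1..2] = the value, 16-bit little-endian
 * Returns the number of source characters consumed, or 0 on error. */
static size_t encode_int_constant(const char *src, unsigned char out[3])
{
    assert(src != NULL && out != NULL);
    if (src[0] < '0' || src[0] > '9')
        return 0;

    unsigned long value = 0;
    size_t n = 0;
    while (src[n] >= '0' && src[n] <= '9') {
        value = value * 10 + (unsigned long)(src[n] - '0');
        if (value > 32767)          /* assumed signed 16-bit limit */
            return 0;
        n++;
    }

    out[0] = (unsigned char)(src[0] | 0x80);   /* the "free" prefix byte */
    out[1] = (unsigned char)(value & 0xFF);
    out[2] = (unsigned char)(value >> 8);
    return n;
}
```

Calling encode_int_constant("5000", out) fills out with B5 88 13, the same three bytes that follow the LOMEM: token in the line 108 dump.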

Rubywand

Mar 16, 2001, 3:51:07 AM

Thug writes ...
>
> G'day,
>
> I'm currently writing a utility to manage my Apple ][ disk images. As part
> of this I'm going through a lot of my old personal disks that I created when
> I was first learning computers in the early 80s, including a lot of BASIC
> programs I typed in from magazines and so on. (Remember when you used to be
> able to do that? Computer Magazines came with "Special 16 page pull-out
> program supplements", not DVDs full of code ready to go.)
>

Sure; my dad and I typed in bunches of programs from SoftSide.

> Anyway, I digress...
>
> I've successfully written a program to take an APPLESOFT program and
> de-tokenize it back into readable text, and it works an absolute treat. But
> INTEGER BASIC programs are proving a little bit more difficult, and I was
> wondering if somebody out there could shed some light on the process it
> uses.
>
> So far I have the structure being like this:
>
> 1 Byte: Length of Line
> 2 Bytes: Line Number (Lo/Hi Order)
> ? Bytes Tokenized Program
> Last Byte: 01 (End of line token)
>
> So, pulling apart some code I get:
>
> 105 PRINT "[CTRL-D]BLOAD BOWLING.OBJ"
> Hex: 19 69 00 61 28 84 C2 CC CF C1 C4 A0 C2 CF D7 CC C9 CE C7 AE CD C2 CA 29
> 01
>
> 19 = Line is 25 Bytes long
> 69 00 = Line 105
> 61 = PRINT token
> 28 = Quote Token
> 84 = Ctrl-D Character
> C2 CC CF C1 C4 A0 C2 CF D7 CC C9 CE C7 AE CD C2 CA = ASCII String (Hi bit
> Set)
> 29 = Quote Token (Different for closing quote... interesting.)
> 01 = End of Line
>
> That's not too bad and is quite similar to Applesoft except that things like
> quotes are tokenized and plain text has the high bit set. But once numbers
> start appearing in the code, things get really messy. INTEGER appears to
> encode all numbers too, whereas APPLESOFT just has them as plain text. So we
> get:
>
> 108 LOMEM: 5000
> Hex: 08 6C 00 11 B5 88 13 01
>
> 08 = Line is 8 bytes long
> 6C 00 = Line 108
> 11 = LOMEM Token
> B5 = Colon Token?? Or pointer that the next bytes are a number?
> 88 13 = 5000 (Stored in Lo/Hi Order)
> 01 = End of line
>
> 110 POKE 808,0 : POKE 809,12
> Hex: 15 6E 00 64 B8 28 03 65 B0 00 00 03 64 B8 29 03 65 B1 0C 00 01
>
> 15 = Line is 21 bytes long
> 6E 00 = Line 110
> 64 = POKE Token
> B8 = ??
> 28 03 = 808 (Lo/Hi)
> 65 = Comma token?
> B0 = ??
> 00 00 = 0 (Lo,Hi)
> 03 = Colon Token
> 64 = POKE Token
> B8 = ??
> 29 03 = 809 (Lo/Hi)
> 65 = Comma token
> B1 = ??
> 0C 00 = 12 (Lo/Hi)
> 01 = End of Line
>
> Particularly confusing in this case is that B0 appears after the comma token
> in the first poke, but B1 appears after the comma token in the second poke
> statement. It would appear that the B? character matches the first digit of
> the number that follows it, but that seems a bit weird to me, and certainly
> isn't an infallible coding.
>

Looks like your idea about the $Bx code is right. My guess is that it's a
way to speed up references by number. A code scan for a value can avoid doing a
hex-->decimal conversion for all entries except those with the right first
digit.

> Can anyone help? (And just a list of tokens would be helpful!)
>
....

Don't think I've ever come across a token listing for INT BASIC. Maybe
someone else can direct you to one.

Rubywand

Thug

Mar 15, 2001, 4:43:38 PM

G'day,

I'm currently writing a utility to manage my Apple ][ disk images. As part
of this I'm going through a lot of my old personal disks that I created when
I was first learning computers in the early 80s, including a lot of BASIC
programs I typed in from magazines and so on. (Remember when you used to be
able to do that? Computer Magazines came with "Special 16 page pull-out
program supplements", not DVDs full of code ready to go.)

Anyway, I digress...

Can anyone help? (And just a list of tokens would be helpful!)

Regards,

Michael

Thug

Mar 16, 2001, 7:31:24 AM

"Paul Schlyter" <pau...@saaf.se> wrote in message
news:98sef0$5tq$1...@merope.saaf.se...
>
> Visit my apple 2 page at:
>
> http://hotel04.ausys.se/pausch/apple2
>
> and download my utility FID, which comes with C source. It contains
> de-tokenizers for Applesoft Basic, Integer Basic, and S-C Assembler
> source files (the latter are stored as "I" type files too!). That
> code should help you figure out how Integer Basic programs are
> tokenized.

Thanks Paul. Your code was very useful, and I've managed to get mine almost
working by decoding what yours does. But it still fails sometimes, as does
your own FID utility! The examples I've found are:

1700 REM
1710 T=T+H : IF NOT ST AND NOT SP THEN 1720 : GOSUB 1780 : T=T+H : IF NOT SP
THEN T=T+H1
1720 SP=0 : ST=SV : SV=0 : IF NOT SET(L) THEN 1750

Your code (and mine too now!) misinterprets the above as:

1700 REM
1710 T=T+H : IF NOT ST AND NOT SP THEN 1720 : GOSUB 1780 : T=T+H : IF NOT SP
THEN T=T+H9217-11514P=0 : ST=SV : SV=0 : IF NOT SET(L) THEN 1750

(Note line 1710 has got merged with 1720, which is really odd, because the
EOL check should fix that, I think, but anyway...)

Another example:

2020 X0=133*L : Y0=100 : S=-2 : X1=(RND(200)+150)*(1-L)
: FOR I=1 TO 500 : NEXT I : X3=0

Becomes:

2020 X-20111^EHIMEM:*L : Y-20111 POKE HIMEM: : S=-2 : X14449
RND(200)+150)*(1-L)
: FOR I=1 TO 500 : NEXT I : X-20367 HIMEM:HIMEM:

So, obviously it's Variable names with trailing digits which is the problem.
(And is itoken[0] really "HIMEM:" as well as itoken[10]? I can understand
other tokens appearing multiple times, as it would appear to be for
different cases/usages of the command, but this doesn't make sense for
HIMEM.)

Anyway, the fix is easy (I think). You need to add another check to the
code, that's all. You need to add an "InVar" boolean flag to indicate that a
variable name is being "constructed". InVar would get set whenever an
AlphaNum character is encountered that isn't part of a REM or a String; and
it would get reset as soon as a Token is encountered. Finally, make InVar
another exception to the "convert the following two bytes to a number"
routine.
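
A rough C sketch of that check, assuming a much simplified scanner that ignores REM and string handling. The function scan_constants and its use_invar switch are made up for illustration and are not taken from FID.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Illustrative only: collect the integer constants in one tokenized
 * statement. With use_invar set, a $B0-$B9 byte that directly follows an
 * alphanumeric character is treated as part of a variable name instead of
 * the start of a 3-byte constant. Returns the number of constants found. */
static int scan_constants(const unsigned char *p, int n, int use_invar,
                          int *out, int max_out)
{
    int count = 0, in_var = 0;

    assert(p != NULL && out != NULL);
    for (int i = 0; i < n; i++) {
        unsigned char b = p[i];

        if (b < 0x80) {                 /* any token ends a variable name */
            in_var = 0;
            continue;
        }
        if (b >= 0xB0 && b <= 0xB9 && !(use_invar && in_var) && i + 2 < n) {
            int v = p[i + 1] | (p[i + 2] << 8);   /* little-endian value */
            if (v >= 0x8000)
                v -= 0x10000;                      /* signed 16-bit */
            if (count < max_out)
                out[count] = v;
            count++;
            i += 2;                                /* skip the value bytes */
            in_var = 0;
            continue;
        }
        in_var = isalnum(b & 0x7F) != 0;           /* plain character */
    }
    return count;
}
```

Working backwards from the garbled listing above, X0=133*L appears to be stored as D8 B0 71 B1 85 00 14 CC. With use_invar set, the scan reports the single constant 133; without it, the B0 of the variable name is taken as a prefix and the first "constant" comes out as -20111, which is the number in the bad output.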

I still think that it is a very weird way of encoding numbers too! The speed
advantage of storing the number itself, rather than the ASCII representation
makes sense, but why not have a "Little Endian Number follows me" token and
leave it at that?

Paul Schlyter

Mar 16, 2001, 2:13:36 AM

In article <98rh8a$ead$1...@news7.svr.pol.co.uk>,

Thug <th...@optusnet.com.au> wrote:

> I'm currently writing a utility to manage my Apple ][ disk images. As part
> of this I'm going through a lot of my old personal disks that I created when
> I was first learning computers in the early 80s, including a lot of BASIC
> programs I typed in from magazines and so on. (Remember when you used to be
> able to do that? Computer Magazines came with "Special 16 page pull-out
> program supplements", not DVDs full of code ready to go.)
>
> Anyway, I digress...
>
> I've successfully written a program to take an APPLESOFT program and
> de-tokenize it back into readable text, and it works an absolute treat. But
> INTEGER BASIC programs are proving a little bit more difficult, and I was
> wondering if somebody out there could shed some light on the process it
> uses.
>
> So far I have the structure being like this:
>
> 1 Byte: Length of Line
> 2 Bytes: Line Number (Lo/Hi Order)
> ? Bytes Tokenized Program
> Last Byte: 01 (End of line token)

Correct
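
For reference, that layout can be walked with a few lines of C. This is only a sketch: struct ib_line and ib_parse_line are names invented here, and it assumes (as the posted hex dumps show) that the length byte counts the whole line, including itself and the trailing $01.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of one tokenized line (names are illustrative). */
struct ib_line {
    unsigned length;             /* total bytes, header and $01 included */
    unsigned number;             /* line number */
    const unsigned char *body;   /* first token byte */
    size_t body_len;             /* token bytes, excluding the final $01 */
};

/* Parse the line starting at p, with avail bytes readable.
 * Returns 1 on success, 0 if the bytes do not look like a line. */
static int ib_parse_line(const unsigned char *p, size_t avail,
                         struct ib_line *out)
{
    assert(p != NULL && out != NULL);
    if (avail < 4 || p[0] < 4 || p[0] > avail)
        return 0;                /* shortest line: len, lo, hi, $01 */
    if (p[p[0] - 1] != 0x01)
        return 0;                /* last byte must be the end-of-line token */

    out->length = p[0];
    out->number = p[1] | ((unsigned)p[2] << 8);   /* Lo/Hi order */
    out->body = p + 3;
    out->body_len = (size_t)p[0] - 4;
    return 1;
}
```

Fed the bytes 08 6C 00 11 B5 88 13 01 from the original question, this yields length 8, line number 108 and a 4-byte body starting with the $11 (LOMEM:) token.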

> That's not too bad and is quite similar to Applesoft except that things like
> quotes are tokenized and plain text has the high bit set. But once numbers
> start appearing in the code, things get really messy. INTEGER appears to
> encode all numbers too, whereas APPLESOFT just has them as plain text.

Correct: Integer Basic encodes all integer constants as 16-bit
binaries, prefixed with a byte B0-B9. Interpretation gets quicker if
integer constants need not be converted from ASCII each and every
time they are encountered in the code.

..............

> Can anyone help? (And just a list of tokens would be helpful!)

Visit my apple 2 page at:

http://hotel04.ausys.se/pausch/apple2

and download my utility FID, which comes with C source. It contains
de-tokenizers for Applesoft Basic, Integer Basic, and S-C Assembler
source files (the latter are stored as "I" type files too!). That
code should help you figure out how Integer Basic programs are
tokenized.

--
----------------------------------------------------------------
Paul Schlyter, Swedish Amateur Astronomer's Society (SAAF)
Grev Turegatan 40, S-114 38 Stockholm, SWEDEN
e-mail: pausch at saaf dot se or paul.schlyter at ausys dot se
WWW: http://hotel04.ausys.se/pausch http://welcome.to/pausch

Paul Schlyter

Mar 17, 2001, 6:50:34 AM

In article <98t1cf$r69$1...@news5.svr.pol.co.uk>,
Thug <th...@optusnet.com.au> wrote:

One of those HIMEM:'s is probably never used.

What I did in order to examine the entire token table was to poke
various bytes in an Integer Basic program line, and then LIST it and
see what appeared. And if $00 is poked into the Integer Basic line
as a token of its own, it does get listed as HIMEM: - this might be
just an unintentional side effect since, as you correctly point out,
it doesn't make much sense to have two different HIMEM:'s.

Also: some of the tokens can be executed only as direct commands in
Integer Basic - trying to add them to an Integer Basic program just
yields a ***SYNTAX ERROR. Of course one can bypass this by poking
these token bytes directly into a suitable program line.

> Anyway, the fix is easy (I think). You need to add another check to the
> code, that's all. You need to add an "InVar" boolean flag to indicate that a
> variable name is being "constructed". InVar would get set whenever an
> AlphaNum character is encountered that isn't part of a REM or a String; and
> it would get reset as soon as a Token is encountered. Finally, make InVar
> another exception to the "convert the following two bytes to a number"
> routine.

You're absolutely right! Thanks for finding this bug for me - I'm
going to add a fix for that to FID very soon. I never did any extensive
testing of the Integer Basic de-tokenizer, and apparently never let it
list an Integer Basic program with embedded digits in the variable
names.

That bug would not have appeared if Woz had reserved a special
"Binary Integer Constant Follows" token, instead of just using ASCII
'0'-'9' with the hi bit set.

> I still think that it is a very weird way of encoding numbers too! The speed
> advantage of storing the number itself, rather than the ASCII representation
> makes sense, but why not have a "Little Endian Number follows me" token and
> leave it at that?
>
> But then, wasn't Woz himself responsible for INTEGER BASIC? If so, there was
> probably some other fantastic optimisation he was able to perform by doing
> things this strange way, saving 5 clock cycles somewhere or something.
> <grin>

Woz did indeed write Integer Basic. He had taken some classes in
compiler construction, and used that knowledge to design and
implement Integer Basic in a very short time with very modest
resources. Some rumors I've heard claimed he wrote it over a weekend;
that's obviously an exaggeration, but within a month or two he had
implemented most of it, using only a mini-assembler (and probably lots
of notes on paper). There has never existed any official assembler
source listing for Integer Basic, and if you disassemble and examine
the code, you'll see that it jumps here and there, as if it has
been patched a lot - which is what is to be expected of code written
in that way.

Integer Basic was for that time one of the fastest Basic interpreters
available, probably pretty much due to its semi-compiled nature. Just
too bad it never got floating point - Woz was on his way with floating
point too: the set of ROMs for Integer Basic and the Monitor had
some extra space, which was used for the mini-assembler, the Sweet-16
interpreter, and some little-known floating-point routines. But those
floating-point routines never got integrated with Woz's Basic, which
forever remained an Integer Basic.

Before Applesoft appeared, Integer Basic was simply called Apple Basic.

Paul Schlyter

Mar 17, 2001, 6:49:16 AM

In article <3AB1D3FB...@swbell.net>,
Rubywand <ruby...@swbell.net> wrote:

> Don't think I've ever come across a token listing for INT BASIC.

Then let me present one to you. Note that $01 is used to represent
the end of an Integer Basic line. Also note that since this is C
source code, "\"" means a string containing only one quote (") char.

In Applesoft it's about equally simple to convert tokens to ASCII and
ASCII to tokens. But in Integer Basic it is much simpler to convert
tokens to ASCII than to convert ASCII to tokens, since several tokens
have the same ASCII representation. This makes tokenization much
more complex, since which token e.g. a comma or an opening parenthesis
should be translated to depends on the context where it appears. One
simple example: in Applesoft, a plus always translates to the same
token, but in Integer Basic a unary plus will become $35 in
tokenized form, while a binary plus will become $12 in tokenized
form. When tokenizing an ASCII string, one must therefore parse a
statement to determine whether a particular plus is a unary plus, a
binary plus, or a syntax error.

#define REM_TOKEN 0x5D
#define UNARY_PLUS 0x35
#define UNARY_MINUS 0x36
#define QUOTE_START 0x28
#define QUOTE_END 0x29
static char *itoken[128] =
{
/* $00-$0F */
"HIMEM:","<$01>", "_", " : ",
"LOAD", "SAVE", "CON", "RUN", /* Direct commands */
"RUN", "DEL", ",", "NEW",
"CLR", "AUTO", ",", "MAN",

/* $10-$1F */
"HIMEM:","LOMEM:","+", "-", /* Binary ops */
"*", "/", "=", "#",
">=", ">", "<=", "<>",
"<", "AND", "OR", "MOD",

/* $20-$2F */
"^", "+", "(", ",",
"THEN", "THEN", ",", ",",
"\"", "\"", "(", "!",
"!", "(", "PEEK", "RND",

/* $30-$3F */
"SGN", "ABS", "PDL", "RNDX",
"(", "+", "-", "NOT", /* Unary ops */
"(", "=", "#", "LEN(",
"ASC(", "SCRN(", ",", "(",

/* $40-$4F */
"$", "$", "(", ",",
",", ";", ";", ";",
",", ",", ",", "TEXT", /* Statements */
"GR", "CALL", "DIM", "DIM",

/* $50-$5F */
"TAB", "END", "INPUT", "INPUT",
"INPUT", "FOR", "=", "TO",
"STEP", "NEXT", ",", "RETURN",
"GOSUB", "REM", "LET", "GOTO",

/* $60-$6F */
"IF", "PRINT", "PRINT", "PRINT",
"POKE", ",", "COLOR=","PLOT",
",", "HLIN", ",", "AT",
"VLIN", ",", "AT", "VTAB",

/* $70-$7F */
"=", "=", ")", ")",
"LIST", ",", "LIST", "POP",
"NODSP", "DSP", "NOTRACE","DSP",
"DSP", "TRACE", "PR#", "IN#",
};
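
The unary/binary plus decision described before the table could be sketched like this. It is a simplification under stated assumptions: the prev_kind categories and plus_token are invented for illustration, only the two token values come from the table, and the syntax-error case is left out.

```c
#include <assert.h>

#define BINARY_PLUS 0x12   /* "+" at $12 in the table above */
#define UNARY_PLUS  0x35   /* "+" at $35 in the table above */

/* What the tokenizer saw just before the '+' (illustrative categories). */
enum prev_kind { PREV_NONE, PREV_OPERAND, PREV_OPERATOR, PREV_OPEN_PAREN };

/* Pick the token for '+': binary when it follows an operand (a variable,
 * a constant or a closing parenthesis), unary in every other position. */
static int plus_token(enum prev_kind prev)
{
    assert(prev <= PREV_OPEN_PAREN);
    return prev == PREV_OPERAND ? BINARY_PLUS : UNARY_PLUS;
}
```

So in A=B+C the plus follows the operand B and would become $12, while in A=+C it follows an operator and would become $35. A real tokenizer needs the same kind of context for commas, parentheses and the other characters that map to several tokens.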

Paul Schlyter

Mar 17, 2001, 6:51:09 AM

In article <vwrs6.20654$PH.17...@e3500-chi1.usenetserver.com>,

Matthew Russotto <russ...@wanda.pond.com> wrote:

> In article <98sef0$5tq$1...@merope.saaf.se>, Paul Schlyter <pau...@saaf.se> wrote:
>
>> Correct: Integer Basic encodes all integer constants as 16-bit
>> binaries, prefixed with a byte B0-B9. Interpretation gets quicker if
>> integer constants need not be converted from ASCII each and every
>> time they are encountered in the code.
>
> Which is why in Applesoft, it was significantly faster to use
> variables instead of constants.

Another thing which added to the slowness of Applesoft Basic was
that it really was a floating-point only Basic in many respects.
E.g. the statement:

10 A% = 2

would parse the "2" as a floating-point number, and then convert it
to an integer before storing it in A% !!!!

Applesoft (as well as several other related versions, such as PET
Basic and Commodore 64 Basic) was really Microsoft's 6502 version of
its Basic interpreter version 2. Microsoft never upgraded the 6502
Basic interpreters any further, but the 8080 versions got released in
new versions up to version 5.(something). In version 5 of MBASIC,
all numeric constants were stored in binary format - integer as well
as floating-point constants, and the latter could appear in
single-precision as well as double-precision versions.

MBASIC ver 5 also did a quite interesting thing with the GOTO <linenumber>
statement: at first when entered, <linenumber> was stored just as the
line number, in binary format of course. The first time this GOTO
executed, the program was scanned for the target line; when found,
the GOTO token was changed to another GOTO token, and <linenumber>
was changed to a binary offset from the beginning of the program.
Thus, on subsequent executions, the target line didn't need to be
searched. However, the LIST command had to be intelligent enough to
convert a "GOTO2 <offset>" statement to a "GOTO <linenumber>"
statement. And any change to the program would force a scan of the
entire program, converting all <offset>'s back to <linenumber>'s.
GOSUB <linenumber> and RESUME <linenumber> were treated in a similar
way.
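
The GOTO rewriting trick can be illustrated with a toy model in C. Nothing here reflects MBASIC's real program format: the statement struct, the TOK_ values and both function names are assumptions, and an array index stands in for the byte offset from the start of the program.

```c
#include <assert.h>
#include <stddef.h>

/* Toy program representation, NOT MBASIC's real format. TOK_GOTO keeps a
 * line number in arg; TOK_GOTO2 keeps an index into the program array. */
enum { TOK_NONE, TOK_GOTO, TOK_GOTO2 };

struct stmt {
    unsigned lineno;
    int tok;
    unsigned arg;
};

/* Resolve the jump at prog[pc]. The first call searches for the target
 * line and rewrites the statement in place; later calls reuse the cached
 * index. Returns the index to continue at, or -1 if there is no target. */
static long resolve_goto(struct stmt *prog, size_t n, size_t pc)
{
    assert(prog != NULL && pc < n);
    struct stmt *s = &prog[pc];

    if (s->tok == TOK_GOTO2)
        return (long)s->arg;            /* already resolved: no search */
    if (s->tok != TOK_GOTO)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (prog[i].lineno == s->arg) {
            s->tok = TOK_GOTO2;         /* swap in the "resolved" token */
            s->arg = (unsigned)i;
            return (long)i;
        }
    }
    return -1;                          /* undefined line */
}

/* Turn every resolved jump back into a line number, as would be needed
 * before LIST or after any edit to the program. */
static void unresolve_all(struct stmt *prog, size_t n)
{
    assert(prog != NULL);
    for (size_t i = 0; i < n; i++) {
        if (prog[i].tok == TOK_GOTO2) {
            prog[i].arg = prog[prog[i].arg].lineno;
            prog[i].tok = TOK_GOTO;
        }
    }
}
```

The cost of the scheme shows up in unresolve_all: every edit invalidates the cached positions, so the whole program has to be swept before it can be changed or listed.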

Paul Schlyter

Mar 17, 2001, 9:35:31 AM

In article <98t1cf$r69$1...@news5.svr.pol.co.uk>,
Thug <th...@optusnet.com.au> wrote:

> "Paul Schlyter" <pau...@saaf.se> wrote in message
> news:98sef0$5tq$1...@merope.saaf.se...
>>
>> Visit my apple 2 page at:
>>
>> http://hotel04.ausys.se/pausch/apple2
>>
>> and download my utility FID, which comes with C source. It contains
>> de-tokenizers for Applesoft Basic, Integer Basic, and S-C Assembler
>> source files (the latter are stored as "I" type files too!). That
>> code should help you figure out how Integer Basic programs are
>> tokenized.

...........................

> Another example:
>
> 2020 X0=133*L : Y0=100 : S=-2 : X1=(RND(200)+150)*(1-L)
> : FOR I=1 TO 500 : NEXT I : X3=0
>
> Becomes:
>
> 2020 X-20111^EHIMEM:*L : Y-20111 POKE HIMEM: : S=-2 : X14449
> RND(200)+150)*(1-L)
> : FOR I=1 TO 500 : NEXT I : X-20367 HIMEM:HIMEM:
>
> So, obviously it's Variable names with trailing digits which is the problem.

..............

> Anyway, the fix is easy (I think). You need to add another check to the
> code, that's all. You need to add an "InVar" boolean flag to indicate that a
> variable name is being "constructed". InVar would get set whenever an
> AlphaNum character is encountered that isn't part of a REM or a String; and
> it would get reset as soon as a Token is encountered. Finally, make InVar
> another exception to the "convert the following two bytes to a number"
> routine.

An updated version of FID, where this IntBasic listing bug is fixed,
is now available on my page above.

The old code already had a "lastAN" boolean flag, which remembered
whether the last token was an alphanumeric character or not -- although
it was also set if the last token was an ending quote or an ending
parenthesis - this was to help determine whether a leading space
should be inserted in front of the next token or not. I changed the
name of that boolean flag to "leadSP" instead, and now lastAN is
set only if the last token was an alphanumeric ASCII character;
it is NOT set if the last token was an integer constant.

All the needed modifications are in the dumpBufferAsIntBasicFile()
function, and the updated version appears below. The Integer
Basic tokens are assumed to reside where "data" points, and "len"
is supposed to contain the length of the Integer Basic file image.
The parameters "fname" and "f" are used to control where the
output appears and how it's supposed to be named (the dump ends
with a "SAVE <filename>" line, i.e. it can be transferred to the
Apple II as a text file and EXEC'ed into memory, or one can do an
IN#2 (assuming a serial card resides in slot 2) and then have some
other computer send it directly to that serial card).

uint = unsigned int
U8 = unsigned char
==========================================================================

int dumpBufferAsIntBasicFile( U8 *data, char *fname, uint len, FILE *f )
/*
* Integer Basic file format:
*
* <Length_of_file> (16-bit little endian)
* <Line>
* ......
* <Line>
*
* where <Line> is:
* 1 byte: Line length
* 2 bytes: Line number, binary little endian
* <token>
* <token>
* <token>
* .......
* <end-of-line token>
*
* <token> is one of:
* $12 - $7F: Tokens as listed below: 1 byte/token
* $80 - $FF: ASCII characters with high bit set
* $B0 - $B9: Integer constant, 3 bytes: $B0-$B9,
* followed by the integer value in
* 2-byte binary little-endian format
* (Note: a $B0-$B9 byte preceded by an
* alphanumeric ASCII(hi_bit_set) byte
* is not the start of an integer
* constant, but instead part of a
* variable name)
*
* <end-of-line token> is:
* $01: One byte having the value $01
* (Note: a $01 byte may also appear
* inside an integer constant)
*
* Note that the tokens $02 to $11 represent commands which
* can be executed as direct commands only -- any attempt to
* enter them into an Integer Basic program will be rejected
* as a syntax error. Therefore, no Integer Basic program
* which was entered through the Integer Basic interpreter
* will contain any of the tokens $02 to $11. The token $00
* appears to be unused and won't appear in Integer Basic
* programs either. However, $00 is used as an end-of-line
* marker in S-C Assembler source files, which also are of
* DOS file type "I".
*
* (note here a difference from Applesoft Basic, where there
* are no "direct mode only" commands - any Applesoft commands
* can be entered into an Applesoft program as well).
*
*/
{
    /* The #defines and the itoken[128] table from the token listing
     * earlier in this thread go here (elided in this copy of the post). */

    U8 *data0 = data;
    int alen = get16(data);             /* file length from the header */
    pause(22,f);
    for( data+=2; *data && (data-data0 <= alen); )
    {
        int inREM = 0, inQUOTE = 0;
        int lastAN = 0, leadSP = 0, lastTOK = 0;
        unsigned int lineno;
        unsigned int linelen = *data++;
        lineno = get16(data), data += 2;
        linelen = linelen;              /* unused; silences a warning */
        fprintf( f, "%u ", lineno );
        for( ; *data!=0x01; data++ )
        {
            leadSP = leadSP || lastAN;
            if ( *data & 0x80 )
            {
                /* $B0-$B9 starts an integer constant unless we are in a
                 * REM, in a string, or inside a variable name (lastAN) */
                if ( !inREM && !inQUOTE && !lastAN && (*data >= 0xB0 && *data <= 0xB9) )
                {
                    signed short integer = get16(data+1);
                    int leadspace = lastTOK && leadSP;
                    fprintf( f, leadspace ? " %d" : "%d", (int) integer );
                    data += 2;          /* skip the two value bytes */
                    leadSP = 1;
                }
                else
                {
                    /* ordinary character: strip the high bit */
                    char c = *data & 0x7F;
                    int leadspace = !inREM && !inQUOTE &&
                                    lastTOK && leadSP && isalnum(c);
                    if ( leadspace )
                        fprintf( f, " " );
                    if ( c >= 0x20 )
                        fprintf( f, "%c", c );
                    else
                        fprintf( f, "^%c", c+0x40 );    /* control char */
                    lastAN = isalnum(c);
                }
                lastTOK = 0;
            }
            else
            {
                char *tok = itoken[*data];
                char lastchar = tok[strlen(tok)-1];
                int leadspace = leadSP &&
                                ( isalnum(tok[0]) ||
                                  *data == UNARY_PLUS ||
                                  *data == UNARY_MINUS ||
                                  *data == QUOTE_START );
                switch( *data )
                {
                    case REM_TOKEN:   inREM = 1;   break;
                    case QUOTE_START: inQUOTE = 1; break;
                    case QUOTE_END:   inQUOTE = 0; break;
                    default: break;
                }
                fprintf( f, leadspace ? " %s" : "%s", tok );
                lastAN = 0;
                leadSP = isalnum(lastchar) || lastchar == ')' || lastchar == '\"';
                lastTOK = 1;
            }
        }
        fprintf( f, "\n" ), data++;     /* step over the $01 end-of-line */
        if ( pause(0,f) < 0 )
            goto exit;
    }
    len = len;                          /* unused; silences a warning */
    if ( f != stdout )
        fprintf( f, "\nSAVE %s\n", fname );
exit:
    return 0;
} /* dumpBufferAsIntBasicFile */
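
The function above calls two helpers that are not shown in the post, get16() and pause(). A plausible get16(), assuming it simply reads a 16-bit little-endian value as the file format comment implies, might look like the following; the real FID source may differ.

```c
#include <assert.h>

typedef unsigned char U8;

/* Assumed behaviour of the get16() helper used above: combine two bytes
 * stored in Lo/Hi order into one 16-bit value. */
static unsigned int get16(const U8 *p)
{
    assert(p != 0);
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}
```

The function above assigns the result to a signed short, which is how values of $8000 and up come out negative on ordinary compilers. Nothing is assumed here about pause() beyond what the call sites show: it takes a number and the output stream, and a negative return value aborts the listing.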

David Wilson

Mar 17, 2001, 10:40:13 PM

pau...@saaf.se (Paul Schlyter) writes:
>Applesoft (as well as several other related versions, such as PET
>Basic and Commodore 64 Basic) was really Microsoft's 6502 version of
>its Basic interpreter version 2. Microsoft never upgraded the 6502
>Basic interpreters any further, but the 8080 versions got released in
>new versions up to version 5.(something).

My OSI Superboard has an 8KB version of Microsoft's 6502 BASIC which is
so old it even works on the oldest 6502 CPU which did not have a ROL
(or it may have been ROR) instruction. It has many similarities to Applesoft
but some differences as well:

AND, OR and NOT are bitwise operators rather than booleans
floating point is 4 bytes instead of 5 bytes
--
David Wilson School of IT & CS, Uni of Wollongong, Australia