>But then, wasn't Woz himself responsible for INTEGER BASIC? If so, there was
>probably some other fantastic optimisation he was able to perform by doing
>things this strange way, saving 5 clock cycles somewhere or something.
><grin>
>
>
>Michael
Yep, it was Woz. And see my previous post -- I think that a clever
optimization is exactly what this was.
--
Matthew T. Russotto russ...@pond.com
"Extremism in defense of liberty is no vice, and moderation in pursuit
of justice is no virtue."
Which is why in Applesoft, it was significantly faster to use
variables instead of constants.
$Bx is the Apple (high-bit-set) ASCII value for a digit. Allowing all the $Bx
codes probably shortens the tokenization code by a few instructions, because you
can just emit the first digit character of the number as the token.
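To make the $Bx idea concrete, here is a minimal Python sketch of the encoding side as it appears in the hex dumps quoted in this thread: the first digit with the high bit set, followed by the 16-bit value in lo/hi order. The function name `encode_constant` is made up for illustration, and the 16-bit range check is an assumption; this is not taken from the actual Integer BASIC ROM code.

```python
def encode_constant(text: str) -> bytes:
    """Encode a decimal constant the way the dumps in this thread show it.

    Assumption: constants are unsigned and fit in 16 bits.
    """
    if not (text.isascii() and text.isdigit()):
        raise ValueError("expected a string of decimal digits")
    value = int(text)
    if value > 0xFFFF:
        raise ValueError("constant does not fit in 16 bits")
    # First digit as high-bit-set ASCII: '5' (0x35) becomes 0xB5.
    prefix = ord(text[0]) | 0x80
    # The value itself follows in lo/hi (little-endian) order.
    return bytes([prefix]) + value.to_bytes(2, "little")
```

With this, `encode_constant("5000")` gives the bytes B5 88 13 seen in the LOMEM: example, so the prefix byte carries no information a decoder needs beyond "a number follows".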
Sure; my dad and I typed in bunches of programs from SoftSide.
> Anyway, I digress...
>
> I've successfully written a program to take an APPLESOFT program and
> de-tokenize it back into readable text, and it works an absolute treat. But
> INTEGER BASIC programs are proving a little bit more difficult, and I was
> wondering if somebody out there could shed some light on the process it
> uses.
>
> So far I have the structure being like this:
>
> 1 Byte: Length of Line
> 2 Bytes: Line Number (Lo/Hi Order)
> ? Bytes Tokenized Program
> Last Byte: 01 (End of line token)
>
> So, pulling apart some code I get:
>
> 105 PRINT "[CTRL-D]BLOAD BOWLING.OBJ"
> Hex: 19 69 00 61 28 84 C2 CC CF C1 C4 A0 C2 CF D7 CC C9 CE C7 AE CD C2 CA 29
> 01
>
> 19 = Line is 25 Bytes long
> 69 00 = Line 105
> 61 = PRINT token
> 28 = Quote Token
> 84 = Ctrl-D Character
> C2 CC CF C1 C4 A0 C2 CF D7 CC C9 CE C7 AE CD C2 CA = ASCII String (Hi bit
> Set)
> 29 = Quote Token (Different for closing quote... interesting.)
> 01 = End of Line
>
> That's not too bad and is quite similar to Applesoft except that things like
> quotes are tokenized and plain text has the high bit set. But once numbers
> start appearing in the code, things get really messy. INTEGER appears to
> encode all numbers too, whereas APPLESOFT just has them as plain text. So we
> get:
>
> 108 LOMEM: 5000
> Hex: 08 6C 00 11 B5 88 13 01
>
> 08 = Line is 8 bytes long
> 6C 00 = Line 108
> 11 = LOMEM Token
> B5 = Colon Token?? Or pointer that the next bytes are a number?
> 88 13 = 5000 (Stored in Lo/Hi Order)
> 01 = End of line
>
> 110 POKE 808,0 : POKE 809,12
> Hex: 15 6E 00 64 B8 28 03 65 B0 00 00 03 64 B8 29 03 65 B1 0C 00 01
>
> 15 = Line is 21 bytes long
> 6E 00 = Line 110
> 64 = POKE Token
> B8 = ??
> 28 03 = 808 (Lo/Hi)
> 65 = Comma token?
> B0 = ??
> 00 00 = 0 (Lo,Hi)
> 03 = Colon Token
> 64 = POKE Token
> B8 = ??
> 29 03 = 809 (Lo/Hi)
> 65 = Comma token
> B1 = ??
> 0C 00 = 12 (Lo/Hi)
> 01 = End of Line
>
> Particularly confusing in this case is that B0 appears after the comma token
> in the first poke, but B1 appears after the comma token in the second poke
> statement. It would appear that the B? character matches the first digit of
> the number that follows it, but that seems a bit weird to me, and certainly
> isn't an infallible coding.
>
Looks like your idea about the $Bx code is right. My guess is that it's a
way to speed up references by number. A code scan for a value can avoid doing a
hex-->decimal conversion for all entries except those with the right first
digit.
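For reference, the line layout worked out in the quoted post (length byte, line number in lo/hi order, tokenized body, $01 terminator) can be sketched in Python roughly as follows. The name `split_lines` is hypothetical, and the sketch assumes the length byte counts the whole line including itself, which is what the three quoted dumps show.

```python
def split_lines(program: bytes):
    """Yield (line_number, body) for each tokenized line in `program`.

    Assumes the layout guessed in this thread: 1 length byte (counting the
    whole line), 2 bytes of line number (lo/hi), the body, then $01.
    """
    pos = 0
    while pos < len(program):
        length = program[pos]
        # Shortest possible line: length byte + line number + $01.
        if length < 4 or pos + length > len(program):
            raise ValueError(f"bad line length {length} at offset {pos}")
        line = program[pos:pos + length]
        if line[-1] != 0x01:
            raise ValueError(f"line at offset {pos} does not end in $01")
        number = int.from_bytes(line[1:3], "little")
        yield number, line[3:-1]  # body without header and terminator
        pos += length
```

Stepping from line to line by the length byte, rather than scanning for $01, means a $01 that happens to sit inside a numeric constant cannot be mistaken for the end of the line.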
> Can anyone help? (And just a list of tokens would be helpful!)
>
....
Don't think I've ever come across a token listing for INT BASIC. Maybe
someone else can direct you to one.
Rubywand
I'm currently writing a utility to manage my Apple ][ disk images. As part
of this I'm going through a lot of my old personal disks that I created when
I was first learning computers in the early 80s, including a lot of BASIC
programs I typed in from magazines and so on. (Remember when you used to be
able to do that? Computer Magazines came with "Special 16 page pull-out
program supplements", not DVDs full of code ready to go.)
Anyway, I digress...
Can anyone help? (And just a list of tokens would be helpful!)
Regards,
Michael
Thanks Paul. Your code was very useful, and I've managed to get mine almost
working by decoding what yours does. But it still fails sometimes, as does
your own FID utility! The examples I've found are:
1700 REM
1710 T=T+H : IF NOT ST AND NOT SP THEN 1720 : GOSUB 1780 : T=T+H : IF NOT SP
     THEN T=T+H1
1720 SP=0 : ST=SV : SV=0 : IF NOT SET(L) THEN 1750
Your code (and mine too now!) misinterprets the above as:
1700 REM
1710 T=T+H : IF NOT ST AND NOT SP THEN 1720 : GOSUB 1780 : T=T+H : IF NOT SP
     THEN T=T+H9217-11514P=0 : ST=SV : SV=0 : IF NOT SET(L) THEN 1750
(Note line 1710 has got merged with 1720, which is really odd, because the
EOL check should fix that, I think, but anyway...)
Another example:
2020 X0=133*L : Y0=100 : S=-2 : X1= RND(200)+150)*(1-L)
: FOR I=1 TO 500 : NEXT I : X3=0
Becomes:
2020 X-20111^EHIMEM:*L : Y-20111 POKE HIMEM: : S=-2 : X14449
RND(200)+150)*(1-L)
: FOR I=1 TO 500 : NEXT I : X-20367 HIMEM:HIMEM:
So, obviously it's variable names with trailing digits that are the problem.
(And is itoken[0] really "HIMEM:" as well as itoken[10]? I can understand
other tokens appearing multiple times, as it would appear to be for
different cases/usages of the command, but this doesn't make sense for
HIMEM.)
Anyway, the fix is easy (I think). You need to add another check to the
code, that's all. You need to add an "InVar" boolean flag to indicate that a
variable name is being "constructed". InVar would get set whenever an
AlphaNum character is encountered that isn't part of a REM or a String; and
it would get reset as soon as a Token is encountered. Finally, make InVar
another exception to the "convert the following two bytes to a number"
routine.
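A rough Python sketch of that InVar idea, in case it helps anyone else. Everything named here (`TOKENS`, `detokenize_body`, `in_var`) is made up for illustration, and the token table is deliberately partial: it holds only the values guessed earlier in this thread, with anything else printed as a `<XX>` placeholder. REM handling is left out because the REM token value has not come up here.

```python
# Partial, hypothetical token table: only values guessed in this thread.
TOKENS = {0x03: " : ", 0x11: "LOMEM: ", 0x61: "PRINT ", 0x64: "POKE ", 0x65: ","}
OPEN_QUOTE, CLOSE_QUOTE = 0x28, 0x29


def detokenize_body(body: bytes) -> str:
    """Detokenize one line body (header and $01 terminator already removed)."""
    out = []
    in_var = False      # True while a variable name is being built
    in_string = False   # True between the open and close quote tokens
    i = 0
    while i < len(body):
        b = body[i]
        i += 1
        if in_string:
            if b == CLOSE_QUOTE:
                in_string = False
                out.append('"')
            else:
                out.append(chr(b & 0x7F))  # string text has the high bit set
            continue
        if b < 0x80:
            in_var = False  # any token ends the current variable name
            if b == OPEN_QUOTE:
                in_string = True
                out.append('"')
            else:
                out.append(TOKENS.get(b, f"<{b:02X}>"))
            continue
        if 0xB0 <= b <= 0xB9 and not in_var:
            # Digit prefix outside a name: the next two bytes are the value.
            if i + 2 > len(body):
                raise ValueError("truncated numeric constant")
            out.append(str(int.from_bytes(body[i:i + 2], "little")))
            i += 2
        else:
            ch = chr(b & 0x7F)
            if ch.isalnum():
                in_var = True  # letters start a name; digits extend it
            out.append(ch)
    return "".join(out)
```

As a made-up test case, the bytes 64 C8 B1 65 B1 0C 00 come out as `POKE H1,12`: the first B1 is kept as part of the name H1, while the second B1, coming straight after the comma token, is treated as a number prefix.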
I still think that it is a very weird way of encoding numbers too! The speed
advantage of storing the number itself, rather than the ASCII representation
makes sense, but why not have a "Little Endian Number follows me" token and
leave it at that?
My OSI Superboard has an 8KB version of Microsoft's 6502 BASIC which is
so old it even works on the oldest 6502 CPU which did not have a ROL
(or it may have been ROR) instruction. It has many similarities to Applesoft
but some differences as well:
AND, OR and NOT are bitwise operators rather than booleans
floating point is 4 byte instead of 5 byte
--
David Wilson School of IT & CS, Uni of Wollongong, Australia