Adam Vardy's article "Extra Instructions Of The 65XX Series CPU" covers
this topic very well down to the ugliest details, but is unclear on two
points:
> SKB ***
> SKB stands for skip next byte.
> Opcodes: 80, 82, C2, E2, 04, 14, 34, 44, 54, 64, 74, D4, F4.
> Takes 2, 3, or 4 cycles to execute.
Is it documented anywhere which of these opcodes have which cycle counts?
> AXA ***
> This opcode stores the result of A AND X AND the high byte of the target
> address of the operand +1 in memory.
>
> Supported modes:
>
> AXA abcd,Y ;9F cd ab ;No. Cycles= 5
> AXA (ab),Y ;93 ab ; 6
>
> Example:
>
> AXA $7133,Y ;9F 33 71
>
> Equivalent instructions:
>
> STX $02
> PHA
> AND $02
> AND #$72
> STA $7133,Y
> PLA
> LDX $02
This is clear enough for the abs-y (9F) case, but what is "the high byte
of the target address of the operand" in the indirect-y case (93)?
Any elucidations are highly appreciated! Including telling me where to
look and/or whom to ask.
(Please note that this article has been cross-posted to three groups).
--
Linards Ticmanis
> Is it documented anywhere which of these opcodes have which cycle
> counts?
Get AAY64
http://www.the-dreams.de
also look in here
http://oxyron.de/html/opcodes.html
--
-=[]=--- iAN CooG/HokutoForce ---=[]=-
Here are, without any claim of validity, excerpts from "6502 Undocumented
Opcodes v3.0" by Freddy Offenga:
===
"The timing values (clock cycles) from all the opcodes were compared with
the values on the list by Adam Vardy. There were no differences.
The addressing modes for the "DOP" (double nop) and "TOP" instructions
were copied from Craig Taylor's list. The reason for this is that the
different addressing modes explain the differences in the timing
values."
===
DOP (NOP) [SKB]
===============
No operation (double NOP). The argument has no significance.
Status flags: -
Addressing |Mnemonics |Opc|Sz | n (cycles)
------------|-----------|---|---|---
Zero Page |DOP arg |$04| 2 | 3
Zero Page,X |DOP arg,X |$14| 2 | 4
Zero Page,X |DOP arg,X |$34| 2 | 4
Zero Page |DOP arg |$44| 2 | 3
Zero Page,X |DOP arg,X |$54| 2 | 4
Zero Page |DOP arg |$64| 2 | 3
Zero Page,X |DOP arg,X |$74| 2 | 4
Immediate |DOP #arg |$80| 2 | 2
Immediate |DOP #arg |$82| 2 | 2
Immediate |DOP #arg |$89| 2 | 2
Immediate |DOP #arg |$C2| 2 | 2
Zero Page,X |DOP arg,X |$D4| 2 | 4
Immediate |DOP #arg |$E2| 2 | 2
Zero Page,X |DOP arg,X |$F4| 2 | 4
===
bye
Marcus
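For anyone putting this into an emulator, the table quoted above can be turned into a small lookup. This is only a sketch in C: the function name dop_cycles is made up for illustration, and the values are transcribed from the quoted table without independent verification.

```c
#include <stdint.h>

/* Cycle count for the two-byte "DOP"/SKB opcodes, transcribed from the
 * table quoted above. Returns 0 for any opcode not listed there.
 * dop_cycles is a hypothetical helper name, not taken from any emulator. */
int dop_cycles(uint8_t opcode)
{
    switch (opcode) {
    case 0x80: case 0x82: case 0x89: case 0xC2: case 0xE2:
        return 2;   /* immediate */
    case 0x04: case 0x44: case 0x64:
        return 3;   /* zero page */
    case 0x14: case 0x34: case 0x54: case 0x74: case 0xD4: case 0xF4:
        return 4;   /* zero page,X */
    default:
        return 0;   /* not a DOP opcode in this table */
    }
}
```

The grouping is by addressing mode, which is what explains the 2, 3 or 4 cycle spread asked about in the original question.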
Without checking, they should be the same as the corresponding LDA
opcode with the relevant addressing mode.
(From Marcus)
> There are some discrepancies between the different info files floating around the net.
Yes, I've been collecting these, and collating the information for some
time.
> v3.0
BTW, there's a 3.1 version of this doc floating around.
(From Freddy!)
> DOP (NOP) [SKB]
(Also TOP)
These are badly named. I prefer NOP or RDM (read memory) or perhaps ADR
(address).
(From Freddy!)
> The argument has no significance.
Bzzt! Next contestant please. This puts an address on the bus. This
means you can touch softswitches and I/O locations with these commands.
(Hence my preference for a different mnemonic.) Yes, I know the above
commands are 2 bytes, but the C64 has its I/O regs at $0000, $0001, and
there are a handful of 3-byte NOP $XXXX opcodes which can be used for
this on the Apple II.
Cheers,
Nick.
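To make the point about the bus access concrete, here is a rough C sketch of how an emulator might execute a zero-page NOP ($04 in the table above) so that the read of the operand address still happens. The Bus and TraceMem types and the function names are assumptions made up for this illustration; they are not taken from any existing emulator.

```c
#include <stdint.h>

/* Hypothetical bus interface: the CPU core calls read() for every memory
 * access, so soft switches and I/O registers can react to it. */
typedef struct {
    uint8_t (*read)(void *ctx, uint16_t addr);
    void *ctx;
} Bus;

/* Example device for illustration: flat 64K memory that records reads. */
typedef struct {
    uint8_t  mem[65536];
    uint16_t last_read;
    unsigned reads;
} TraceMem;

uint8_t trace_read(void *ctx, uint16_t addr)
{
    TraceMem *m = (TraceMem *)ctx;
    m->last_read = addr;
    m->reads++;
    return m->mem[addr];
}

/* Execute "NOP zp". *pc points at the operand byte on entry and past it
 * on return. The operand is fetched, then the zero-page location is read
 * and the value thrown away. Returns the cycle count from the table (3). */
int exec_nop_zp(const Bus *bus, uint16_t *pc)
{
    uint8_t zp = bus->read(bus->ctx, (*pc)++);  /* fetch operand */
    (void)bus->read(bus->ctx, zp);              /* dummy read, visible on the bus */
    return 3;
}
```

An emulator that simply skips the operand byte would get the timing right but would miss any side effect attached to the address that is read.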
Since there were several versions of the 6502, isn't it likely that
some had *differing* illegal ops and results?
-michael
Music synthesis for 8-bit Apple II's!
Home page: http://members.aol.com/MJMahon/
"The wastebasket is our most important design
tool--and it is seriously underused."
> sicklittlemonkey wrote:
> > (From Linards)
> >
> >>Is it documented anywhere which of these opcodes have which cycle counts?
> >
> >
> > Without checking, they should be the same as the corresponding LDA
> > opcode with the relevant addressing mode.
> >
> > (From Marcus)
> >
> >>There are some discrepancies between the different info files floating
> >>around the net.
> >
> >
> > Yes, I've been collecting these, and collating the information for some
> > time.
>
> Since there were several versions of the 6502, isn't it likely that
> some had *differing* illegal ops and results?
Not just likely - Absolutely certain.
I've long since forgotten the source, but there used to be a listing of
how the various "illegal" opcodes differed from chip version to chip
version, and only a very tiny handful of them that worked on one version
would work (or even do ANYTHING) on another version.
(which, of course, is why they're "illegal" or "unsupported" or
"undocumented" opcodes - You can't rely on them except on *THAT
PARTICULAR CHIP*, and in some cases, not even then because the "what
comes out the other end" can be influenced by the state of RAM, I/O
locations, or other things that may or may not be duplicatable from one
machine to the next!)
--
Don Bruder - dak...@sonic.net - If your "From:" address isn't on my whitelist,
or the subject of the message doesn't contain the exact text "PopperAndShadow"
somewhere, any message sent to this address will go in the garbage without my
ever knowing it arrived. Sorry... <http://www.sonic.net/~dakidd> for more info
The surprising answer is yes, but not many. ;-)
The earliest documentation I can find is Apple Assembly Line's
"So-Called Unused Opcodes" from 1981 (
http://bobsc5.home.comcast.net/aal/1981/aal8103.html#a2 ).
In 1995 I unknowingly duplicated this work (in less detail) and my
results on a Rockwell 6502 were almost identical to Bob
Sander-Cederlof's Synertek 6502:
http://www.apple2.org.za/mirrors/ground.icaen.uiowa.edu/Mirrors/uni-kl/hardware/undocumented_6502_opcodes
The Apple-based research doesn't differ much from the more detailed
research done on the C64, Atari and even BBC systems. It seems that
this is because the same MOS 6502 mask was used for the 6510 and other
6502 implementations.
In any case, some Apple games use these opcodes, and many more C64
games do. Although a multibyte NOP is often sufficient, in other cases
the expected operation is required, not hard to implement, and improves
emulation authenticity (e.g. crash behaviour).
Cheers,
Nick.
As the 6510 is a MOS chip this doesn't come as a shock, really ;o)
And it explains why "illegal opcodes" are so much more popular on the
C64: Only one chip version in all these years (though under different
names).
AFAIK the 6502's in the Ataris were compatible with each other, too.
> In any case, some Apple games use these opcodes, and many more C64
> games do. Although a multibyte NOP is often sufficient, in other cases
> the expected operation is required, not hard to implement, and improves
> emulation authenticity (e.g. crash behaviour).
I agree with you here, Nick. Documenting them as thoroughly as possible
is definitely important, as they were used.
On the other hand I haven't seen an application where they were
indispensable.
Does anybody have some interesting sources they would share (or can
point me to some)?
bye
Marcus
The hard thing is collating the documentation!
As for indispensable, I've only seen this one:
http://groups.google.com/group/comp.sys.apple2/msg/cfc5ac0127e3cfb4
Cheers,
Nick.
Actually, that was my point.
It's a tribute to the naivete of early game programmers that they
thought it would be OK to use undefined instructions. It's a good
way of making sure that a game will become obsolete as models advance.
And anyone who couldn't discipline themselves to "get the job done"
on time and under budget by using only documented opcodes isn't much
of a programmer. The whole thing reeks of high school... ;-)
> what is "the high byte of the target address of the operand"
> in the indirect-y case (93)?
I'd suppose it means what it says. If $FB=$33, $FC=$71, .Y=0, the
result probably is the same as in the absolute case, only it takes
one more cycle to perform: AXA ($FB),Y
--
Anders Carlsson
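Under that reading, the stored value can be sketched in C as follows. This assumes the description quoted from the article (A AND X AND high byte of the base address plus one) and the supposition above that the indirect form uses the pointer fetched from zero page as the base. The function names are hypothetical, and any instability of the real opcode is not modelled.

```c
#include <stdint.h>

/* Value stored by AXA as described in the quoted article:
 * A AND X AND (high byte of the base address + 1).
 * "base" is the address before Y is added: the 16-bit operand for
 * AXA abcd,Y, or the pointer read from zero page for AXA (ab),Y. */
uint8_t axa_value(uint8_t a, uint8_t x, uint16_t base)
{
    return (uint8_t)(a & x & (uint8_t)((base >> 8) + 1));
}

/* Base address for the (ab),Y form: little-endian pointer stored at
 * zp and zp+1, with the second fetch wrapping inside zero page. */
uint16_t axa_indirect_base(const uint8_t zero_page[256], uint8_t zp)
{
    return (uint16_t)(zero_page[zp] | (zero_page[(uint8_t)(zp + 1)] << 8));
}
```

With $FB=$33 and $FC=$71 the base is $7133, so the mask is $72 in both addressing modes, matching the AND #$72 in the article's equivalent code.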
Cheers,
Nick.
There are infinitely many ways to make a program obscure without
making it hardware accident-dependent. ;-)
Ms. Pac-Man on the Apple ][:
The crack group used code involving, iirc, SLO, to verify that the
signature was intact. Oddly, it passes on a 65C02, but on an emulated
6502 without illops it crashes into the monitor.
-uso.
I'm still not quite happy with some ugly details that I haven't seen
documented too well:
1.) Operation of ARR ($6B) when the decimal flag is SET.
2.) Exact operation of decimal mode ADC and SBC in both the 6502 and the
65C02.
I'll take a look at the VICE code first. Do you know of any other GOOD
code for this?
Best wishes,
--
Linards Ticmanis
Just stay away from those which are marked as unreliable (e.g. see
http://oxyron.de/html/opcodes02.html)
If we're talking of normal 6502's, then there are almost no differences
on the illegal opcodes. There are of course extended versions of the 6502
like the 65CE02 or the 65C02, but these don't count since also the
documented opcodes have changed...
If we're talking about the illegals and differences, the only thing
which usually is different is the stability of some illegals, but if you
know about these it is safe to use the stable ones or cancel out the
instability (by ANDing with 0's for example).
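One way to read the remark about ANDing with zeros is sketched below in C. It assumes the commonly circulated model of the unstable ANE #imm opcode ($8B), in which the result is (A OR magic) AND X AND imm and "magic" varies between chips and conditions. That model and the function name are assumptions for illustration; neither comes from this thread.

```c
#include <stdint.h>

/* Assumed model of ANE #imm: A = (A | magic) & X & imm.
 * "magic" stands for the chip-dependent constant that makes the
 * opcode unstable; it is passed in here so its effect can be examined. */
uint8_t ane_result(uint8_t a, uint8_t x, uint8_t imm, uint8_t magic)
{
    return (uint8_t)((a | magic) & x & imm);
}
```

In this model the unknown constant drops out in two cases: when the immediate is 0 the result is always 0, and when A is $FF the OR has no effect and the result is simply X AND imm.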
hm> On the other hand I haven't seen an application where they were
hm> indispensable.
There are several C64 demo effects that could not have been done
without illegal opcodes. One good example is 6 sprites on top of an
FLI picture, done by Ninja in Darwin 80%:
http://noname.c64.org/csdb/release/?id=12732
It's not just one or two illegals here and there either:
9097 8D 11 D0 STA $D011
909a 4F 18 D0 SRE $D018
909d 8D 11 D0 STA $D011
90a0 4F 02 DD SRE $DD02
90a3 8C 11 D0 STY $D011
90a6 0E 18 D0 ASL $D018
90a9 8F 11 D0 SAX $D011
90ac 8C 02 DD STY $DD02
90af 8D 11 D0 STA $D011
90b2 4F 18 D0 SRE $D018
90b5 8D 11 D0 STA $D011
90b8 6F 02 DD RRA $DD02
90bb 8D 11 D0 STA $D011
90be 0F 18 D0 SLO $D018
90c1 8E 11 D0 STX $D011
90c4 8E 02 DD STX $DD02
90c7 8D 11 D0 STA $D011
90ca 4F 18 D0 SRE $D018
90cd 8D 11 D0 STA $D011
90d0 4F 02 DD SRE $DD02
They were also used in a lot of games - instructions like LAX, SAX are
easy to use and can save a couple of cycles in a critical place.
--
___ . . . . . + . . o
_|___|_ + . + . + . Per Olofsson, arkadspelare
o-o . . . o + Mage...@cling.gu.se
- + + . http://www.cling.gu.se/~cl3polof/
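For readers who have not met LAX and SAX, their usual descriptions can be sketched in C like this. The Regs struct and the function names are invented for the example, and the semantics are the commonly documented ones rather than anything stated in this thread. The cycle saving comes from folding two instructions into one: LAX zp takes 3 cycles where LDA zp followed by TAX takes 5.

```c
#include <stdint.h>

/* Minimal register file for the illustration (hypothetical layout). */
typedef struct {
    uint8_t a, x;
    int n, z;       /* N and Z flags as booleans */
} Regs;

/* LAX: load the same byte into A and X and set N/Z from it. */
void lax(Regs *r, uint8_t value)
{
    r->a = value;
    r->x = value;
    r->n = (value & 0x80) != 0;
    r->z = (value == 0);
}

/* SAX: the byte written to memory is A AND X.
 * Registers and flags are left unchanged. */
uint8_t sax(const Regs *r)
{
    return (uint8_t)(r->a & r->x);
}
```

In the listing above, SAX $D011 writes A AND X, which gives the routine one more stored value than STA, STX and STY alone without reloading a register.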
Do you have the original to compare? Since a NOP illop works, it looks
like the functionality is not important. Also Moon Patrol (cracked)
seems to use a #$02 illop, which doesn't make sense since that's
JAM/HLT.
I bought Bug Attack to check the illops in that - when I get the ADT
cable working for my new //c ...
Cheers,
Nick.
> The crack group used code involving, iirc, SLO, to verify that the
> signature was intact. Oddly, it passes on a 65C02, but on an emulated
> 6502 without illops it crashes into the monitor.
Wow, that's the first I've heard of 6502 code using illegal opcodes that
actually works on a 65C02.
> Thanks for all your helpful replies, which are going to find their way
> into an AppleWin improvement soon, if all goes well.
>
> I'm still not quite happy with some ugly details that I haven't seen
> documented too well:
>
> 1.) Operation of ARR ($6B) when the decimal flag is SET.
>
> 2.) Exact operation of decimal mode ADC and SBC in both the 6502 and
> the 65C02.
I have C code which emulates ADC and SBC for the 65C02 for all values of
the arguments and flags.
>
> I'll take a look at the VICE code first. Do you know of any other GOOD
> code for this?
>
> Best wishes,
> --
> Linards Ticmanis
--
Scott Hemphill hemp...@alumni.caltech.edu
"This isn't flying. This is falling, with style." -- Buzz Lightyear
Thanks Scott! I've copied the algorithm from VICE for now, but that is
of course NMOS 6502 only, since Commodore never switched to the 65C02
chips for their own computers. Thus I am very interested in your code
for better coverage of 65C02. Would you mind posting it here or mailing
it to me? My From address is valid.
All the Best,
--
Linards Ticmanis
OK, here it is. A is the accumulator, b is the argument (an unsigned 8-bit
quantity). V, D, and C are booleans which represent the state of the
corresponding flags. NZ is a byte which holds the state of the N and Z
flags. The N flag is set if (NZ & 0x80) is true, and the Z flag is set
if (NZ == 0) is true. w is a 16-bit unsigned scratch location.
These instructions were tested by running a PRODOS program which combined
each of the 256 possible accumulator values with the 256 argument values.
The 64K combinations were output as a 128K file containing a one-byte result
and one byte of flags. The program was edited to produce 8 different
versions: (initial C set/clear)x(initial D set/clear)x(ADC/SBC). The
program versions were run on a Laser 128/EX and on an emulator, and the
results compared. (All of this was done about 20 years ago.)
#define ADC() \
do { \
if ((A^b) & 0x80) V = 0; else V = 1; \
if (D) { \
w = (A & 0xf) + (b & 0xf) + C; \
if (w >= 10) w = 0x10 | ((w+6)&0xf); \
w += (A & 0xf0) + (b & 0xf0); \
if (w >= 160) { \
C = 1; \
if (V && w >= 0x180) V = 0; \
w += 0x60; \
} else { \
C = 0; \
if (V && w < 0x80) V = 0; \
} \
} else { \
w = A + b + C; \
if (w >= 0x100) { \
C = 1; \
if (V && w >= 0x180) V = 0; \
} else { \
C = 0; \
if (V && w < 0x80) V = 0; \
} \
} \
A = (byte)w; \
NZ = A; \
} while (0)
#define SBC() \
do { \
if ((A^b) & 0x80) V = 1; else V = 0; \
if (D) { \
int tmp; \
tmp = 0xf + (A & 0xf) - (b & 0xf) + C; \
if (tmp < 0x10) { \
w = 0; \
tmp -= 6; \
} else { \
w = 0x10; \
tmp -= 0x10; \
} \
w += 0xf0 + (A & 0xf0) - (b & 0xf0); \
if (w < 0x100) { \
C = 0; \
if (V && w < 0x80) V = 0; \
w -= 0x60; \
} else { \
C = 1; \
if (V && w >= 0x180) V = 0; \
} \
w += tmp; \
} else { \
w = 0xff + A - b + C; \
if (w < 0x100) { \
C = 0; \
if (V && w < 0x80) V = 0; \
} else { \
C = 1; \
if (V && w >= 0x180) V = 0; \
} \
} \
A = (byte)w; \
NZ = A; \
} while (0)
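As a sanity check on the binary-mode branch of the ADC macro, its V and C logic can be compared exhaustively against the textbook overflow formula. The sketch below restates that branch as a plain C function; the function names are made up here, and only the binary (D clear) case is covered.

```c
#include <stdint.h>

/* Binary-mode ADC restated from the macro above as a function. */
uint8_t adc_binary_posted(uint8_t a, uint8_t b, int c_in, int *v, int *c_out)
{
    unsigned w = (unsigned)a + b + (unsigned)c_in;
    *v = ((a ^ b) & 0x80) ? 0 : 1;   /* overflow only possible if signs match */
    if (w >= 0x100) {
        *c_out = 1;
        if (*v && w >= 0x180) *v = 0;
    } else {
        *c_out = 0;
        if (*v && w < 0x80) *v = 0;
    }
    return (uint8_t)w;
}

/* Textbook form: overflow when both inputs differ in sign from the result. */
uint8_t adc_binary_textbook(uint8_t a, uint8_t b, int c_in, int *v, int *c_out)
{
    unsigned w = (unsigned)a + b + (unsigned)c_in;
    uint8_t r = (uint8_t)w;
    *v = ((a ^ r) & (b ^ r) & 0x80) != 0;
    *c_out = w > 0xFF;
    return r;
}

/* Count disagreements over all 256 x 256 x 2 input combinations. */
long adc_binary_mismatches(void)
{
    long bad = 0;
    for (int c = 0; c < 2; c++)
        for (int a = 0; a < 256; a++)
            for (int b = 0; b < 256; b++) {
                int v1, c1, v2, c2;
                uint8_t r1 = adc_binary_posted((uint8_t)a, (uint8_t)b, c, &v1, &c1);
                uint8_t r2 = adc_binary_textbook((uint8_t)a, (uint8_t)b, c, &v2, &c2);
                if (r1 != r2 || v1 != v2 || c1 != c2)
                    bad++;
            }
    return bad;
}
```

This is the same brute-force idea as the original PRODOS test, just applied between two formulations instead of between real hardware and an emulator.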
[code snipped]
Thanks a lot! Luckily the problem is small enough to allow for such
complete verification, but I'm still glad I don't have to do it myself.
The only question that remains: is it true that the 65C02 takes one extra
cycle for ADC/SBC when the D flag is set? I'll have a look at the
official data sheet.
--
Linards Ticmanis
> The only question that remains: is it true that the 65C02 takes one extra
> cycle for ADC/SBC when the D flag is set? I'll have a look at the
> official data sheet.
Yes, according to information in the data sheets (which I don't have
handy and can't verify).
The issue is that the original 6502 didn't implement all of the status
flags for ADC/SBC in decimal mode: the N flag in particular is not
updated (also the V flag, if I remember right). The Z and C flags are
handled correctly.
On the 65C02 they fixed the N flag for ADC/SBC in decimal mode (and
presumably also the V flag), but in order to do so they had to increase
the execution time of the instruction by one cycle (which only applies
if in decimal mode - in normal/binary mode, execution time is identical
to the 6502).
The 65802/65816 fixed it in a better way, restoring the original
execution time of the 6502.
This little detail (N flag not implemented for ADC/SBC in decimal mode
on the 6502) was used by a routine published in "Programming the 65816
et. al." by David Eyes & Ron Lichty which detected whether your code was
running on a 6502 or something later. A separate test was used to
distinguish between the 65C02 and 65802/65816, relying on the fact that
all undefined opcodes on the 65C02 are guaranteed to be NOPs, unlike the
behaviour of the 6502, so once the decimal ADC had revealed you weren't
on a 6502, it was safe to try using single-byte 65802/65816 instructions
(they would act like NOPs if you have a 65C02).
I recently reposted my extended version of this routine, which also
detects the Rockwell R65C02 and copes with the 65802/65816 running in
native mode with 8-bit registers rather than requiring it to be in
emulation mode (but my routine would also fail if the 65802/65816 was in
native mode with 16-bit registers).
--
David Empson
dem...@actrix.gen.nz
>> The only question that remains: is it true that the 65C02 takes one extra
>> cycle for ADC/SBC when the D flag is set? I'll have a look at the
>> official data sheet.
>
> Yes, according to information in the data sheets (which I don't have
> handy and can't verify).
>
> The issue is that the original 6502 didn't implement all of the status
> flags for ADC/SBC in decimal mode: the N flag in particular is not
> updated (also the V flag, if I remember right). The Z and C flags are
> handled correctly.
According to the algorithm used in VICE, the N and V flags are updated,
but according to a bunch of crazy formulae which are quite useless
(unless you're a hardcore demo coder maybe). So you can't rely on them
staying unchanged either. Nothing's quite as bad as the ARR illegal of
the NMOS 6502 though (which combines fragments of ADC with fragments of
ASL and some internal bus contentions, as it seems).
--
Linards Ticmanis
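The difference in the N flag can be illustrated with a small C sketch. The NMOS model here is an assumption based on commonly circulated descriptions of NMOS decimal mode (N taken from the intermediate sum after the low digit is corrected but before the high digit is); it is not the VICE code and not something posted in this thread. The 65C02 model follows the ADC macro posted earlier, where N comes from the final accumulator value. Function names are hypothetical.

```c
#include <stdint.h>

/* Assumed NMOS 6502 behaviour: N reflects bit 7 of the sum after the
 * low-digit correction but before the high-digit correction. */
int decimal_adc_n_nmos(uint8_t a, uint8_t b, int c)
{
    unsigned lo = (a & 0x0Fu) + (b & 0x0Fu) + (unsigned)c;
    if (lo >= 0x0A)
        lo = ((lo + 6) & 0x0F) + 0x10;      /* fix low digit, carry into high */
    unsigned mid = (a & 0xF0u) + (b & 0xF0u) + lo;
    return (mid & 0x80) != 0;
}

/* 65C02 behaviour as in the macro posted earlier: N reflects bit 7 of
 * the fully corrected result. */
int decimal_adc_n_cmos(uint8_t a, uint8_t b, int c)
{
    unsigned w = (a & 0x0Fu) + (b & 0x0Fu) + (unsigned)c;
    if (w >= 10)
        w = 0x10 | ((w + 6) & 0x0F);
    w += (a & 0xF0u) + (b & 0xF0u);
    if (w >= 160)
        w += 0x60;                          /* fix high digit */
    return ((uint8_t)w & 0x80) != 0;
}
```

For $99 + $01 with carry clear, the final result is $00 on both, but under these assumptions the NMOS model reports N set while the 65C02 model reports N clear. A CPU detection routine built on a decimal-mode ADC would rely on exactly this kind of difference.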
> Yes, according to information in the data sheets (which I don't have
> handy and can't verify).
BTW the 65C02 data sheet is available from WDC as I just found out:
> http://www.westerndesigncenter.com/wdc/datasheets/w65c02s.pdf
Though AFAIK the old 65C02 in the Apple lacked the WAI, STP, BBSx and
BBRx instructions, right? Since they're not emulated by the old Applewin
code.
--
Linards Ticmanis
I wouldn't count on them being there, anyway.
-uso.
> David Empson wrote:
>
> > Yes, according to information in the data sheets (which I don't have
> > handy and can't verify).
>
> BTW the 65C02 data sheet is available from WDC as I just found out:
I thought as much. I couldn't be bothered looking, and all my Apple II
books are packed away in the basement now (and unlikely to be accessed
again for a while).
> > http://www.westerndesigncenter.com/wdc/datasheets/w65c02s.pdf
>
> Though AFAIK the old 65C02 in the Apple lacked the WAI, STP, BBSx and
> BBRx instructions, right? Since they're not emulated by the old Applewin
> code.
Not sure about WAI or STP (I seem to recall them being new opcodes on
the 65802/65816). The main ones I recall being added were TSB, TRB, STZ
and BRA plus a few more addressing modes on existing instructions, such
as LDA (etc.) indirect and JSR pre-indexed absolute indirect.
The BBSx, BBRx, SMBx and RMBx instructions are not standard 65C02
opcodes. They were added by Rockwell in their R65C02 and apparently were
also included in some of their earlier R6500 series as well.
A standard 65C02 based on the original specification from WDC does not
have any of these instructions: all of the x3, x7, xB and xF opcodes are
NOPs. These opcodes are used on the 65802/65816 for new addressing modes
and other miscellaneous instructions.
Apple only used the WDC 65C02 or second sources with identical
instruction sets: GTE was most common in my observations; apparently
they also used a Rockwell 65C02 without the extra R65C02 instructions. I
have seen an R65C02 in some IIe clones.
--
David Empson
dem...@actrix.gen.nz
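The opcode columns mentioned above are easy to test for in a decoder. The C sketch below checks for the x3/x7/xB/xF columns and, as an assumption based on the usual Rockwell opcode layout (RMB/SMB in column x7, BBR/BBS in column xF, bit number in bits 4 to 6), classifies the Rockwell bit instructions. The type and function names are made up for the example.

```c
#include <stdint.h>

/* True for opcodes in columns x3, x7, xB and xF: both low bits set. */
int in_reserved_columns(uint8_t opcode)
{
    return (opcode & 0x03) == 0x03;
}

typedef enum { RK_NONE, RK_RMB, RK_SMB, RK_BBR, RK_BBS } RockwellOp;

/* Classify a Rockwell bit instruction, assuming the usual R65C02 layout.
 * On a match, *bit receives the bit number 0..7; otherwise it is untouched. */
RockwellOp rockwell_bit_op(uint8_t opcode, int *bit)
{
    uint8_t low = opcode & 0x0F;
    if (low != 0x07 && low != 0x0F)
        return RK_NONE;
    *bit = (opcode >> 4) & 0x07;
    if (low == 0x07)
        return (opcode & 0x80) ? RK_SMB : RK_RMB;
    return (opcode & 0x80) ? RK_BBS : RK_BBR;
}
```

An emulator for the plain 65C02 could treat everything matched by in_reserved_columns as a NOP and only consult rockwell_bit_op when an R65C02 is being emulated.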
Here is the one for the Rockwell R65C02 for comparison:
http://www.6502.org/documents/datasheets/rockwell/rockwell_r65c00_microprocessors.pdf
bye
Marcus
> Just stay away from those which are marked as unreliable (e.g. see
> http://oxyron.de/html/opcodes02.html)
Which is all of them; the 65C02 and 65816 treat all unused opcodes as
NOP. If you're writing code for the Apple II it's not even worth
considering. Even older Apple II's often have 65C02's fitted nowadays,
either as a replacement or via an accelerator card.
Matt
No? Only two are completely unstable and 6 are "unstable in certain
aspects" but you can use them if you avoid the unstable conditions.
> the 65C02 and 65816 treat all unused opcodes as
> NOP. If you're writing code for the Apple II it's not even worth
> considering.
Some illegals are quite useful, like SHX/SHY which in some cases can be
used as STX absolute,Y / STY absolute,X if you use the right. Also you
can do a CPU test and use different code on different CPUs. The demo
"Oneder/Oxyron" does this. It works on 6502 with illegal opcodes, and on
65816 it uses a slower but illegal-free code.
> Some illegals are quite useful, like SHX/SHY which in some cases can be
> used as STX absolute,Y / STY absolute,X if you use the right. Also you
> can do a CPU test and use different code on different CPUs. The demo
> "Oneder/Oxyron" does this. It works on 6502 with illegal opcodes, and on
> 65816 it uses a slower but illegal-free code.
This is reasonable on the C64 where all machines were fitted with
6502's, and in the case of a 65816 it's accelerated, so a slower code
section is acceptable.
The Apple II series is essentially 3 platforms, and 3 CPU generations,
all backwards compatible.
I'll concede there is a potential fringe case where a piece of code was
so tight that downconverting from 65C02 to 6502 broke it, and it was
fixable using an undocumented instruction. Damned if I can think of an
example though; there's almost always another way to find a cycle or
two.
Matt
And in any case, depending on the peculiarities of a particular chip
implementation is just asking to be locked out of future improvements.
> John Selck wrote:
>
> > Some illegals are quite useful, like SHX/SHY which in some cases can be
> > used as STX absolute,Y / STY absolute,X if you use the right. Also you
> > can do a CPU test and use different code on different CPUs. The demo
> > "Oneder/Oxyron" does this. It works on 6502 with illegal opcodes, and on
> > 65816 it uses a slower but illegal-free code.
>
> This is reasonable on the C64 where all machines were fitted with
> 6502's, and in the case of a 65816 it's accelerated, so a slower code
> section is acceptable.
>
> The Apple II series is essentially 3 platforms, and 3 CPU generations,
> all backwards compatible.
And of course you've neglected to mention the other major 6502 computer
series of the era, Atari. They never used anything but a standard 6502
core (but with an extra pin added to make DMA easier), so the 65C02 and
65816 were never an issue. In fact, Atari's 6502C with the extra signal
meant that you _couldn't_ just drop a 65C02 into the same socket.
This was even less of an issue on the 2600, which always used a 6507,
which is a standard 6502 die in a reduced pin count package. Quite a
few games used illegal opcodes to reduce cycle count so that they could
do things that were otherwise impossible with its clock-locked 1-D
graphics chip.
Oddly enough, the Apple II is where I heard most about people using
illegal opcodes and side effects like JSR ($xxFF) back in the day, as a
way to obfuscate copy-protection code. Like I cared, because I had a
TRS-80 (then a CoCo, and a 128K Mac), and most of the Z-80 undocumented
opcodes were obvious, even when they weren't useful. And none of the
6809 undocumented opcodes were useful at all, except there was one that
would do an SWI (equivalent to 6502 BRK) through the FIRQ vector and
another that would jump through the RESET vector.
That was DMA for video RAM access, right? If so, they just didn't get
how trivial it is to share RAM with a 6502--it only accesses RAM during
half the clock signal!
> This was even less of an issue on the 2600, which always used a 6507,
> which is a standard 6502 die in a reduced pin count package. Quite a
> few games used illegal opcodes to reduce cycle count so that they could
> do things that were otherwise impossible with its clock-locked 1-D
> graphics chip.
It's kind of a self-fulfilling prophecy. If enough people write
applications that depend on undocumented behavior, then that behavior
becomes a "feature" that cannot be changed in future versions--usually
preventing useful improvements.
(Not that I think of the 65C02 as much of an improvement. All the
changes made in the 65C02 were what is known technically in the
computer architecture community as "mouse nuts". ;-)
> Oddly enough, the Apple II is where I heard most about people using
> illegal opcodes and side effects like JSR ($xxFF) back in the day, as a
> way to obfuscate copy-protection code. Like I cared, because I had a
> TRS-80 (then a CoCo, and a 128K Mac), and most of the Z-80 undocumented
> opcodes were obvious, even when they weren't useful. And none of the
> 6809 undocumented opcodes were useful at all, except there was one that
> would do an SWI (equivalent to 6502 BRK) through the FIRQ vector and
> another that would jump through the RESET vector.
There were a few cases of using undocumented behavior to obfuscate, but
it was never much more than a curiosity on the Apple II platform.
IMHO Atari didn't even seriously consider replacing the 6502. The 8-bit
platform was virtually unchanged for ten years. The machines were cash
cows and the Tramiel-era Atari concentrated on making them cheaper and
cheaper (to compete with Commodore and reach the Eastern Europe market)
- so there was little room to get innovative.
The case with the 2600 is even clearer IMHO: It used the cheaper 6507
and the video game market crash in 1983 did the rest.
I really don't think that Atari even had the telephone number of WDC...
> (Not that I think of the 65C02 as much of an improvement. All the
> changes made in the 65C02 were what is known technically in the
> computer architecture community as "mouse nuts". ;-)
Ha ha - now that's a technical term I can learn quickly!
bye
Marcus
> That was DMA for video RAM access, right? If so, they just didn't get
> how trivial it is to share RAM with a 6502--it only accesses RAM during
> half the clock signal!
As I understand it, they actually halted the CPU for DMA so as to get
more memory bandwidth than "half the clock signal", and they needed to
know when the CPU had actually stopped so that they could start DMA.
This was also done on the Atari 7800.
I've heard that in the original 400/800 they actually used a regular
6502, but the extra signal allowed them to save all the external
circuitry that figured this out.
Again, I grew up on the TRS-80, so all of this is stuff that I've
learned about the old Atari hardware in the past five years or so.
So it seems the message is, "If you have good reason to believe that
you are programming for an 'end of the line' platform, then do anything
that works--it won't matter anyway".
Of course, now *all* 8-bit platforms are unchanging, but some of them,
in particular the Apple II, went through several implementations of
the processor.
So, even though I don't feel any need to provide for *future* changes,
I still am motivated to cover all the *past* changes, by coding for the
widest range of systems that makes sense--particularly since there is a
negligible "cost" of doing so.
> The case with the 2600 is even clearer IMHO: It used the cheaper 6507
> and the video game market crash in 1983 did the rest.
In the case of a video game machine, I wouldn't expect any compatible
upgrade path, and the entire hardware arrangement was very idiosyncratic
anyway--so there would be no reason not to do "anything that worked".
Wow--they needed more than a megabyte/second of video data? I guess
if you have about the same resolution as an Apple II, but twice the
color depth, then you do... The //c and following Apple II's got
around this by providing another memory bank in parallel with the
main memory bank.
> I've heard that in the original 400/800 they actually used a regular
> 6502, but the extra signal allowed them to save all the external
> circuitry that figured this out.
Of course, all 6502's had the RDY line that allowed a trivial amount
of glue logic to halt the CPU between instructions. (Of course, you
couldn't get away with this during a Disk ][ I/O on the Apple II, since
it would mess up the instruction timing.)
In my opinion this approach is too fatalistic - after all Commodore and
Atari didn't really offer CPU-upgrades. The C128 is more or less
completely compatible (illegal opcodes supported) and Atari didn't
offer anything faster or based on a more modern chip.
Therefore their programmers (third party or hobbyists) never really had
to worry about doing something "illegal" when using these opcodes.
Apple on the other hand introduced the IIc quite early in comparison.
Its success and the subsequent enhancement kit for the IIe /
modernized IIe effectively killed the usage of these opcodes in the
Apple world. Side note: I would love to hear from Apple why they chose
the 65C02 - because of the much lower power consumption of the CMOS
design or the additional features...
And then of course came the 68K-platforms and took over the market...
> In the case of a video game machine, I wouldn't expect any compatible
> upgrade path, and the entire hardware arrangement was very idiosyncratic
> anyway--so there would be no reason not to do "anything that worked".
Good points.
What would be interesting to know is when did the 6502 programmers
begin using illegal/undocumented opcodes. Right from the start? I
vaguely remember hearing about them first in the mid-eighties.
bye
Marcus
The Atari and C64 use an official maximum of 8K RAM for the frame
buffer. The C64 has an additional kilobyte for color information which
is accessed in parallel, AFAIK. So they are comparable to an Apple
without double-hires mode.
The Atari has some possibilities to double the memory by page flipping
but this halves the screen refresh rate. It's also possible to change
the video mode in the middle of a scan line - but this results only in
a different "color interpretation" of the memory cells, not actually
"more" memory cells.
The C64 supports some advanced trickery to do some more colors than
initially thought and officially advertised but this is again more for
static displays.
I can't really speak technically for the C64 but in the case of the
Atari the situation is more complicated than in a "standard" Apple: The
video processor (ANTIC) really takes over the system for the time it
halts the 6502.
You see, ANTIC really is an, albeit very very simple, processor: It has
its own instructions, programs (called display lists) and memory. In
fact it can access the complete RAM, ROM and even the custom chip
register area of the system. This means that you can for example
display a "live view" of the zero page and stack with memory cells
changing all the time...
So I guess Atari chose to keep it simple in making this "multiprocessor
system" work by stopping the 6502 for the moment when ANTIC needs to
maintain the display.
The effect was quite drastic on performance so they clocked the 6502
much higher than in the Apple or C64 system (1.78 MHz). In the end all
three systems were comparable in speed when using the same video
resolutions or text mode.
When ANTIC is switched off the computer is indeed much faster - a
feature of the popular fractal generators of the time.
> Of course, all 6502's had the RDY line that allowed a trivial amount
> of glue logic to halt the CPU between instructions. (Of course, you
> couldn't get away with this during a Disk ][ I/O on the Apple II, since
> it would mess up the instruction timing.)
The Atari and C64 could - with careful programming - even use clean
display interrupts (to change colors or the scroll registers on the
fly) while accessing the disk. But that's mainly due to the more
intelligent disk drive designs - and the slow disk accesses...
bye
Marcus
MJM> Wow--they needed more than a megabyte/second of video data?
No, but they need more than one byte per clock cycle, in some
situations. The C64 does the same thing, during badlines and sprite
access. So yes, if you fill the screen with badlines and cover it with
sprites, you would theoretically need about 2 MB/s, but in reality you
only need it in short bursts.
That's what "end of the line" platform means. No upgrades.
The fact that both companies used "custom" 6502 processors for their
machines is no doubt part of the reason that they never moved forward.
> Apple on the other hand introduced the IIc quite early in comparison.
> Its success and the subsequent enhancement kit for the IIe /
> modernized IIe effectively killed the usage of these opcodes in the
> Apple world. Side note: I would love to hear from Apple why they chose
> the 65C02 - because of the much lower power consumption of the CMOS
> design or the additional features...
I think it was simply a matter of a modern design with multiple sources.
The ASICs in the //c and //e changed the setup and hold times so that
a 2MHz processor was needed to run reliably at 1MHz, and I'm sure there
was no quantity source for 2MHz NMOS processors by that time.
> And then of course came the 68K-platforms and took over the market...
I'm not sure I'd agree with that assessment. But it is certainly
true that 68K-heads took over the Apple management team. ;-)
>>In the case of a video game machine, I wouldn't expect any compatible
>>upgrade path, and the entire hardware arrangement was very idiosyncratic
>>anyway--so there would be no reason not to do "anything that worked".
>
>
> Good points.
>
> What would be interesting to know is when did the 6502 programmers
> begin using illegal/undocumented opcodes. Right from the start? I
> vaguely remember hearing about them first in the mid-eighties.
Right from the start. The first people to play with 6502s were the
hard-core experimenters, and for them, you take it all apart and put
it back together before you even turn it on the first time. ;-)
It's only after you begin to think that code you write may be useful
on future extensions of a platform that portability and definitions
become important.
Non-portable code is written for two quite different reasons:
1) Because the coder doesn't even think about portability or
doesn't understand it, non-portability happens.
2) Because the coder understands perfectly, and chooses to
write non-portable code on purpose. (One-time use, static
platform, compelling need,...?)
I remember hearing about undocumented opcodes in the mid 80's. But I don't
think the term "illegal opcodes" was used in the Commodore world back
then. Perhaps what was called "undocumented" in the Commodore world was
called "illegal" in the Apple world?
Then, when the two worlds crossed paths via Usenet the difference between
illegal and undocumented became blurred. I suppose we will get over it when
all the 8-bit processors die and go to silicon heaven. :-)
--
Best regards,
Sam Gillett
Change is inevitable,
except from vending machines!
Never, then.
;o)
bye
Marcus
The difference between the two terms is dependent on point of view.
The chip/system designer tends to regard undocumented or undefined
behavior as the territory reserved for future expansion. Of course,
if the space is not protected, either by trapping or by behaving
as a NOP, then it is "reserved" only by convention, not by silicon.
The on-the-metal programmer sometimes feels that "undefined" is just
an invitation to experiment, and proceeds to at least document, if
not define, the actions that the chip designer did not intend and
has no intention to preserve.
If a chip/system is successful, there comes a time when follow-on
designs are done. If the officially undefined behaviors have become
sufficiently widely used, then they represent a barrier to the orderly
expansion of the design--such as new instructions requiring previously
undefined opcodes that may have been widely, but unwisely, used.
The new designer quickly comes to resent any such barriers, and would
therefore like to think of them as prohibited, or "illegal", but what
is really missing is discipline--on the part of the original designer,
who failed to "protect" the undefined space (to save a few transistors),
and on the part of coders who couldn't resist the temptation to use
whatever they found--even without any guarantee of future support
(to save a cycle or two).
Ironically, the combination of these two lapses in discipline can result
in there being no acceptable space for the expansion of a design, and
therefore they mortgage the future for a pittance today.
(If it isn't obvious, I've played both the designer's part and, earlier,
the coder's part. ;-)
Not really. At the time every gate on the chip was expensive and saving
them was a valid concern. When I studied VLSI design we called these
undefined cases "Don't care" conditions; invalid input meant we didn't
care about the output. You can see this on some cheap LED digital
clocks where pressing weird button combinations can cause odd displays;
invalid input. That can actually cut down the number of gates by 20%
or more and so reduce the cost, accordingly.
Handling these "don't care" conditions actually requires a level of
optimisation which doesn't exist in a naive design. So it's not a case
of the designers failing to protect the undefined space, but of them
optimising the circuitry for efficiency.
It is a valid design decision.
With the 6502 essentially not having a microcode level, this
optimisation is exposed to the programmer. This also shows up in the
regularity of the opcodes, and allowed hackers to make educated guesses
as to what the undocumented codes could do.
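A small sketch of the kind of educated guess meant here. It assumes the commonly used description of 6502 opcodes as three bit fields, aaa/bbb/cc, which is not spelled out in this thread; the mnemonic tables are the documented instructions for cc=01 and cc=10, and the function name `guess` is made up for illustration.

```python
# Documented mnemonics selected by the top three opcode bits (aaa) in the
# cc=01 and cc=10 groups, using the common aaabbbcc description.
GROUP_01 = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]
GROUP_10 = ["ASL", "ROL", "LSR", "ROR", "STX", "LDX", "DEC", "INC"]
ZERO_PAGE = 0b001  # bbb value for zero-page addressing in both groups

def guess(opcode):
    """Guess which two documented operations a cc=11 opcode combines."""
    aaa, cc = opcode >> 5, opcode & 0b11
    if cc != 0b11:
        raise ValueError("not in the cc=11 column")
    return GROUP_10[aaa], GROUP_01[aaa]

# Walk the eight undocumented zero-page opcodes $07, $27, ... $E7.
for aaa in range(8):
    opcode = (aaa << 5) | (ZERO_PAGE << 2) | 0b11
    first, second = guess(opcode)
    print(f"${opcode:02X}: behaves like {first} + {second} (guess)")
```

For the zero-page column these guesses line up with the usual undocumented-opcode lists (for example $A7 as a combined LDA and LDX), but it is only a heuristic, and other addressing modes are less tidy.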
--
Stephen Harris
use...@spuddy.org
The truth is the truth, and opinion just opinion. But what is what?
My employer pays to ignore my opinions; you get to do it for free.
True, but the decoding in the 6502 is handled by a kind of PLA, so
it would likely not be very expensive (in real estate) to trap or
NOP the invalid combinations.
But your point is well taken, it's a tradeoff of the present vs.
the future of your chip.
> Handling these "don't care" conditions actually requires a level of
> optimisation which doesn't exist in a naive design. So it's not a case
> of the designers failing to protect the undefined space, but of them
> optimising the circuitry for efficiency.
>
> It is a valid design decision.
Yes, but not without implications for the future--see above.
The 6502 was already a much less expensive processor than its
competitors, because of a mask retouching technique MOS Technology
developed that saved mask iterations.
Admittedly, though, the microprocessor design culture at that time
was not oriented toward "protecting" undefined space, as they all
have been in the years since.
When you have just done it, you aren't weighting how you will proceed
over the next 15 years very highly. ;-)
> With the 6502 essentially not having a microcode level, this
> optimisation is exposed to the programmer. This also shows up in the
> regularity of the opcodes, and allowed hackers to make educated guesses
> as to what the undocumented codes could do.
Right--with all the negative implications for the future that I
discussed.
I was discussing the nature of the "undefined" problem in general,
and in light of current practice. If I had been designing the 6502,
I likely would have made the same choice. (But as a coder, I would
have regarded undefined ops as a curiosity, not as an opportunity
to save a few cycles at the expense of future utility.)
You are too today-centric.
Back in the 1970s even that "likely not very expensive" decision raised
the costs significantly and introduced complexity to the chip which was
unnecessary from a technical and marketing point of view.
|> If I had been designing the 6502,
|> I likely would have made the same choice.
Probably everyone would have. Leaving out the illegal opcode traps or
mapping them to NOP did no harm, implementing them would just raise
costs.
|> (But as a coder, I would have regarded undefined ops as a curiosity,
|> not as an opportunity to save a few cycles at the expense of future
|> utility.)
If a platform stays identical over a long enough period of time,
then why not start squeezing the last bit out of it by using undocumented
behavior? After all, the use of such undocumented behavior in the end
led to enhanced capabilities that even the chip designers didn't envision
in the first place.
I'm especially thinking about all the fancy stuff coders did with the
C64's video chip, which like the original 6502 (and the 6510 based on it)
is a hardwired design.
And back then no one really thought about future utility. Maybe apart from
the Apple II those machines were pretty much integrated & rather unexpandable
boxes -- even more, in the early home computer and video game market there was
no sense of a "family concept" where software from the old machine would just
run on its next generation successor, because that most likely was an entirely
new box.
Think of Atari 400/800 vs. 600/800XL, Commodore VIC20 vs. C64, Sinclair ZX81
vs. Spectrum just to name a few.
Rainer
> Stephen Harris wrote:
> > Not really. At the time every gate on the chip was expensive and saving
> > them was a valid concern. When I studied VLSI design we called these
> > undefined cases "Don't care" conditions; invalid input meant we didn't
> > care about the output. You can see this on some cheap LED digital
> > clocks where pressing weird button combinations can cause odd displays;
> > invalid input. That can actually cut down the number of gates by 20%
> > or more and so reduce the cost, accordingly.
> True, but the decoding in the 6502 is handled by a kind of PLA, so
> it would likely not be very expensive (in real estate) to trap or
> NOP the invalid combinations.
Remember, the 6502 was designed in 1975. Every single gate was expensive.
Optimisation of the "don't care" conditions was an important skill that
had a very real impact on the production cost. Cost was important on this
chip, being a lot cheaper than the competition.
Note that when WDC designed the CMOS version of the chip (65C02) they _did_
NOP the undefined instructions but that was a later design using newer
technologies (CMOS vs NMOS).
> I likely would have made the same choice. (But as a coder, I would
> have regarded undefined ops as a curiosity, not as an opportunity
> to save a few cycles at the expense of future utility.)
The home computer I'm most used to that used the R6502 (the Rockwell
version) at 2 MHz was the Acorn BBC Micro. About the only programs that
used the undocumented codes were games that used them for disassembly
protection; most disassemblers at the time didn't recognise them so
would display a single ??? and then try to decode the next byte as an
opcode. Since the R6502 really treated it as a 2 or 3 byte NOP this
caused bad disassembly. Later disassemblers were updated to handle
this :-)
It caused some minor issues, though, when the BBC Master came out, which
used the 65C02. Oops! Magazines at the time would publish "hacks" to
fix the few popular rogue programs :-)
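The disassembly-protection trick described above can be sketched with a toy disassembler. This is a minimal illustration, not any real tool: the tables cover only the opcodes in the example byte string, and the names `DOCUMENTED`, `UNDOCUMENTED` and `disassemble` are made up. It assumes $04 acts as a two-byte NOP on the NMOS part, as listed in the opcode table earlier in the thread.

```python
# Instruction names and lengths (bytes) for the few opcodes used here.
DOCUMENTED = {0xA9: ("LDA #", 2), 0x4C: ("JMP", 3), 0x60: ("RTS", 1)}
# $04 behaves as a two-byte NOP on the NMOS 6502 (undocumented).
UNDOCUMENTED = {0x04: ("NOP zp", 2)}

def disassemble(code, table):
    """Return (offset, mnemonic) pairs; unknown opcodes count as 1 byte."""
    out, pc = [], 0
    while pc < len(code):
        name, size = table.get(code[pc], ("???", 1))
        out.append((pc, name))
        pc += size
    return out

# $04 hides the $4C byte; the CPU then runs LDA #$01 / RTS.
code = bytes([0x04, 0x4C, 0xA9, 0x01, 0x60])

naive = disassemble(code, DOCUMENTED)
aware = disassemble(code, {**DOCUMENTED, **UNDOCUMENTED})
print(naive)  # ??? at 0, then a bogus JMP at 1, then RTS
print(aware)  # NOP zp at 0, LDA # at 2, RTS at 4
```

The naive pass treats the operand of the hidden NOP as an opcode and stays out of step until the streams happen to line up again.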
The Ataris are not a good example - they are mostly compatible
(hardware and software). Think of their 8-bit machines simply as
"family members" - not different hardware generations like the other
machines.
bye
Marcus
> True, but the decoding in the 6502 is handled by a kind of PLA, so
> it would likely not be very expensive (in real estate) to trap or
> NOP the invalid combinations.
Nope, it's handled by good old random logic. A PLA is designed to be a
programmable generic replacement for random logic, and is way too
inefficient for high-volume VLSI.
My mistake--from this chip photo, I incorrectly identified either
the registers or the ALU as a PLA:
http://micro.magnet.fsu.edu/chipshots/mos/6502large.html
Still, given the relatively ordered distribution of 6502 undefined
ops, I think their detection would have been relatively simple.
Of course, this is moot for at least two reasons: it wasn't done,
and most designers of the time wouldn't have done it anyway.
I suspect that we don't disagree very much about this. But note that
the 6502 was not such a platform. Being relatively successful, it went
on through multiple generations of implementation, and unintended
behaviors are not, in general, maintained in subsequent implementations.
Although early experimenters could not have predicted the course of the
architecture's evolution, it was still common culture, since the 1950s,
not to "exploit" accidental, and therefore unsupported, "instructions"
that might exist in particular computer implementations. The problems
in maintenance and upgrading that this caused were well known.
Commercial computing recognized the significant advantages of creating
an object code-compatible line of machines, with scalable performance,
by the end of the 1950s, and major computer lines were designed with
this in mind by the early 1960s.
It has often been noted that the microprocessor community apparently
needed to rediscover all the lessons already learned by the mainframe
computer culture, but two decades later. A general presumption that
"things will always be as they are now" is one misconception that it
took people a while to correct.
> I'm especially thinking about all the fancy stuff coders did with the
> C64's video chip, which like the original 6502 (and the 6510 based on it)
> is a hardwired design.
As it turned out, the presumption that these were immutable chips was
correct.
As I stated earlier, if you have good reason to believe that you are
programming for an end-of-the-line system, then you are free to do
anything that works. But I consider this a pessimistic assumption
unless the platform has really fossilized--as the platforms we are
celebrating here have.
> And back then no one really thought about future utility. Maybe apart from
> the Apple II those machines were pretty much integrated & rather unexpandable
> boxes -- even more, in the early home computer and video game market there was
> no sense of a "family concept" where software from the old machine would just
> run on its next generation successor, because that most likely was an entirely
> new box.
>
> Think of Atari 400/800 vs. 600/800XL, Commodore VIC20 vs. C64, Sinclair ZX81
> vs. Spectrum just to name a few.
No one is praiseworthy for not thinking about future utility!
It seems hard to imagine now that--almost 20 years after commercial
computers had all moved to scalable, compatible lines to leverage
code investments--the idea that this might be at least as valuable
in the microcomputer marketplace did not influence design decisions.
Until the Mac, all of Apple's computers were designed with application
compatibility with previous machines in mind. Of course, the same has
been true since in the Mac line, and in the entire PC line (after a few
early not-quite-clone dead ends).
I suppose I must fault the undisciplined early coders for participating
in the newest, most radical advance in computing without a real vision
of what success would mean. The lesson was already clear for anyone
who was paying attention. (Note that I have no problem whatever with
someone *using* undocumented features themselves--the problem is when
code that uses such features is released for wider use.)
> My mistake--from this chip photo, I incorrectly identified either
> the registers or the ALU as a PLA:
>
> http://micro.magnet.fsu.edu/chipshots/mos/6502large.html
See that mess of spaghetti in the middle? I think that's the
instruction decoder.
It would actually be fun to see the real logic diagram of the
6502. ;-)
All of the "internals" documentation I've found is very abstract
with no detail where it would be most interesting.
> The fact that both companies used "custom" 6502 processors for their
> machines is no doubt part of the reason that they never moved forward.
Huh? They did move forward. 6510 -> 8500 -> 8502. And also, I don't
think it's any kind of problem for Commodore to use customized CPUs
since they owned MOS, the owners and producers of 6502 tech back then :)
> Non-portable code is written for two quite different reasons:
>
> 1) Because the coder doesn't even think about portability or
> doesn't understand it, non-portability happens.
> 2) Because the coder understands perfectly, and chooses to
> write non-portable code on purpose. (One-time use, static
> platform, compelling need,...?)
3) Slow-as-hell 8 Bit platforms don't have a proper abstraction layer
for any of their hardware, no matter if sound, graphics, timers, ports
or CPU.
And what exactly were the differences between these chips?
> And also, I don't think it's any kind of problem for Commodore to use
> customized CPUs since they owned MOS, the owners and producers
> of 6502 tech back then :)
And, yes, MOS made all the funky custom chips - but the 6502-core chips
were AFAIK the most advanced CPUs designed/modified/produced by them.
bye
Marcus
And these new processors preserved the undefined opcode behavior?
If so, they are almost certainly *not* new processor designs.
If not, then they are a good illustration of why it's a bad idea
to depend on undefined behavior.
>> Non-portable code is written for two quite different reasons:
>>
>> 1) Because the coder doesn't even think about portability or
>> doesn't understand it, non-portability happens.
>> 2) Because the coder understands perfectly, and chooses to
>> write non-portable code on purpose. (One-time use, static
>> platform, compelling need,...?)
>
>
> 3) Slow-as-hell 8 Bit platforms don't have a proper abstraction layer
> for any of their hardware, no matter if sound, graphics, timers, ports
> or CPU.
Nonsense. The "abstraction" we are discussing is the published
documentation for the processor/system. What it documents is the
abstraction that future implementations will preserve. What it
doesn't document will generally not be preserved.
The only exception to this is when rampant exploitation of undocumented
features makes their preservation a marketing necessity. In the latter
case, many extensions to the system functionality will be hampered by
the need to provide the undocumented functionality, making evolution
less attractive.
Most early micro systems suffered from this problem. Many didn't live
long enough or attract enough user base to face the issue. But those
who did face it had to decide what undocumented behaviors of the early
platform to leave behind, along with any software that exploited them.
I'd love to see it.
Rich
Same opcodes but faster clockspeed. Also, CBM switched from NMOS to HMOS.
>> 3) Slow-as-hell 8 Bit platforms don't have a proper abstraction layer
>> for any of their hardware, no matter if sound, graphics, timers, ports
>> or CPU.
>
> Nonsense. The "abstraction" we are discussing is the published
> documentation for the processor/system. What it documents is the
> abstraction that future implementations will preserve. What it
> doesn't document will generally not be preserved.
What about the bugs in the decimal mode? They were fixed in later
processor designs, which also renders the CPUs incompatible for some
programs.
The 6502 was designed to have as few transistors as possible. The whole
CPU only has a few thousand of them. Handling the illegals would have
increased this small amount a lot.
> Admittedly, though, the microprocessor design culture at that time
> was not oriented toward "protecting" undefined space, as they all
> have been in the years since.
It's not about culture, it's about being able to do more with just a few
transistors. What CPU would you buy? The one which can do less and has NOPs
instead of illegals, or the one which can do more?
Removing illegals is ok if you only waste 1000 transistors of 1000000,
but if it's 1000 transistors of 6000, then it's a whole different matter.
> It would actually be fun to see the real logic diagram of the
> 6502. ;-)
>
> All of the "internals" documentation I've found is very abstract
> with no detail where it would be most interesting.
Some Hungarian maniacs have reverse engineered the 6502:
http://impulzus.sch.bme.hu/6502/6502/
Sadly the site is in Hungarian, but at least the logic diagrams
speak for themselves.
So this was an algorithmic rework of the original design for a new
process, not a new logical design.
>>> 3) Slow-as-hell 8 Bit platforms don't have a proper abstraction layer
>>> for any of their hardware, no matter if sound, graphics, timers,
>>> ports or CPU.
>>
>>
>> Nonsense. The "abstraction" we are discussing is the published
>> documentation for the processor/system. What it documents is the
>> abstraction that future implementations will preserve. What it
>> doesn't document will generally not be preserved.
>
>
> What about the bugs in the decimal mode? They were fixed in later
> processor designs, which also renders the CPUs incompatible for some
> programs.
Processor bugs (or "errata") are cases where the implementation does not
behave as the documentation says that it should. The decimal bugs in
the original 6502 were not part of its documentation, and therefore
should not be depended upon. (Of course, in the presence of the bug,
code cannot depend on the documented *correct* behavior either; it must
be written to "work around" the bug without depending on undocumented
behavior.)
Errata are always a special case. The designers always hope that code
that runs correctly on the original, buggy processor will still run
correctly on a fixed processor. In other words, they hope that no
one has written code that *depends* on buggy behavior.
To summarize, code written to the specifications of a processor will
work correctly on all implementations, *except* where an implementation
is faulty. Code written to work around a fault can still be written
to the specifications, but avoiding the faulty case(s).
On the other hand, code written to *require* faulty behavior to run does
not conform to the specification, and may not run correctly on a later
implementation of the processor.
Compatibility is always with the specification--not with undocumented
and possibly faulty behavior exhibited by a particular implementation.
Such a specification describes the abstraction known as the processor
"instruction set architecture" (ISA).
What do you know! The decoding *is* done with a PLA after all!
Actually, neither of us has the data on how much would have been
required. I argued that if the decoding is regular--as in PLA--
then the cost might have been relatively modest.
From looking at the Hungarian reverse engineering of the 6502, it
appears that I was right about the PLA decoder. A PLA is used to
derive control signals from the op register.
>> Admittedly, though, the microprocessor design culture at that time
>> was not oriented toward "protecting" undefined space, as they all
>> have been in the years since.
>
>
> It's not about culture, it's about being able to do more with just a few
> transistors. What CPU would you buy? The one which can do less and has NOPs
> instead of illegals, or the one which can do more?
The difference would be in die size, and therefore cost.
And just "getting it done" with absolute minimal die size *is* a design
culture. (One that is no longer current in commercial microprocessors.)
Moore's "Law" has provided a great deal of freedom in design cultures.
;-)
> Removing illegals is ok if you only waste 1000 transistors of 1000000,
> but if it's 1000 transistors of 6000, then it's a whole different matter.
I admit that I don't have a quantitative estimate of the number of
transistors (PLA terms) required to protect at least most of the
unused opcode space, but I suspect it would not be more than a few
hundred (out of about 3700 in the 6502).
Since the decoding is done with a regular (rectangular) PLA, the real
issue is not the number of transistors, but the number of rows and
columns required to make a "complete" decoder. This would add slightly
to the chip dimensions. Adding terms to the PLA within the existing
matrix would not add to chip size.
Any quantitative estimate would need to be based on the actual 6502
implementation.
aiia...@gmail.com <aiia...@gmail.com> wrote:
>>It would actually be fun to see the real logic diagram of the 6502.
>>;-)
>
> I'd love to see it.
According to http://www.ncsu.edu/wcae/WCAE1/hanson.pdf (see also
http://www.ncsu.edu/wcae/WCAE1/), there must exist a blueprint at the
University of Mississippi, Department of Electrical Engineering.
So, perhaps, there might be a source of a possible logic diagram
available, if someone could get in touch with that University?
Regards,
Spiro.
--
Spiro R. Trikaliotis http://cbm4win.sf.net/
http://www.trikaliotis.net/ http://www.viceteam.org/
> Michael J. Mahon wrote:
.............
>> Non-portable code is written for two quite different reasons:
>>
>> 1) Because the coder doesn't even think about portability or
>> doesn't understand it, non-portability happens.
>> 2) Because the coder understands perfectly, and chooses to
>> write non-portable code on purpose. (One-time use, static
>> platform, compelling need,...?)
>
> 3) Slow-as-hell 8 Bit platforms don't have a proper abstraction layer
> for any of their hardware, no matter if sound, graphics, timers, ports
> or CPU.
???????????????????
Could you point at one single case where use of undocumented opcodes did
significantly speed up an application on the 6502 platform?
--
----------------------------------------------------------------
Paul Schlyter, Grev Turegatan 40, SE-114 38 Stockholm, SWEDEN
e-mail: pausch at stockholm dot bostream dot se
WWW: http://stjarnhimlen.se/
John Selck <sel...@informatik.hgv-hamburg.de> writes:
>Same opcodes but faster clockspeed. Also, CBM switched from NMOS to HMOS.
Strictly speaking, HMOS *is* NMOS because they both still use n-type MOSFETs.
--
Cameron Kaiser * cka...@floodgap.com * posting with a Commodore 128
personal page: http://www.armory.com/%7Espectre/
** Computer Workshops: games, productivity software and more for C64/128! **
** http://www.armory.com/%7Espectre/cwi/ **
Depends on how you define "significantly".
If you understand it as "I want to see it on the stop-watch in my hand",
then of course you won't find any.
There are quite a few "it wouldn't work otherwise" examples, though,
which require those undocumented opcodes to meet strict timing.
Rainer
And the excellent detailed block diagram on the last page of
Hanson's paper is a welcome addition to my library!
> Could you point at one single case where use of undocumented opcodes did
> significantly speed up an application on the 6502 platform?
Bresenham interpolation.
Without illegal opcodes:
STA $12
TXA
STA $FE00,Y
TAX
LDA $12
SBC $10
BCS .skip
ADC $11
INX
.skip
With illegal opcodes:
SHX $FE00,Y
SBC $10
BCS .skip
ADC $11
INX
.skip
Best case: 21 vs 11 clock cycles (almost 2x speed with illegal)
Worst case: 25 vs 15 clock cycles (still 1.67x speed)
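The totals above can be reproduced by adding up per-instruction cycle counts. This sketch uses the standard timings for the documented instructions; the 5 cycles for SHX abs,Y comes from the usual undocumented-opcode tables and is an assumption here, as is the absence of a page crossing on the branch. The helper name `total` is made up.

```python
# Cycle counts for the addressing modes used in the two listings.
# SHX abs,Y = 5 is taken from common undocumented-opcode tables (assumption).
CYCLES = {
    "STA zp": 3, "LDA zp": 3, "SBC zp": 3, "ADC zp": 3,
    "TXA": 2, "TAX": 2, "INX": 2,
    "STA abs,Y": 5, "SHX abs,Y": 5,
}
BRANCH_TAKEN, BRANCH_NOT_TAKEN = 3, 2   # no page crossing assumed

def total(before_branch, after_branch, taken):
    """Cycles for one pass through the listing, given the BCS outcome."""
    cycles = sum(CYCLES[i] for i in before_branch)
    if taken:                       # BCS taken: ADC/INX are skipped
        return cycles + BRANCH_TAKEN
    return cycles + BRANCH_NOT_TAKEN + sum(CYCLES[i] for i in after_branch)

legal = ["STA zp", "TXA", "STA abs,Y", "TAX", "LDA zp", "SBC zp"]
illegal = ["SHX abs,Y", "SBC zp"]
tail = ["ADC zp", "INX"]

for taken in (True, False):
    a, b = total(legal, tail, taken), total(illegal, tail, taken)
    print(f"taken={taken}: {a} vs {b} cycles, ratio {a / b:.2f}")
```

The best case is the one where the branch is taken (21 vs 11), the worst case the one where it falls through (25 vs 15).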
> I admit that I don't have a quantitative estimate of the number of
> transistors (PLA terms) required to protect at least most of the
> unused opcode space, but I suspect it would not be more than a few
> hundred (out of about 3700 in the 6502).
And you think people would waste a few hundred just for that? Also,
you have probably noticed the lack of space on the CPU... If they
had space for a few hundred more transistors, they would rather have
added another addressing mode or opcodes like PLX/PHX or TXY etc.
than NOP out the illegals.
By the way, x86 CPUs have undocumented opcodes as well... and programs
do use them.
> Errata are always a special case. The designers always hope that code
> that runs correctly on the original, buggy processor will still run
> correctly on a fixed processor. In other words, they hope that no
> one has written code that *depends* on buggy behavior.
>
> To summarize, code written to the specifications of a processor will
> work correctly on all implementations, *except* where an implementation
> is faulty. Code written to work around a fault can still be written
> to the specifications, but avoiding the faulty case(s).
A classic example of this from the real 6502 is the Indirect Absolute
JMP bug. It's trivial to work around once you know it's there. But
if you pick a workaround that exploits the bug, your code won't work on
any of the later parts.
Did anyone actually do this? I certainly hope not.
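For readers who haven't met the bug, here is a minimal model of it. On the NMOS 6502, JMP (addr) with the pointer's low byte at $xxFF fetches the high byte from $xx00 rather than from the next page; the 65C02 fetches it from the correct address. The function name and the memory contents below are made up for illustration.

```python
def jmp_indirect_target(memory, pointer, nmos=True):
    """Return the address that JMP (pointer) jumps to."""
    lo = memory[pointer]
    if nmos:
        # NMOS 6502: only the low byte of the pointer is incremented,
        # so a pointer at $xxFF wraps to $xx00 for the high byte.
        hi_addr = (pointer & 0xFF00) | ((pointer + 1) & 0x00FF)
    else:
        # 65C02 and later: the full 16-bit pointer is incremented.
        hi_addr = (pointer + 1) & 0xFFFF
    return lo | (memory[hi_addr] << 8)

memory = bytearray(0x10000)
memory[0x02FF] = 0x34   # low byte of the intended target
memory[0x0300] = 0x12   # high byte as a fixed CPU reads it
memory[0x0200] = 0x56   # high byte as the NMOS part reads it

print(hex(jmp_indirect_target(memory, 0x02FF, nmos=True)))   # 0x5634
print(hex(jmp_indirect_target(memory, 0x02FF, nmos=False)))  # 0x1234
```

The portable workaround is simply never to place an indirect vector across a page boundary; code that deliberately stores the high byte at $xx00 depends on the bug and breaks on the fixed parts.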
Matt
> Best case: 21 vs 11 clock cycles (almost 2x speed with illegal)
> Worst case: 25 vs 15 clock cycles (still 1.67x speed)
Nice... It would be amusing to see a game title on the Apple II+
running faster than on the IIe :-)
Of course, the point remains though that you would never do this on
the Apple II series, as you'd limit your target market to the earliest
machine, rather than just the lowest common denominator of all machines
(64k, 6502 code that's documented and unbuggy)
If I were coding on a different 6502 platform, I might consider using
these.
But then, I'm very much a 65C02 coder these days, as pretty much
everything I write is intended for a IIe/IIc and wouldn't run on an
older Apple II anyway.
Michael will probably shoot me down for saying this, but when I mean
BRA, I write BRA. It's clearer code, and I really wish that anyone who
writes code specifically for the earlier processors would comment their
unconditional branches accordingly.
But then I'm also pretty carefree about not bumping into the old 6502
bugs either ;-)
Matt
This is splitting hairs. Bill Mensch created the 65C816 and 65C02
designs, which are most definitely *new*, as they did not share any of
the original design. However, the '02 was not a new logical design,
just a redo of the existing NMOS 6502 to take advantage of CMOS.
As well, I know there are ways to create new CPU designs that preserve
undocumented behavior of older designs, designs that add extensive
feature sets to the original design (8->16 bit transition, etc.) Even
Intel has done so (the FFFF segment wrap in the 8086) as they created
new designs.
At the end of the day, enough time passed with the NMOS 6502 core in
play in the CBM world that it became the norm. It was (and is) not
always seen as bad programming practice in the context of CBM
development to utilize the illops.
Jim
I don't think a PLA allows X (don't care) states. The large logic
matrix at the bottom of the diagram is a wired-or matrix, which is why
the illops exist.
Jim
Not so. The logical elements of the NMOS design are not present in
the same form in CMOS. And the instruction decoding was clearly
redesigned, as was the control section (changed timings and bus cycle
patterns, new instructions).
It would be difficult to design new logic for a processor that preserved
the undocumented behavior of an earlier version--and impossible if new
instructions are added.
> As well, I know there are ways to create new CPU designs that preserve
> undocumented behavior of older designs, designs that add extensive
> feature sets to the original design (8->16 bit transition, etc.) Even
> Intel has done so (the FFFF segment wrap in the 8086) as they created
> new designs.
That kind of behavior is of a completely different kind than random
bus clashes as multiple data sources are unintentionally gated onto
a bus!
> At the end of the day, enough time passed with the NMOS 6502 core in
> play in the CBM world that it became the norm. It was (and is) not
> always seen as bad programming practice in the context of CBM
> development to utilize the illops.
Not the norm, I would say, but the *only* 6502 implementation ever
used on that platform. I'd say that it illustrates my original point.
Long after the 65C02 design was available, lower power, faster, and
cheaper to make, the CBM line could not easily make use of it.
In the area of processor design these days, a wired ROM decoder is often
referred to as a PLA, since it is an array and it is "programmable" by
wiring choices at design time. Wire-ORing is common to many logic
configurations.
No shooting from here. ;-)
I agree that "unconditional" branches should always be commented so,
even when they are written as conditional. I have found only a very
small fraction of unconditional jumps that cannot be replaced perfectly
safely with the right conditional branch.
-michael
Parallel computing for 8-bit Apple II's!
I saw that claim on Wikipedia. Is there any published reference?
Stephen Harris wrote:
> With the 6502 essentially not having a microcode level, this
> optimisation is exposed to the programmer.
The PLA is essentially equivalent to microcode. The difference is
that it is addressed by the instruction register and a few state bits,
rather than a micro-PC.
Eric
Bruce Tomlin wrote:
> Nope, it's handled by good old random logic. A PLA is designed to be a
> programmable generic replacement for random logic, and is way too
> inefficient for high-volume VLSI.
Sorry, but Michael is correct. It is definitely a PLA. You can see it
on the die (or a photomicrograph); it's the most regular structure. It
is near one edge of the die, takes up an area about 1/6 of the long
dimension of the chip by most of the short dimension.
If I've counted correctly, the logical size of the product term array is
21 inputs by 137 product terms. 15 of the inputs are from the
instruction register; they are the true and complement forms of each bit
except bit 1. It appears that two inputs come from the clock generator,
and four from a state counter, but I haven't studied it in enough detail
to be certain of that. Most of the product terms control gating of
signals between registers.
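As a rough illustration of how such a product-term array behaves, here is a minimal Python sketch of an AND-plane. The two terms in it are hypothetical and were chosen only to show the mechanism; they are not transcribed from the die. Each term looks at some instruction-register bits and ignores the rest, so an opcode nobody planned for can make more than one term fire at once.

```python
# Toy model of a PLA AND-plane over the 8 instruction-register bits.
# Pattern characters: '0' and '1' must match the opcode bit, '-' is a
# don't-care (no transistor on the true or the complement input line).
# The two terms below are HYPOTHETICAL examples, not the real 6502 terms.
TERMS = {
    "load A": "101----1",  # ignores bit 1
    "load X": "101---1-",  # ignores bit 0
}


def term_fires(pattern, opcode):
    """Return True if every non-don't-care bit of pattern matches opcode."""
    bits = format(opcode & 0xFF, "08b")
    return all(p in ("-", b) for p, b in zip(pattern, bits))


def decode(opcode):
    """Return the sorted names of all product terms that fire for opcode."""
    return sorted(name for name, pat in TERMS.items() if term_fires(pat, opcode))


if __name__ == "__main__":
    for op in (0xA5, 0xA6, 0xA7):
        print(f"${op:02X}: {decode(op)}")
```

With these made-up terms, $A5 fires only the first term and $A6 only the second, while the undefined $A7 fires both, which is the general shape of how a combined load of A and X can fall out of the decode without anyone designing it.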
MOS PLA design is actually quite space-efficient, and for functions like
instruction decode is usually more space-efficient than equivalent
random logic would be. This is especially true of the NMOS 6502, for
which the instruction encoding was designed in such a way that they did
not need to OR together many (any?) of the product terms.
It's not a *field* programmable logic array, if that's what you were
thinking of. It's mask programmed by the presence or absence of
transistors at specific locations in the array.
Many microprocessors in the 1970s, and almost all newer ones, use PLAs
for instruction decode and other tasks. Even the lowly TMS1000 four-bit
microprocessor, the epitome of cheap microprocessors in the mid-1970s,
used PLAs.
Eric
Certainly it does. That's one of the primary distinguishing characteristics
of a PLA as opposed to a ROM.
Eric
John Selck <sel...@informatik.hgv-hamburg.de> writes:
> Huh? They did move forward. 6510 -> 8500 -> 8502.
That was a progression of nearly insignificant changes. Not comparable
to e.g. the 8086/80286/80386 progression, or even to the
6502/65C02/65816 progression.
> And also, I don't
> think it's any kind of problem for Commodore to use customized CPUs
> since they owned MOS, the owners and producers of 6502 tech back then
Yes, it was a problem for them. Designing (or even just modifying)
a processor costs a lot of money, even if you own the fab. You don't
do it unless you have a business plan with a reasonable expectation
of (more than) recouping those costs.
In a modern process, a production mask set costs about $0.5-$1 million
if you don't own the fab, maybe 20% of that if you do. Back in the
1980s with larger process geometries the costs were lower, but still
not insignificant.
HMOS is just NMOS with smaller process geometry. The NMOS to HMOS
transition was a bigger change than for instance going from 1 micron
to 0.8 micron CMOS, but not nearly as big a change as switching from
PMOS to NMOS, or from NMOS/HMOS to CMOS.
Such as?
> Of course, the point remains though that you would never do this on
> the Apple II series, as you'd limit your target market to the earliest
> machine, rather than just the lowest common denominator of all machines
> (64k, 6502 code that's documented and unbuggy)
It's easy to do a processor check.
There is a book about the history of Commodore which is rather detailed
(and entertaining) about the MOS history, too:
http://www.commodorebook.com
The MOS chapter can be read freely here:
http://www.commodorebook.com/view.php?content=toc
Excerpt:
At MOS Technology, John Pavinen pioneered a new way to fabricate
microprocessors. "They were one of the first companies to use
non-contact mask liners," says Peddle. "At that time everybody was
using contact masks."
With non-contact masks, the metal die did not touch the wafer. Once the
engineers worked out all the flaws in the mask, it would last
indefinitely.
---
I guess, Chuck Peddle is a good enough reference ;-)
bye
Marcus
Maniacs indeed! Thanks for the link, John!
Now - does anybody know something equivalent for the Z80?
[ducks]
bye
Marcus
>> Huh? They did move forward. 6510 -> 8500 -> 8502.
>
> And what exactly were the differences between these chips?
8502 allows 2 MHz and afaik also has some extra pins for zeropage
addressing mode and stack detection.
8500 is of course basically a 6510 :)
>> And also, I don't think it's any kind of problem for Commodore to use
>> customized CPUs since they owned MOS, the owners and producers
>> of 6502 tech back then :)
>
> And, yes, MOS made all the funky custom chips - but the 6502-core chips
> were AFAIK the most advanced CPUs designed/modified/produced by them.
The VIC or SID are about 3 times bigger than the 6502. But of course the
6502 is ingenious :) The only bad thing about it is that they wasted
transistor space for something like BCD mode. I would have preferred
some small extra commands like ADD without carry or TXY.
IIRC the S-JiffyDOS patch uses illegal opcodes to speed up GCR decoding.
(Jochen, are you reading here and can comment?)
And I'm sure quite some demos contain illegal opcodes to either make
effects possible at all or get the max out of the raster timing.
Rainer
That's why I wrote "advanced CPUs" ;-)
> the only bad thing about it is, that they wasted transistor space for
> something like BCD mode. I would have preferred some small extra
> commands like ADD without carry or TXY.
I guess, all of us have some wishes for design changes and I agree with
your propositions (though I would've called it TYX - sounds better).
Additionally I would've dumped the indexed-indirect addressing mode and
simply made an indirect-indexed-mode with the X register. IMHO this
would've been infinitely more useful.
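For anyone who mixes the two modes up, here is a minimal Python sketch of how the effective address is formed in each. The function names are made up for illustration, and memory is modelled as a plain list of 65536 bytes.

```python
def read_word_zp(mem, zp):
    """Read a 16-bit little-endian pointer from zero page, wrapping at $FF."""
    return mem[zp & 0xFF] | (mem[(zp + 1) & 0xFF] << 8)


def indexed_indirect(mem, zp, x):
    """(zp,X): X is added to the zero-page address, selecting a pointer."""
    return read_word_zp(mem, zp + x)


def indirect_indexed(mem, zp, y):
    """(zp),Y: Y is added to the pointer fetched from zero page."""
    return (read_word_zp(mem, zp) + y) & 0xFFFF


if __name__ == "__main__":
    mem = [0] * 65536
    mem[0x20], mem[0x21] = 0x00, 0x40   # pointer $4000 at $20
    mem[0x24], mem[0x25] = 0x34, 0x12   # pointer $1234 at $24
    print(hex(indexed_indirect(mem, 0x20, 4)))   # picks the pointer at $24
    print(hex(indirect_indexed(mem, 0x20, 4)))   # offsets the pointer at $20
```

The first form indexes into a table of pointers; the second walks through the data a single pointer refers to, which is why it is the one most code actually wants.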
For a long time I wished for a block move instruction but I've already
seen too many cases of non-contiguous blocks. A block fill on the other
hand...
bye
Marcus
> I guess, all of us have some wishes for design changes and I agree with
> your propositions (though I would've called it TYX - sounds better).
You could need both, TXY and TYX. Also PHX/PLX/PHY/PLY would have been
nice and probably would have been possible with only a few extra transistors.
> For a long time I wished for a block move instruction but I've already
> seen too many cases of non-contiguous blocks. A block fill on the other
> hand...
And then we would have ended up like the Z80: That a copy loop is faster
than the actual block copy instruction :D
Anyway, you don't need block fill since you can use block copy for filling
too, just write the fill-byte to the start address and then start copying
to bufferstart+1, it will do the same.
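The trick only works if the copy runs one byte at a time from the lowest address upward, so that each byte written is the source of the next one. A minimal Python sketch, with hypothetical helper names:

```python
def block_copy_forward(mem, src, dst, count):
    """Copy count bytes one at a time, lowest address first."""
    for i in range(count):
        mem[dst + i] = mem[src + i]


def fill(mem, start, length, value):
    """Fill mem[start:start+length] by seeding one byte and copying it along."""
    if length == 0:
        return
    mem[start] = value
    # Source and destination overlap by design: every byte just written
    # becomes the source for the following byte.
    block_copy_forward(mem, start, start + 1, length - 1)


if __name__ == "__main__":
    buf = bytearray(8)
    fill(buf, 2, 4, 0xAA)
    print(buf.hex())
```

A copy that runs from the top down, or one that buffers the source first, would not propagate the seed byte.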
Anyway, things could have been better but at the same time they could have
been way worse. We're actually pretty lucky with the existing implementations
of the 6502 and also VIC + SID.
Can you cite an example? The block move instructions on the Z80 are
actually fairly efficient. They use three memory cycles to move a byte,
vs. the theoretical minimum of two. Doing a block copy via a software
loop is going to require at least five memory cycles per byte moved, and
probably more.
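For a rough cross-check in T-states rather than memory cycles, here is a small Python sketch using the T-state counts from the standard Z80 timing tables. The naive loop is just one plausible way to write the copy by hand, and the straight-line LDI case is included only as a guess at what "a copy loop" might have meant; loop overhead for partially unrolled code is left out.

```python
# Documented Z80 T-state counts (standard timing tables).
LDIR_REPEAT = 21   # LDIR while BC != 0 after the transfer
LDIR_LAST = 16     # final LDIR iteration
LDI = 16           # single LDI

# One plausible hand-written copy loop (an assumption, not from the thread).
LOOP_BODY = {
    "LD A,(HL)": 7,
    "LD (DE),A": 7,
    "INC HL": 6,
    "INC DE": 6,
    "DEC BC": 6,
    "LD A,B": 4,
    "OR C": 4,
    "JP NZ,loop": 10,
}


def ldir_tstates(n):
    """T-states for LDIR moving n bytes (n >= 1)."""
    return LDIR_REPEAT * (n - 1) + LDIR_LAST


def loop_tstates(n):
    """T-states for the naive software loop moving n bytes."""
    return sum(LOOP_BODY.values()) * n


def unrolled_ldi_tstates(n):
    """T-states for n LDI instructions written out in a row."""
    return LDI * n


if __name__ == "__main__":
    n = 256
    print(ldir_tstates(n), loop_tstates(n), unrolled_ldi_tstates(n))
```

On these numbers LDIR beats the naive loop by a wide margin, and only code built from unrolled LDIs comes out ahead of it, at the cost of code size.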
I'll defer. All I know is the DTV designer said she originally did not
include any illops in the design because she could not emulate the
wired-OR matrix in the FPGA in a space-efficient manner. I assumed the
FPGA is a functional superset of the PLA, and thus if it can't be
easily dealt with in there, I didn't think it could in the PLA.
Jim
I guess we'll agree to disagree on the point. I say all the design
elements are there (the 8 bit ALU, decimal mode, the 16 bit instruction
pointer, etc.) As for the instruction decoding, Bill just cleaned up
the don't cares, as I see it. Hardly a huge new design. The physical
elements changed, yes, but the logical perspective stayed the same.
When I interviewed him back in 1995, he said as much. He simply wanted
to clean up the design and lay it out so he could take advantage of not
only the CMOS process but the ability to shrink the feature size every
so many years. He hasn't changed his design since the original. He
just shrinks the design using the newer process and re-fabs. That's how
he stated he got his speed increases.
> That kind of behavior is of a completely different kind than random
> bus clashes as multiple data sources are unintentionally gated onto
> a bus!
I'll agree they are different, but the point was that they are both
undocumented behavior. Intel found the undocumented behavior was used
all over, so they had to build in support for it. Therefore, every new
Intel CPU supports this "undocumented behavior". Of course, it is now
documented, so it becomes legitimate by virtue of so many people
exploiting it that it became the std.
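For readers who have not met the FFFF segment wrap: the 8086 forms a physical address from segment * 16 + offset and has only 20 address lines, so anything past 1 MB wraps to the bottom of memory. A minimal Python sketch, where the address_bits parameter is my own addition to show what happens on a part with more address lines:

```python
def phys_addr(segment, offset, address_bits=20):
    """Real-mode physical address, truncated to the available address lines."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    return linear & ((1 << address_bits) - 1)


if __name__ == "__main__":
    # 20 address lines: FFFF:0010 wraps around to physical address 0.
    print(hex(phys_addr(0xFFFF, 0x0010)))
    # 24 address lines and no wrap emulation: the same pair lands at 1 MB.
    print(hex(phys_addr(0xFFFF, 0x0010, address_bits=24)))
```

Software that relied on the wrap broke on parts with more address lines, which is why the PC world ended up with a switchable A20 gate to reproduce the old behavior.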
> Not the norm, I would say, but the *only* 6502 implementation ever
> used on that platform. I'd say that it illustrates my original point.
It probably does not qualify, but the C65 used a different
implementation of the 02, and the CLCD used a CMOS '02 as well.
> Long after the 65C02 design was available, lower power, faster, and
> cheaper to make, the CBM line could not easily make use of it.
If it had made economic sense, CBM would have tasked Mensch to add the
errata into the C02. Apple got Bill to change the 816 timings, and CBM
had more clout with Bill, since they originally helped set up WDC.
But, I'll concede the point that it would have required more cash and
time than the Apple II line needed.
I guess, in principle, I agree with you that undocs fly against the rule
of programming. However, in the CBM environment, the rules are a bit
different. As well, regardless of how one views the use of undocs, I'm
not willing to be harsh on the MOS folks for what they did. They made a
$20 CPU and got Woz and Bushnell interested in using CPUs, which
brought all of us to where we are today. I won't let a few space and
time saving details that made perfect sense in the early 1970's cloud that.
Jim
I wrote:
> Certainly it does. That's one of the primary distinguishing
> characteristics of a PLA as opposed to a ROM.
Jim Brain <br...@jbrain.com> writes:
> I'll defer. All I know is the DTV designer said she originally did
> not include any illops in the design because she could not emulate the
> wired-OR matrix in the FPGA in a space efficient manner.
That's true.
> I assumed the FPGA is a superset functionality of the PLA, and thus if it
> can't be easily dealt with in there, I didn't think it could in the
> PLA.
There's a lot of difference between an FPGA being conceptually a superset
of a PLA, and an FPGA being able to efficiently implement any particular
PLA. I've been trying to cram the logic of the DEC J11 microprocessor
into an FPGA, and the decode PLA in that, which is relatively small in
the actual J11 chip, takes up an enormous amount of FPGA space.
The problem is that an FPGA cell is much more coarse-grained than a
PLA "cell", which is just a transistor.
Eric
Jim Brain <br...@jbrain.com> writes:
> I guess we'll agree to disagree on the point. I say all the design
> elements are there (the 8 bit ALU, decimal mode, the 16 bit
> instruction pointer, etc.) As for the instruction decoding, Bill just
> cleaned up the don't cares, as I see it.
You're talking about two completely different design levels. Michael
is talking about transistors, gates, PLAs, and flip-flops. You (Jim)
are talking about the programmer-visible processor architecture.
It's possible to produce two processors with vastly different
implementations that have the same architecture, e.g., IBM System 360/30
versus System 360/67, which shared almost no commonality of hardware
design, but could run the same software.
It's also possible to produce two processors with nearly the same
hardware implementation, but dramatically different architecture.
The 65C02 architecture is almost a superset of the 6502. The implementation
is *much* different.
> Hardly a huge new design.
Tell us that after you design a 6502 replacement in a new technology.
> The physical elements changed, yes, but the logical perspective stayed
> the same. When I interviewed him back in 1995, he said as much.
The "physical elements changed" is basically stating that the entire
chip was redesigned.
We could build a very slow 6502-compatible processor out of a lot of
tinkertoys, and still have the same "logical perspective". I don't think
you'd try to claim that it wasn't a huge new design.