I finally discovered what the problem was. Luckily, Street Fighter 2
has a basic monitor program built into it, so when it crashes (address
error, line-F emulation, etc.), it actually gives you a register/RAM
dump. I noticed that the crashes were occurring at one of a small
number of addresses, the most common being 0x138DE, which turns out to
be pretty innocuous:
chris@wotan$ dis68 ../sf2_3.bin 0x0138D2 10
0x0138D2 movea.l 40(fp), a0
0x0138D6 move.w (a0)+, d7
0x0138D8 move.w (a0)+, d0
0x0138DA move.w (a0)+, d1
0x0138DC move.b (a0)+, d2
0x0138DE move.b (a0)+, d3 <------------
0x0138E0 bsr.w 0x13A0A
0x0138E4 bsr.w 0x13A88
0x0138E8 dbf d7, 0x138D8
0x0138EC rts
chris@wotan$
But since the crashes nearly always occurred there, I thought I'd
sample the output of a comparator between the address lines and the
constant 0x009C6F (the word address equivalent to the byte address
0x138DE). The result is interesting. Looking at the traces, I can see
that this code executes very regularly during the game. Ordinarily
things are fine, but on very rare occasions, something weird happens:
http://www.swaton.ukfsn.org/umdkv2/glitches.html
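
As an aside on the comparator constant: the word address is just the byte address with its lowest (byte-select) bit dropped, i.e. shifted right by one. A minimal sketch of that arithmetic, where the function name is purely illustrative and not part of any real tool:

```python
def byte_to_word_address(byte_addr: int) -> int:
    """Drop the byte-select bit to get the 16-bit word address."""
    return byte_addr >> 1


# The crash address from the register dump, as seen on the address lines.
word_addr = byte_to_word_address(0x138DE)
print(f"0x{word_addr:06X}")  # 0x009C6F
```

Both bytes of a word map to the same word address, so a comparator on this constant fires for an access to either 0x138DE or 0x138DF.
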
Notice that ordinarily the address lines are stable well before /C_OE
asserts, and remain stable until well after /C_OE deasserts. But very
rarely, there are three glitches in the middle of the cycle. I don't
know in detail what these glitches are, but I suspect they are some
sort of crosstalk or ground-bounce problem which causes one or more
address lines to flicker slightly.
In order to test this hypothesis, I added some debounce code to the
VHDL which effectively just throws away any single-cycle glitches on
the address lines. Before putting this fix in, the game would
typically run for less than a minute before crashing. After putting
the fix in, it ran for over an hour before crashing. Admittedly it did
eventually crash, but if there are signal integrity problems the
debounce logic is probably insufficient on its own. I reckon that by
registering all the inputs at the *start* of the /C_OE assertion
(necessary for SDRAM anyway) and with more paranoid debouncing, I
should be able to get it running indefinitely, even with the poor
signal integrity on the prototype board.
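
To illustrate the idea of throwing away single-cycle glitches, here is a rough software model in Python. It is not the actual VHDL: the function name, the two-sample acceptance rule and the sample values are all assumptions made for the sketch. A new address is only accepted once it has been seen on two consecutive samples; until then the last accepted value is held.

```python
from typing import Iterable, List, Optional


def debounce(samples: Iterable[int]) -> List[int]:
    """Accept a new value only after two consecutive identical samples."""
    stable: Optional[int] = None  # last accepted (filtered) value
    prev: Optional[int] = None    # previous raw sample
    out: List[int] = []
    for s in samples:
        # Take the very first sample as-is; afterwards, only update when
        # the raw input has held the same value for two samples in a row.
        if stable is None or s == prev:
            stable = s
        prev = s
        out.append(stable)
    return out


# Hypothetical trace: one single-sample glitch (0x9C6D), then a real change.
raw = [0x9C6F, 0x9C6F, 0x9C6D, 0x9C6F, 0x9C6F, 0x9C70, 0x9C70]
filtered = debounce(raw)
print([f"0x{a:04X}" for a in filtered])
```

In this model the glitch never reaches the output, at the cost of delaying every genuine address change by one sample. A glitch lasting two or more samples would still get through, which fits with the fix helping a lot but not being sufficient on its own.
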
So, all in all, I'm pretty confident that with a decent PCB layout and
a nice solid ground plane, we will be able to avoid such problems.
Chris