A few years back, the source code for Doom SNES was released by the lead programmer, @RandalLinden. Not all of it could be released, however, because of legal issues with some of the assets, which left the source code uncompilable (at least without major effort; I don't know whether anyone managed it).
However, recently a complete backup archive of the source code (INCLUDING graphics data, level data, XBAND netcode and everything that Randy wasn't able to release himself) was tracked down by JJ Dasher, who obtained it from Jeff Hughes (who had kept it ever since working for Sculptured Software) and finally managed to dump it all and release it on archive.org.
It seems to be a few days older than the stuff Randy released, so unfortunately it won't have the edited levels that had been rejected. But rejoice! DOOM SNES's source code is finally out of legal limbo! I can't wait to see what people do with it, and what discoveries are made in the code... (So, post them here.)
Update, June 28, 2021: Ars was saddened to learn that the author of this piece, who also used the handle Near in online interactions, reportedly took their own life over the weekend. We're republishing this piece today in memory of their towering contributions to the classic gaming emulation community and to Ars. You can also read their 2011 piece on the quest for accuracy in bsnes development.
If you or someone you know is struggling with suicidal thoughts, please contact the National Suicide Prevention Lifeline at 800-273-8255, or reach out to a similar international hotline. We'd also encourage readers to donate to the American Federation for Suicide Prevention if they're so moved.
Today, SNES emulation is in a very good place. Barring unusual peripherals that are resistant to emulation (such as a light-sensor based golf club, an exercise bike, or a dial-up modem used to place real-money bets on live horse races in Japan), every officially licensed SNES title is fully playable, and no game is known to have any glaring issues.
SNES emulation has gotten so precise that I've even taken to splitting my emulator into two versions: higan, which focuses on absolute accuracy and hardware documentation; and bsnes, which focuses on performance, features, and ease of use.
Today, we enjoy cycle-level accuracy for nearly every component of the SNES. The sole exception is the PPUs (picture processing units), which are used to generate the video frames sent to your screen. We mostly know how the PPUs work, but we have to make guesses for some functionality that result in less than total perfection.
The remaining issues are relatively small ones, in the grand scheme of things. If you're not interested in the pursuit of one hundred percent faithful emulation perfection for its own sake, I am not going to be able to convince you of the need for improving SNES PPU emulation further. As with any goal in life, the closer we get to perfection, the smaller the returns.
I can tell you why this is important to me: it's my life's work, and I don't want to have to say I came this close to finishing without getting the last piece of it right. I'm getting older, and I won't be around forever. I want this final piece solved so that I can feel confident in my retirement that the SNES has been faithfully and completely preserved through emulation. No stone was left unturned, no area left unfinished. I want to say that it's done.
Imagine you are emulating a CPU's "multiply" instruction, which takes two registers (variables), multiplies them together, and produces a result and some flags that represent the status of the result (such as overflow).
We could devise a software program that multiplies every possible value from 0 to 255 as both the multiplier and multiplicand. Then we could output both the numeric and flag results of the multiplication. This would produce two 65,536-entry tables.
By analyzing these tables, we could determine exactly how and when the CPU sets its results and flags. Then we could modify our emulators so that, when running the same test, they produce exactly the same tables at the same times.
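The table-comparison idea above can be sketched in a few lines of Python. This is only an illustration under assumptions: `reference_multiply` stands in for tables captured from real hardware, `buggy_emulator_multiply` is a made-up emulator with a deliberate flag bug, and the 8-bit result with overflow and zero flags is a hypothetical CPU design, not the SNES's actual multiplier.

```python
from itertools import product

OVERFLOW = 0x01  # hypothetical flag bit: product did not fit in 8 bits
ZERO = 0x02      # hypothetical flag bit: stored 8-bit result is zero


def reference_multiply(a, b):
    """Stand-in for real hardware: 8-bit result plus a flag byte."""
    full = a * b
    result = full & 0xFF
    flags = (OVERFLOW if full > 0xFF else 0) | (ZERO if result == 0 else 0)
    return result, flags


def buggy_emulator_multiply(a, b):
    """Made-up emulator that derives ZERO from the untruncated product."""
    full = a * b
    flags = (OVERFLOW if full > 0xFF else 0) | (ZERO if full == 0 else 0)
    return full & 0xFF, flags


def build_tables(multiply):
    """Run every 0..255 x 0..255 pair; return (results, flags) tables."""
    results, flags = [], []
    for a, b in product(range(256), repeat=2):
        r, f = multiply(a, b)
        results.append(r)
        flags.append(f)
    return results, flags


def diff_tables(hardware, emulator):
    """List the (a, b) input pairs where either table disagrees."""
    hw_results, hw_flags = hardware
    em_results, em_flags = emulator
    return [
        (i // 256, i % 256)  # index = a * 256 + b, matching product() order
        for i in range(len(hw_results))
        if hw_results[i] != em_results[i] or hw_flags[i] != em_flags[i]
    ]


hardware_tables = build_tables(reference_multiply)
emulator_tables = build_tables(buggy_emulator_multiply)
mismatches = diff_tables(hardware_tables, emulator_tables)
print(len(hardware_tables[0]), "entries per table;", len(mismatches), "mismatches")
```

Each mismatch names the exact inputs where the emulator diverges, such as 16 x 16 here, which is what makes exhaustive tables useful for small input spaces.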
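That approach works for 8-bit operands, but exhaustive testing scales badly: a 16-bit by 16-bit multiplication already has more than four billion input combinations, and wider operands quickly become impractical to cover. In cases like this, we would have to get more selective with our tests and try to determine exactly when flags might change, when results might overflow, and so forth. Otherwise we'd have tests that would never complete.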
Multiplication is a fairly trivial operation, but this is the general process behind reverse engineering, and it extends to more complex operations such as how the SNES' horizontal blanking DMA (direct memory access) transfers work. We create tests that try to detect what happens on edge cases, then confirm that our emulation behaves identically to a real SNES.
Imagine a 100Hz clock: it is a device with a digital pin that transitions to logic high (+5 volts, for instance), and then back to logic low (0 volts, or ground) 100 times per second. So every second, the pin voltage will fluctuate 200 times total: 100 rising clock edges and 100 falling clock edges.
A clock cycle is generally treated as one full transition, so a 100Hz clock would generate 100 clock cycles per second. There are some systems that require distinguishing between rising and falling edges, and for those, we break this further down into half-cycles to denote each phase (high or low) of the clock signal.
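The arithmetic above is small enough to check directly. The following Python sketch assumes an ideal square-wave clock, and the function names are invented for illustration:

```python
def clock_edges(frequency_hz, seconds=1):
    """Count cycles, edges, and half-cycles for an ideal square-wave clock."""
    cycles = frequency_hz * seconds
    return {
        "cycles": cycles,
        "rising_edges": cycles,    # one low-to-high transition per cycle
        "falling_edges": cycles,   # one high-to-low transition per cycle
        "half_cycles": 2 * cycles,
    }


def half_cycle_phases(cycles):
    """Yield (cycle_index, phase) pairs, one per half-cycle, in time order."""
    for cycle in range(cycles):
        yield cycle, "high"
        yield cycle, "low"


print(clock_edges(100))
print(list(half_cycle_phases(2)))
```

An emulator that only needs whole cycles can step by `cycles`; one that must distinguish rising from falling edges would step by the finer half-cycle sequence instead.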
The key goal of an authentic emulator is to perform tasks in exactly the same ways and at exactly the same times as the real hardware. It doesn't much matter specifically how the tasks are performed. All that matters is that the emulator, when given the same inputs, generates the same outputs with the same timing as real hardware.
Sometimes, operations happen over time. Take SNES CPU multiplication, for instance. Rather than pausing to wait for multiplication to complete, the SNES CPU calculates the multiplication result one bit at a time in the background over eight CPU opcode cycles. This allows your code to possibly do other things while waiting on the multiplication to complete.
Any commercially released software is likely to wait those eight cycles, because if you try to read the result before it's ready, you will get a partially computed result instead. Yet earlier SNES emulators gave correct results immediately, without waiting these extra cycles.
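To make the "partially computed result" concrete, here is a toy Python model of a multiplier that retires one bit per cycle. It is a sketch under stated assumptions: the class and method names are invented, and it uses plain shift-and-add, so the intermediate values it produces are illustrative and should not be taken as the values real SNES hardware would return mid-computation.

```python
class BackgroundMultiplier:
    """Toy 8-bit x 8-bit multiplier that needs eight ticks to finish."""

    CYCLES = 8

    def __init__(self):
        self.result = 0
        self._multiplicand = 0  # shifted left once per tick
        self._multiplier = 0    # consumed one bit per tick, low bit first
        self._remaining = 0

    def start(self, a, b):
        """Latch the operands and begin a new multiplication."""
        self.result = 0
        self._multiplicand = a & 0xFF
        self._multiplier = b & 0xFF
        self._remaining = self.CYCLES

    def tick(self):
        """Advance the computation by one cycle (no-op once finished)."""
        if self._remaining == 0:
            return
        if self._multiplier & 1:
            self.result = (self.result + self._multiplicand) & 0xFFFF
        self._multiplicand <<= 1
        self._multiplier >>= 1
        self._remaining -= 1

    def read(self):
        """Return whatever is in the result register right now."""
        return self.result


def read_after(a, b, cycles_waited):
    """Start a multiply, wait the given number of cycles, then read."""
    unit = BackgroundMultiplier()
    unit.start(a, b)
    for _ in range(cycles_waited):
        unit.tick()
    return unit.read()


# Probe the result register after every cycle, as a test program might.
print([read_after(200, 123, n) for n in range(1, 9)])
# -> [200, 600, 600, 2200, 5400, 11800, 24600, 24600]
```

An emulator that returns `a * b` immediately matches this model only for code that waits the full eight cycles; code that reads early sees a different value on each.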
When hobbyists started creating and testing homebrew software via emulators, this discrepancy started to cause some problems. Some of this software, such as many early Super Mario World ROM hacks, only worked correctly on these earlier emulators, and not on real SNES hardware. That's because they were designed with the emulator's immediate (and inauthentic-to-real-hardware) multiplication results in mind.
As emulators improved, this old software broke, and we have had to subsequently offer compatibility options in our newer emulators in order to not lose this software to time. Yes, as surreal as it is to say, these days our emulators have to emulate other emulators! How meta!
The nice thing about the CPU multiplication delay is that it's very predictable: the eight computation cycles start immediately after requesting a multiplication. By writing code to read the results after every cycle, we were able to confirm that the SNES CPU was using the Booth algorithm for multiplication.
During the rendering of every scanline, at a certain point, the entire SNES CPU freezes for a short duration as the contents of the RAM chip are refreshed. This is needed because, as a cost-cutting measure, the SNES used dynamic RAM (rather than static RAM) for its main CPU memory. Dynamic RAM must be periodically refreshed in order to preserve its contents over time.
By reading the counters multiple times, I was able to determine which quarter of a clock cycle the counter was aligned with. By combining that insight with a specially crafted function that could step by a precise, user-specified number of clock cycles, it became possible to perfectly align the SNES CPU to any exact clock cycle position I wanted.
By iterating over a range of clock cycles in a loop, I could determine exactly when certain operations (such as DRAM refresh, HDMA transfers, interrupt polling, etc.) would occur, and I was able to reproduce this precisely under emulation.
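The loop-over-offsets technique can be sketched in Python against a black-box timing model. Everything here is an assumption for illustration: the constants are placeholder values not given in this article, `black_box_finish_time` merely stands in for measurements taken on real hardware, and real CPU cycles span several master clocks, which this sketch ignores.

```python
CYCLES_PER_LINE = 1364  # assumed clocks per scanline (illustrative)
REFRESH_START = 538     # assumed position where the CPU freezes (illustrative)
REFRESH_LENGTH = 40     # assumed length of the freeze (illustrative)


def black_box_finish_time(start, work):
    """Stand-in for hardware: time at which `work` cycles of CPU activity,
    begun at clock `start`, are complete. The CPU makes no progress while
    the scanline position is inside the refresh window."""
    t = start
    remaining = work
    while remaining > 0:
        position = t % CYCLES_PER_LINE
        frozen = REFRESH_START <= position < REFRESH_START + REFRESH_LENGTH
        if not frozen:
            remaining -= 1
        t += 1
    return t


def locate_stall(measure, work=1):
    """Sweep every start offset in a scanline and report (first_offset,
    stall_length) for the region where an operation takes longer than
    expected, or None if no offset is ever delayed."""
    stalled = [
        start for start in range(CYCLES_PER_LINE)
        if measure(start, work) - start > work
    ]
    if not stalled:
        return None
    first = stalled[0]
    return first, measure(first, work) - first - work


print(locate_stall(black_box_finish_time))  # -> (538, 40)
```

The probe never looks at the constants directly; it recovers them purely from elapsed-time measurements, which is the same position a test program on real hardware is in.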
The SNES SMP chip has its own timers, and similar reverse engineering was successful against that processor as well. I could spend an entire article talking about the SMP TEST register alone, which allows coders to control the clock divider of the SMP and its timers, among other horrible things. Suffice it to say that, while it was not an easy or fast process, we were ultimately victorious.
There were a whole host of SNES coprocessors used inside various game cartridges that needed to be tamed as well. From dedicated general-purpose CPUs like the SuperFX and SA-1, to digital signal processors like the DSP-1 and Cx4, to decompression accelerators like the S-DD1 and SPC7110, to real-time clocks from Sharp and Epson, and more...
That means an SNES emulator needs to be able to handle the instruction and pixel caches of the SuperFX; the memory bus conflict arbitrator of the SA-1 (which allowed the SNES CPU and SA-1 to share the same ROM and RAM chips simultaneously); the embedded firmware of the DSP-1 and Cx4; the prediction-based arithmetic coders of the S-DD1 and SPC7110; and the odd BCD (binary-coded decimal) edge cases of the real-time clocks. Slowly but surely, by applying the above techniques to determine correctness and timing, we were able to near-perfectly emulate all of these chips.
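For readers unfamiliar with BCD, the following Python sketch shows packed binary-coded decimal for a seconds register and why invalid digits are an edge case at all. The helper names are invented for illustration, and the sketch deliberately refuses out-of-range values: what a particular clock chip does with them is not described here and would have to be measured on hardware.

```python
def to_bcd(value):
    """Pack a number from 0 to 99 into one byte, one decimal digit per nibble."""
    if not 0 <= value <= 99:
        raise ValueError("packed BCD byte holds 0..99")
    return ((value // 10) << 4) | (value % 10)


def from_bcd(byte):
    """Naive decode: does not check that each nibble is a decimal digit."""
    return (byte >> 4) * 10 + (byte & 0x0F)


def is_valid_bcd(byte):
    """True when both nibbles are decimal digits (0 through 9)."""
    return (byte >> 4) <= 9 and (byte & 0x0F) <= 9


def tick_seconds(byte):
    """Advance a well-formed BCD seconds register by one, wrapping 59 to 00."""
    if not is_valid_bcd(byte) or from_bcd(byte) > 59:
        raise ValueError("behaviour for out-of-range values is chip-specific")
    return to_bcd((from_bcd(byte) + 1) % 60)


print(hex(tick_seconds(0x09)))  # -> 0x10
print(hex(tick_seconds(0x59)))  # -> 0x0
print(is_valid_bcd(0x0A))       # -> False
```

Software can write a byte such as 0x0A into a BCD register even though no valid time produces it, so an accurate emulator has to reproduce whatever the hardware does next rather than raising an error as this sketch does.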
It actually took a massive effort and thousands of dollars to decap and extract the programming firmware from the digital signal processors used in various games. In one instance, emulation of the NEC uPD772x led to code from higan being used to save the late professor Stephen Hawking's voice!