Picking a system to emulate isn't an easy choice; the standard first emulator project seems to be a CHIP-8 emulator. Reading about CHIP-8 definitely helped me to understand a lot of emulation concepts, but it seemed a bit too basic. I felt that I got enough out of just reading through other people's emulators, and that writing my own would be a pointless exercise.
While I wouldn't recommend trying to learn about emulation from someone as inexperienced as me, I wanted to outline the process of starting an emulator for the first time, talk about which documents were most helpful to me, and also talk about some of the design choices I went through.
I used my own library, LDFS, to create a window with a valid OpenGL context. A better choice would have been something more standard, and cross-platform, such as SDL, however, I went with LDFS just because I was familiar with it.
I then read through some Game Boy documents to get a better overview of the project. In hindsight, I should have spent much longer doing this so that I wouldn't have to keep looking things up later, but I was excited!
Most of the Game Boy specific code that I wrote in the beginning, such as loading a ROM, was based heavily on other emulators. I looked at how two or three different emulators did it, and then wrote it into Cinoop in my own style. It wasn't worth trying to write code on my own just yet, I needed to have a base to work with first, before I could experiment with doing things my own way.
Different sources can refer to memory regions differently. High RAM is sometimes called Zero Page Memory, Cartridge Data is sometimes just called ROM, and if a document just says RAM, it is usually refering to the Working RAM. Being aware of this is essential when reading through documents written by different people.
While this model is fine for dealing with instructions that access the 8 bit registers individually, what I didn't realise is that often the 8 bit registers are grouped together to form the 16 bit registers: AF, BC, DE, and HL.
This is one of the more unique aspects of my emulator; since most other Game Boy emulators were written in an older C standard, (or an undesirable alternative like C++), they didn't have access to anonymous structs or unions, which meant that they either had to access registers with a messy chain, like gameboy_proc->AF.b.h (this is how GBE accesses register a), or rely on bit operations.
The Game Boy has an 8 bit register which controls if the last operation resulted in zero, an underflow, a nibble overflow, or a byte overflow; refered to as the zero flag, the negative flag, the half carry flag, and the full carry flag, respectively.
One thing that initially tripped me up, is that I wasn't sure if I should update the flags after every instruction, or after just some, and if so, which ones (and how should I store this information). I eventually came across this great piece of documentation which describes in detail which instructions should update the flags, and to what values.
I implemented the NOP instruction, and ran the game again to get the next, unsupported instruction. I continued with this method for a few more instructions, checking each time that the register values displayed by Cinoop matched those in NO$GMB.
I went away from the approach of doing everything directly in an instruction's code, and towards the idea of storing information about each instruction in a structure, so that common tasks could reuse the same code.
For example, rather than having each instruction's code responsible for retrieving its operands, I thought it would make more sense to store the operand lengths in the structure, and reuse the same code to retrieve operands for every instruction.
Running the game now, gave me the message "Unimplemented instruction 0x06 (LD B, 0x00)!". This was all I needed to write the 0x06 instruction, I didn't have to stop and find it in an external program or piece of documentation:
I also considered the idea of storing whether an instruction should update certain flags, in the instruction's structure, but decided against the idea because it would just over complicate the system, and would probably cause it to run slower.
Implementing the CPU wasn't particularly difficult, it was just time consuming. Most instructions are fairly straight forward, and some are identical other than the target register (INC A, and INC B for example). In addition, there are multiple NOPs (LD A, A, LD B, B, etc...).
However, there was one instruction which confused me for a while: DAA, which is used to display the score in Tetris. After looking at several other emulators, I managed to write my own implementation. One thing to note, is that unlike in the original Z80A CPU, the Half Carry flag is always cleared, which makes it a little bit simpler.
But this still wasn't very easy to use. If there was a mistake in one of my instruction implementations, I would want to run the game step by step so that I could find it. To do this, I created a basic, real time debugger, which I could run alongside NO$GMB to check that the instructions were being executed correctly:
I was also able to add breakpoints in the readByte and writeByte function, which meant that I could activate a realtime, step by step, debugger on any condition I wanted. In combination with being able to run the game in NO$GMB, this turned out to be sufficient for debugging most problems.
Running what seems like an endless list of CPU instructions, to check everything is working, is important, but it becomes dull very quickly. Drawing the tileset would be my first, visual confirmation that Cinoop was actually working well enough to load graphics.
Of course, when I ran the emulator, it would exit before reaching this function since I hadn't implemented enough instructions; but I kept working, and eventually the breakpoint triggered! Then there were a few more instructions to implement to get through the function, before I could dump the first tile:
Cinoop was getting stuck in a loop, and not progressing past the copyright screen. After some brief debugging, I found that blocking any writes to the first byte of HRAM (0xff80) would allow the loop to be completed:
It wasn't my intention to use game specific patches rather than sorting out bugs properly, but I assumed that if I temporarily enabled this patch, the problem would eventually sort its self out as I improved Cinoop (which it did).
I realised that it was probably waiting for the value to be written to by an interrupt, so I dumped the interrupt enable register: 0x09, meaning that both VBlank and Serial interrupts were enabled ((1
The final major addition to the core would be sprite support, which wasn't too difficult to implement after having already implemented background maps. The Game Boy can display up to 40 sprites, with the properties of each stored in a basic OAM table, containing the position, tile number, palette number, flip, and priority.
Tetris relies on reading the divider register, at address 0xff04, as the source of entropy to determine which random block will be next. Usually this register is a timer which increments linearly until either overflown or reset to 0 (by writing any value to it).
I decided to return rand when attempting to read from this value instead. Since this register is only used as a random number generator for Tetris, this change will provide a better source of entropy, without any adverse effects for this game.
I revised the code to use the DS' native tile rendering. It ran faster than the framebuffer renderer, but it had some minor graphical glitches on some tiles (happened on real hardware as well as on emulators):
I'm not exactly sure what caused this, but I think it was because I was writing to VRAM too frequently. I fixed the issue by writing to an array in RAM instead, and setting a dirtyTileset variable. During the DS' VBlank, I checked to see if the tileset had changed, and if so, copied the tiles from RAM into VRAM:
The way that the DS video modes are designed, it is not possible for sprites and backgrounds to share a tileset, like the Game Boy does. As a result of this, I had to copy the tileset into both the background, and sprite memory in the dsVblank routine.
Finally, I wrote a plain black tile into VRAM, and used it for the border. With the finalised tile rendering system in place, the emulator had no graphical artifacts, and ran noticeably faster (but still not full speed).
The GameCube isn't a system that I have ever programmed for, so I decided that this would be a good project to start with. While there isn't nearly as much documentation as there is for the DS, it was still easy to get started with thanks to libogc and devkitPPC.
The GameCube emulator I used for testing, gcube, doesn't support reading files from the memory card, so I had to hardcode the Tetris ROM into the binary, which was another minor annoyance. With all that done, I had Cinoop running on the GameCube!
The first reason for this is because of the decentralised nature of the scene; whatever information about PS2 development that is still available is scattered across the internet like Lego in a child's room.
There's no way to write directly to the frame buffer because the VRAM is not within EE RAM and isn't mapped to EE memory space. The only way is to have a local framebuffer within EE RAM, which you can transfer to the real framebuffer within VRAM via DMA at every frame. The VRAM is actually 4MB of eRAM within the GS.
To understand how this is to be done, it's important to first understand how to upload textures during a program's runtime. The frame buffer is basically a very large texture. Some graphics libraries (e.g. gsKit) have nice functions for facilitating processes like this.
I think that it may be possible to transfer bitmap data directly to the frame buffer on the GS because I think that FMCB (and probably some earlier homebrew demos) does that for drawing its boot logo. But so far I've only transferred bitmap data to the GS as texture data (to a buffer within VRAM which is outside of the frame buffer), which is later drawn onto the frame buffer as a full-screen 2D-texture.
7fc3f7cf58