I'm writing an NES emulator for fun. In my main 6502 emulation loop I
need to dispatch on an 8-bit opcode to instruction-specific handlers.
There seem to be two common approaches:
1. Use a switch/case statement, i.e.,

    switch(read_opcode(pc++)) {
        case <instruction 1>: ...
        case <instruction 2>: ...
    }

2. Use a table of function pointers to handlers, i.e.,

    typedef void (*instruction_handler)();
    instruction_handler instruction_handlers[0x100] = { ... };
    ...
    instruction_handlers[read_opcode(pc++)]();

Which approach is likely to generate the most efficient code with GCC
on an ARM Android device? Does the fact that the code needs to be
relocatable matter in any way?
I realize I should profile to figure out if CPU emulation is actually
worth optimizing compared to other things (from what I've heard, it's
usually negligible), but it's a fun topic anyway.
/Ulf
Use an array of label pointers as a jump table. It will be faster than
calling functions and even slightly faster than the switch/case (much
faster, if the number of opcodes is large).
See http://gcc.gnu.org/onlinedocs/gcc-4.4.6/gcc/Labels-as-Values.html#Labels-as-Values
--
"The flames are all long gone, but the pain lingers on"
How does that differ from the kind of jump table the compiler might
generate for a switch?
I'm planning on having this compile with at least VC++ as well by the
way, so might be messy to use GCC extensions :/
/Ulf
A switch tends to be compiled quite well -- if you've got 100%
coverage of a contiguous range of case values, the compiler will usually
emit a jump table, and it is probably just as fast as the array of jump
labels.
Unfortunately, the only way to find out would be to benchmark actual code.
If I were doing it, and I wanted it to be as fast as possible, I'd code
it all three ways using some generative programming technique -- you
could do it using C macros, for instance. All of the code that DOES
anything would live in short functions with an optional "inline"
attribute, so that GCC/MSVC can inline them into the switch statement,
or keep them as distinct functions in the case of the function pointer
solution.
But honestly, I'd probably not optimize this until I knew there was an
issue -- and I trust switch enough to let the compiler pick the best way
to optimize it. You can always move to a different approach later if you
find you need additional speed -- especially if the switch is something
that could pretty trivially be generated from a list of opcode macros
(each function that DOES something shares a name with the opcode macro,
for instance).
Tim
Yeah, that's the way I've tried to arrange things, with core
instruction logic in separate inline functions, and then some macro
magic to generate versions for all the different addressing modes.
Should be fairly simple to try out different approaches later on.
/Ulf
That will depend on the compiler. Last time I checked (GCC 3.x,
cross-compiling for MIPS and ColdFire processors) the hand-crafted
version was better. GCC probably generates better object code now.
> I'm planning on having this compile with at least VC++ as well by the
> way, so might be messy to use GCC extensions :/
You can use preprocessor macros and conditional compilation here.