New 68K emulator in development


Joshua Juran

Jul 29, 2011, 9:03:38 PM
to classic...@googlegroups.com
Greetings, folks.

I'm writing a new 68K emulator, called v68k, more or less from
scratch.[1]

Those of us inhabiting this list probably associate the term '68K
emulator' with something that boots up Mac OS when you launch it,
like Mini vMac and Basilisk II. That's not one of my near-term
goals, however.

This project is not an application, but a library. Its users are
programs that wish to run 68K code in a controlled manner. It works
like an embedded scripting language, but the language is 68K machine
code. The client program can run as many simultaneous virtual
machines as it wants, each with different configurations.[2]

A virtual machine requires a memory space, the native address and
size of which are passed on construction. It's up to the caller to
load 68K code into memory and initialize the reset vector, call the
emulator object's reset() method (to simulate power-on), and then
repeatedly call step() until it returns false. (In the future, the
caller will have to query which condition caused the false return.
For now, it means the processor has halted, either on an exception
for which support is not yet implemented or on a double bus fault,
which so far happens only in the pathological case of a zero memory
size, which puts the reset vector in unmapped memory.)
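
The caller's side of that contract can be sketched as follows. This is
only an illustration: toy_emulator is a made-up stand-in that knows a
single opcode (NOP, 0x4E71), and its constructor and get_pc() are
assumptions, not the real v68k interface. Only reset() and step() are
named in the description above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the emulator object; NOT the real v68k class.
// It understands one opcode (NOP, 0x4E71) and halts on anything else.
class toy_emulator
{
    public:
        toy_emulator( uint8_t* base, uint32_t n )
        : mem( base ), mem_size( n ), sp( 0 ), pc( 0 ), halted( true )
        {
        }

        // Simulate power-on: load SP and PC from the reset vector at address 0.
        void reset()
        {
            halted = mem_size < 8;  // vector in unmapped memory: double bus fault

            if ( !halted )
            {
                sp = read_long( 0 );
                pc = read_long( 4 );
            }
        }

        // Run one instruction; returns false once the processor has halted.
        bool step()
        {
            if ( halted )  return false;

            // mem_size >= 8 here, so the subtraction cannot wrap around
            const bool mapped = pc % 2 == 0  &&  pc <= mem_size - 2;

            if ( mapped  &&  (mem[ pc ] << 8 | mem[ pc + 1 ]) == 0x4E71 )
            {
                pc += 2;
            }
            else
            {
                halted = true;  // everything else halts the toy processor
            }

            return !halted;
        }

        uint32_t get_pc() const  { return pc; }

    private:
        uint32_t read_long( uint32_t addr ) const
        {
            const uint8_t* p = mem + addr;

            return uint32_t( p[ 0 ] ) << 24 | p[ 1 ] << 16 | p[ 2 ] << 8 | p[ 3 ];
        }

        uint8_t*  mem;
        uint32_t  mem_size;
        uint32_t  sp;
        uint32_t  pc;
        bool      halted;
};

// The caller's job: load code, set the reset vector, reset, then step.
// Returns the number of instructions executed before the processor halted.
unsigned run_demo( uint32_t& final_pc )
{
    std::vector< uint8_t > ram( 4096 );  // zero-filled memory space

    const uint8_t vector[ 8 ] = { 0, 0, 0x10, 0x00,    // initial SP = 0x1000
                                  0, 0, 0x08, 0x00 };  // initial PC = 0x0800

    const uint8_t code[ 6 ] = { 0x4E, 0x71, 0x4E, 0x71, 0x4E, 0x71 };  // 3 NOPs

    std::copy( vector, vector + 8, ram.begin() );
    std::copy( code,   code   + 6, ram.begin() + 0x0800 );

    toy_emulator emu( ram.data(), uint32_t( ram.size() ) );

    emu.reset();

    unsigned steps = 0;

    while ( emu.step() )  ++steps;

    final_pc = emu.get_pc();

    return steps;
}
```

Calling run_demo() from a main() of your own runs the three NOPs and
then halts on the zero word that follows them.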

Memory is currently flat -- translation of a virtual machine address
to a native address is simply the addition of the memory block's base
address. But all memory accesses are bottlenecked through the
v68k::memory class interface -- replacing it could allow
discontiguous memory segments (e.g. RAM and ROM), memory pages backed
by something other than native memory (for I/O), or pages that forbid
modification or execution.
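
A minimal sketch of flat translation with a bounds check, to make the
idea concrete. The class name flat_memory and the signatures of
translate(), get_long(), and put_long() are assumptions for
illustration; the real v68k::memory interface may differ. put_long()
is const here because it changes the emulated RAM, not the memory-map
object itself.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical flat memory map; names are illustrative, not v68k's interface.
class flat_memory
{
    public:
        flat_memory( uint8_t* base, uint32_t size )
        : its_base( base ), its_size( size )
        {
        }

        // Translate an emulated address to a native pointer by adding the base.
        // Throws if any part of [addr, addr + length) lies outside the block.
        uint8_t* translate( uint32_t addr, uint32_t length = 1 ) const
        {
            if ( length > its_size  ||  addr > its_size - length )
            {
                throw std::out_of_range( "unmapped memory" );
            }

            return its_base + addr;
        }

        // 68K memory is big-endian, so assemble the longword byte by byte.
        uint32_t get_long( uint32_t addr ) const
        {
            const uint8_t* p = translate( addr, 4 );

            return uint32_t( p[ 0 ] ) << 24 | p[ 1 ] << 16 | p[ 2 ] << 8 | p[ 3 ];
        }

        void put_long( uint32_t addr, uint32_t x ) const
        {
            uint8_t* p = translate( addr, 4 );

            p[ 0 ] = uint8_t( x >> 24 );
            p[ 1 ] = uint8_t( x >> 16 );
            p[ 2 ] = uint8_t( x >>  8 );
            p[ 3 ] = uint8_t( x       );
        }

    private:
        uint8_t*  its_base;
        uint32_t  its_size;
};
```

Because every access goes through translate(), swapping in a version
that consults a table of segments or pages would not require changes
in the code that calls get_long() and put_long().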

I have a "Hello world" sample (of sorts) using only the instructions
MOVEP, MOVEQ, and EXG.[3] The client loads 16 bytes of data at byte
offset 1024 and 76 bytes of code at offset 2048. The last
instruction intentionally causes an unmapped memory exception (by
dereferencing an invalid stack pointer), breaking the step() loop and
resulting in this output from the client:

D0: ff0000ff A0: ffffffff
D1: ff0000ff A1: 000ff000
D2: ff0000ff A2: 000ff000
D3: ffffffff A3: 000ff000
D4: ffffffff A4: 000ff000
D5: ff0000ff A5: 000ff000
D6: ff0000ff A6: 000ff000
D7: ff0000ff A7: ffffffff

PC: 0000084c
SR: 2700
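
The SR value above is the Supervisor bit plus an interrupt mask of 7.
Here is a small sketch of that arithmetic, assuming the status
register is kept as the separate fields that reset() initializes in
the teaser later in this message (ttsm, iii, x, nzvc). The struct and
the pack_sr() function are made up for illustration.

```cpp
#include <cstdint>

// Field names follow the reset() teaser; the struct itself is a guess.
struct status_fields
{
    uint8_t ttsm;  // T1 T0 S M  (SR bits 15-12)
    uint8_t iii;   // I2 I1 I0   (SR bits 10-8)
    uint8_t x;     // X          (SR bit 4)
    uint8_t nzvc;  // N Z V C    (SR bits 3-0)
};

// Assemble the 16-bit status register from its separately stored fields.
uint16_t pack_sr( const status_fields& f )
{
    return uint16_t( (f.ttsm & 0xF) << 12
                   | (f.iii  & 0x7) <<  8
                   | (f.x    & 0x1) <<  4
                   | (f.nzvc & 0xF) );
}
```

With ttsm = 2 (Supervisor set, Trace and Master clear), iii = 7, and a
clear CCR, pack_sr() yields 0x2700.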

One use case for an emulator library is using 68K as an interpreter
for untrusted third-party code -- there are many VMs used for this
already, but they tend to be tied to a specific language. Also, what
if the third-party code wants to run other code that *it* doesn't
trust? In the JavaScript world, you're stuck -- you either take the
risk, or call it XSS and refuse. But a 68K VM could do it.

Another possible use for v68k is an application that runs mostly
natively but allows legacy code resources as plugins, such as a
hypothetical HyperCard clone with sourceless XCMDs. In this case,
not booting a full Mac OS makes replacing the ROM (as with Executor)
a much easier task. Leif Strand's work on GrayBox may also help here.

And of course, there's no reason a Mac emulator client couldn't be
written. With proper support in the client, MacRelix could have
fork(), IPv6, native-speed crypto routines, and more.

These scenarios call for a way for emulated code to call service
routines provided by the client. I may use the BKPT instruction for
this, as a sort of jump-out-of-the-system call (probably issued from
a TRAP exception handler). Emulated code would have as much or as
little freedom and power as the client granted to it.

I'm still cleaning up the code in advance of pushing it to GitHub,
but here's a teaser:

void microcode_JSR( registers& regs, const memory& mem, const uint32_t* params )
{
    const uint32_t addr = params[0];

    uint32_t& sp = regs.a[7];

    sp -= 4;

    mem.put_long( sp, regs.pc );

    regs.pc = addr;
}

void emulator::reset()
{
    const reset_vector* v;

    try
    {
        v = (const reset_vector*) mem.translate( 0 );
    }
    catch ( ... )
    {
        double_bus_fault();

        return;
    }

    regs.ttsm = 0 << 2  // clear Trace bits
              | 1 << 1  // set Supervisor bit
              | 0;      // clear Master bit

    regs.iii  = 7;  // set max Interrupt mask

    regs.x    = 0;  // clear CCR
    regs.nzvc = 0;

    regs.a[7] = longword_from_big( v->isp );
    regs.pc   = longword_from_big( v->pc  );

    halted = false;
}

bool emulator::step()
{
    if ( halted )
    {
        return false;
    }

    try
    {
        // fetch
        regs.op = fetch_instruction_word( regs, mem );

        // decode
        const instruction* decoded = decode( regs, mem );

        if ( !decoded )
        {
            throw illegal_instruction();
        }

        // prepare
        fetcher* fetch = decoded->fetch;

        uint32_t params[ max_params ];

        uint32_t* p = params;

        while ( *fetch != 0 )  // NULL
        {
            *p++ = (*fetch++)( regs, mem );
        }

        // execute
        decoded->code( regs, mem, params );
    }
    catch ( ... )
    {
        // everything halts the processor for now
        halted = true;
    }

    return !halted;
}

As you can see, each instruction goes through four steps: (1)
fetching the opcode itself, (2) decoding it to select a fetch vector
and microcode, (3) running each fetch routine to collect the
parameters used by the microcode (some of which may read further
instruction words or otherwise access memory), and finally (4)
calling the microcode, which executes the instruction. Note that
instructions may share either fetch vectors (when different
instructions expect to find e.g. an effective address or register
number in the same place) or microcode (as with LINK and LINK.L,
which differ only in how long the displacement field is).
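
To make the sharing concrete, here is a self-contained sketch of how
two decodings can point at one microcode routine while using
different fetch vectors. Everything in it is an assumption for
illustration (the struct layouts, the fetcher names, raw uint8_t*
memory with no bounds checks); it is not the v68k source. The opcode
patterns are the standard encodings of LINK (0x4E50 + n) and LINK.L
(0x4808 + n).

```cpp
#include <cstdint>

// All names below are illustrative; none are taken from the v68k source.
struct registers
{
    uint32_t a[ 8 ];
    uint32_t pc;
    uint16_t op;
};

typedef uint32_t (*fetcher  )( registers& regs, uint8_t* mem );
typedef void     (*microcode)( registers& regs, uint8_t* mem, const uint32_t* params );

struct instruction
{
    const fetcher*  fetch;  // null-terminated vector of fetch routines
    microcode       code;   // the 'execute' implementation
};

static uint32_t read_word( const uint8_t* p )  { return p[ 0 ] << 8 | p[ 1 ]; }

// Fetch routines: each one yields a single parameter for the microcode.
static uint32_t fetch_low_reg( registers& regs, uint8_t* )
{
    return regs.op & 7;  // register number in the low three opcode bits
}

static uint32_t fetch_word_disp( registers& regs, uint8_t* mem )
{
    const uint32_t w = read_word( mem + regs.pc );
    regs.pc += 2;
    return uint32_t( int32_t( int16_t( w ) ) );  // sign-extend to 32 bits
}

static uint32_t fetch_long_disp( registers& regs, uint8_t* mem )
{
    const uint32_t hi = read_word( mem + regs.pc     );
    const uint32_t lo = read_word( mem + regs.pc + 2 );
    regs.pc += 4;
    return hi << 16 | lo;
}

// One microcode routine serves both LINK forms: push An, An = SP, SP += disp.
static void microcode_LINK( registers& regs, uint8_t* mem, const uint32_t* params )
{
    uint32_t& an = regs.a[ params[ 0 ] ];
    uint32_t& sp = regs.a[ 7 ];

    sp -= 4;

    for ( int i = 0;  i < 4;  ++i )
    {
        mem[ sp + i ] = uint8_t( an >> (24 - 8 * i) );  // big-endian store
    }

    an = sp;
    sp += params[ 1 ];
}

static const fetcher link_w_fetch[] = { &fetch_low_reg, &fetch_word_disp, 0 };
static const fetcher link_l_fetch[] = { &fetch_low_reg, &fetch_long_disp, 0 };

static const instruction link_w = { link_w_fetch, &microcode_LINK };
static const instruction link_l = { link_l_fetch, &microcode_LINK };

const instruction* decode( uint16_t op )
{
    if ( (op & 0xFFF8) == 0x4E50 )  return &link_w;  // LINK   An,#d16
    if ( (op & 0xFFF8) == 0x4808 )  return &link_l;  // LINK.L An,#d32

    return 0;  // not a valid instruction in this toy decoder
}

// Fetch, decode, prepare, execute; returns false on an undecodable opcode.
bool step( registers& regs, uint8_t* mem )
{
    regs.op = uint16_t( read_word( mem + regs.pc ) );
    regs.pc += 2;

    const instruction* decoded = decode( regs.op );

    if ( !decoded )  return false;

    uint32_t params[ 4 ];
    uint32_t* p = params;

    for ( const fetcher* f = decoded->fetch;  *f != 0;  ++f )
    {
        *p++ = (*f)( regs, mem );
    }

    decoded->code( regs, mem, params );

    return true;
}
```

The two instruction objects differ only in their second fetcher, so
supporting another displacement size costs one fetch routine and one
table entry rather than a second copy of the execute logic.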

I'll get the engine and the sample app published this weekend. The
license will probably be GPL v2+, for compatibility with other 68K
emulators.

Cheers,
Josh

[1] I'm using my d68k disassembler code as a reference, but the v68k
code is all new.

[2] Currently, a configuration consists of only the memory size,
unless you count being able to load a different kernel into each
one. Future options will include which processor model to emulate.
Maybe there will be memory sharing between multiple cores.

[3] If you're wondering why I didn't just use MOVE and MOVEA, it's so
I wouldn't have to decode effective addresses yet. They're
non-trivial. :-)


Joshua Juran

Jul 30, 2011, 10:17:32 AM
to classic...@googlegroups.com
On Jul 29, 2011, at 6:03 PM, Joshua Juran wrote:

> I'm writing a new 68K emulator, called v68k, more or less from
> scratch.[1]
>

> I'll get the engine and the sample app published this weekend. The
> license will probably be GPL v2+, for compatibility with other 68K
> emulators.

v68k
https://github.com/jjuran/metamage_1/tree/522ee26034f74f53dd429dcbe5ca447415466d3f/engines/v68k/v68k

v68k-test.cc
https://github.com/jjuran/metamage_1/blob/522ee26034f74f53dd429dcbe5ca447415466d3f/tools/posix/v68k-test/v68k-test.cc

The v68k library has no outside dependencies, and the test program
just needs printf() for output. I haven't tried yet, but this might
even work with MrCpp and SCpp.

Josh


Joshua Juran

Aug 7, 2011, 1:34:51 AM
to classic...@googlegroups.com
On Jul 30, 2011, at 7:17 AM, Joshua Juran wrote:

> On Jul 29, 2011, at 6:03 PM, Joshua Juran wrote:
>
>> I'm writing a new 68K emulator, called v68k, more or less from
>> scratch.[1]

The emulation library now handles a bunch of new instructions[1],
most notably all of MOVE/MOVEA and BRA/BSR/Bcc, as well as JMP/JSR/
RTS, LEA/PEA, and TRAP/RTE. As an extension, STOP #FFFF indicates
successful completion of an emulated program-as-subroutine, in effect
advising the host program that it may destroy the emulation context
and continue.

The sample client loads code that is separated into OS and user code,
and the latter does in fact run in user mode and ends with an RTS
instruction. And yes, it actually prints "Hello world" now. :-)

>> I'll get the engine and the sample app published this weekend.
>> The license will probably be GPL v2+, for compatibility with other
>> 68K emulators.

New page, with branch-relative GitHub links:
<http://www.metamage.com/code/v68k/>

I hope people other than me find this interesting. :-)

Josh

[1] Currently, 15393 of 65536 possible opcodes decode as valid
instructions. There are 33 'microcode' functions, implementing 40
unique instruction decodings.[2]

[2] Instructions like LINK and LINK.L decode as different objects
(since they have different flags and fetch parameters differently)
but they share the same 'execute' implementation (microcode). Also,
MOVE to memory and MOVE to a data register have different microcode.

