We finally released the source code today under the BSD license. The
emulator implements the PTX virtual machine and executes programs
using a single CPU thread, one instruction at a time. We have verified
that all of the CUDA SDK examples from versions 2.1 and 2.2 run on the
emulator, except for programs that use the driver-level API, which
we do not support. Like Barra and GPGPU-Sim, we provide a set of
libraries that replace libcudart.so, so you should be able to link any
CUDA program against the emulator and have it transparently replace
the NVIDIA driver and runtime.
The emulator has hooks for trace generators that can examine the
complete system state after each instruction is executed. Several
trace generators are already in place, recording all memory traffic
and inter-thread communication through shared memory, and it should
be fairly easy to add others.
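
To make the hook idea concrete, here is a minimal sketch of how an
observer can be attached to an interpreter loop and called after every
instruction. All names in it (TraceEvent, TraceGenerator,
MemoryTraceGenerator, ToyEmulator) are hypothetical and chosen for
illustration; the interface in the released code may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// NOTE: every name below is hypothetical; this is not the emulator's real API.

// What an observer sees after one instruction has executed.
struct TraceEvent {
    std::size_t pc;                        // index of the executed instruction
    std::string opcode;                    // e.g. "ld.shared"
    std::vector<std::uint64_t> addresses;  // memory addresses touched, if any
};

// Interface implemented by every trace generator.
class TraceGenerator {
public:
    virtual ~TraceGenerator() = default;
    virtual void event(const TraceEvent& e) = 0;
};

// Example generator: counts memory accesses and remembers the addresses.
class MemoryTraceGenerator : public TraceGenerator {
public:
    void event(const TraceEvent& e) override {
        accesses += e.addresses.size();
        log.insert(log.end(), e.addresses.begin(), e.addresses.end());
    }
    std::size_t accesses = 0;
    std::vector<std::uint64_t> log;
};

// Toy stand-in for the emulator: "executes" one instruction at a time
// and notifies every registered generator afterwards.
class ToyEmulator {
public:
    void addTraceGenerator(TraceGenerator* g) { generators_.push_back(g); }

    void run(const std::vector<TraceEvent>& program) {
        for (const TraceEvent& instr : program) {
            // A real emulator would update registers and memory here.
            for (TraceGenerator* g : generators_) {
                g->event(instr);
            }
        }
    }

private:
    std::vector<TraceGenerator*> generators_;  // not owned
};
```

Under these assumptions, adding a new kind of trace only means writing
another TraceGenerator subclass and registering it before the run; the
interpreter loop itself stays unchanged.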
We are also releasing, as part of the code base, a set of program
analysis tools for PTX that let you generate control flow graphs,
dominator trees, and dataflow graphs, and convert PTX to pure SSA form.
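
For readers unfamiliar with dominator trees, the sketch below shows one
textbook way to derive them from a control flow graph: the iterative
set-intersection algorithm. It is not taken from the released tools,
which may use a different algorithm. The function names are
hypothetical, and the code assumes block 0 is the entry and every block
is reachable from it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// preds[b] lists the predecessor blocks of block b.
// Returns dom, where dom[b][d] is true when block d dominates block b.
// Assumes block 0 is the entry and all blocks are reachable from it.
std::vector<std::vector<bool>> computeDominators(
    const std::vector<std::vector<std::size_t>>& preds) {
    const std::size_t n = preds.size();
    // Start from "everything dominates everything" and shrink to a fixed point.
    std::vector<std::vector<bool>> dom(n, std::vector<bool>(n, true));
    if (n == 0) return dom;
    dom[0].assign(n, false);
    dom[0][0] = true;  // the entry is dominated only by itself

    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t b = 1; b < n; ++b) {
            // Dom(b) = {b} united with the intersection of Dom(p) over preds p.
            std::vector<bool> next(n, true);
            for (std::size_t p : preds[b]) {
                for (std::size_t i = 0; i < n; ++i) {
                    next[i] = next[i] && dom[p][i];
                }
            }
            next[b] = true;
            if (next != dom[b]) {
                dom[b] = next;
                changed = true;
            }
        }
    }
    return dom;
}

// The immediate dominator of b is its strict dominator with the largest
// dominator set. idom[b] is the parent of b in the dominator tree;
// the entry block maps to itself.
std::vector<std::size_t> immediateDominators(
    const std::vector<std::vector<bool>>& dom) {
    const std::size_t n = dom.size();
    std::vector<std::size_t> idom(n, 0);
    for (std::size_t b = 1; b < n; ++b) {
        std::size_t bestSize = 0;
        for (std::size_t d = 0; d < n; ++d) {
            if (d == b || !dom[b][d]) continue;
            std::size_t size = 0;
            for (bool bit : dom[d]) size += bit ? 1 : 0;
            if (size > bestSize) {
                bestSize = size;
                idom[b] = d;
            }
        }
    }
    return idom;
}
```

For a diamond-shaped graph (0 branches to 1 and 2, which both rejoin at
3), neither branch dominates block 3, so its parent in the dominator
tree is block 0.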
The entire project can be downloaded here:
http://gpuocelot.googlecode.com/files/ocelot-0.4.36.tar.gz
API documentation can be found here:
http://www.gdiamos.net/classes/translator/api/index.html
Finally, we have put together a quick tutorial for running a CUDA
program on the emulator:
http://code.google.com/p/gpuocelot/wiki/Installation
We plan to continue developing this project, with the goal of
eventually having a complete compilation chain from CUDA to both x86
CPUs and NVIDIA GPUs, along with analysis tools supporting each path.