Hi Devs,
I've been playing with OpenCL recently as a means of getting
* lower-overhead CPU execution than Theano currently offers
* multi-core CPU support
* GPU support from the same code base as the CPU
* GPU support for a range of dtypes
Of course, Theano doesn't have OpenCL support yet. One way to add it would be to redo what was done in the sandbox.cuda folder, but I think I have found a better way: post-process the function returned by theano.function. This strategy is better because (a) it deals more naturally with OpenCL contexts, (b) it works fine as a separate project from Theano, (c) a lot can be done with a few lines of code, and (d) it solves the mystery of how to pickle compiled functions.
The strategy is essentially to create a new class "Simulator" that runs theano functions:
# -- do standard graph optimizations, don't care about quality of VM
f = theano.function([x], y, mode=theano.Mode(linker='py'))
# -- Create a VM-like thing *externally* from the function,
# which works by allocating a NEW storage map and creating NEW
# thunks for each of the apply nodes in f's optimized graph
# (of course the simulator can also modify the graph even more)
# The simulator either uses the original shared variables
# or creates its own copies; that is up to the simulator.
sim = SimulatorOCL(f)
# simulator provides calling mechanism similar to the original function
# (maybe simulator provides other calling protocols too
# e.g. for running N times)
sim(xvalue)
# updates shared variables
sim.sync_to_theano()
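To make the "new storage map, new thunks" idea concrete, here is a minimal sketch that runs without Theano or OpenCL. Everything in it is an assumption made for illustration: the graph is a toy list of (op, inputs, output) tuples rather than Theano's real apply nodes, and the names ToySimulator, TOY_OPS and sync_to_original are hypothetical, not part of any existing API.

```python
import operator

# Toy op table; a real simulator would map ops to OpenCL kernels instead.
TOY_OPS = {"add": operator.add, "mul": operator.mul}


class ToySimulator:
    """Runs a toy graph using its own storage map and its own thunks."""

    def __init__(self, nodes, input_names, output_name, shared, updates):
        self.input_names = input_names
        self.output_name = output_name
        self.shared = shared      # the caller's "original" shared values
        self.updates = updates    # shared name -> variable holding its new value
        # NEW storage map: one single-element cell per variable.
        self.storage = {name: [None] for name in input_names}
        for name, value in shared.items():
            self.storage[name] = [value]   # private copy of shared state
        for _op, _ins, out in nodes:
            self.storage[out] = [None]
        # NEW thunks: one closure per node, bound to the cells above.
        self.thunks = [self._make_thunk(*node) for node in nodes]

    def _make_thunk(self, op_name, ins, out):
        fn = TOY_OPS[op_name]
        in_cells = [self.storage[n] for n in ins]
        out_cell = self.storage[out]

        def thunk():
            out_cell[0] = fn(*(cell[0] for cell in in_cells))

        return thunk

    def __call__(self, *args):
        for name, value in zip(self.input_names, args):
            self.storage[name][0] = value
        for thunk in self.thunks:   # nodes are assumed to be in topological order
            thunk()
        result = self.storage[self.output_name][0]
        for name, source in self.updates.items():
            self.storage[name][0] = self.storage[source][0]
        return result

    def sync_to_original(self):
        """Copy the simulator's private shared values back to the caller."""
        for name in self.shared:
            self.shared[name] = self.storage[name][0]


shared = {"total": 0}
nodes = [("add", ("total", "x"), "new_total"),
         ("mul", ("new_total", "new_total"), "y")]
sim = ToySimulator(nodes, ["x"], "y", shared, {"total": "new_total"})
first = sim(2)
second = sim(3)
before_sync = shared["total"]   # caller's value is untouched until sync
sim.sync_to_original()
after_sync = shared["total"]
```

The point of the sketch is only the ownership split: the simulator keeps its own cells and closures, and the caller's shared values change only when the sync step is called.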
The way it solves the pickling issue is that it allows the original Theano graph to be pure NumPy, which has always been picklable without trouble, while still providing a way to evaluate the function really fast on a particular host. In this case, the function `f` can be unpickled anywhere, whereas the OpenCL-based simulator `sim` can only be unpickled on hosts that have OpenCL.
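The split between a portable description and host-specific state can be sketched as follows. This is a toy under stated assumptions, not how Theano or any OpenCL binding serializes anything: toy_graph, build_runner and the backend_available flag are hypothetical names, and the flag merely stands in for a real check such as whether OpenCL is installed on the host.

```python
import operator
import pickle

# Hypothetical portable description: plain tuples and strings,
# so it can be pickled and unpickled on any host.
toy_graph = {
    "inputs": ("x", "y"),
    "nodes": (("add", ("x", "y"), "s"), ("mul", ("s", "s"), "z")),
    "output": "z",
}


def build_runner(graph, backend_available=True):
    """Rebuild host-specific state (here, just a closure) from the graph.

    backend_available is a stand-in for probing the host for a fast backend.
    """
    if not backend_available:
        raise RuntimeError("fast backend missing; use the plain graph instead")
    ops = {"add": operator.add, "mul": operator.mul}

    def run(*args):
        env = dict(zip(graph["inputs"], args))
        for op_name, ins, out in graph["nodes"]:
            env[out] = ops[op_name](*(env[n] for n in ins))
        return env[graph["output"]]

    return run


payload = pickle.dumps(toy_graph)        # only the portable part is serialized
restored_graph = pickle.loads(payload)   # works on any host
runner = build_runner(restored_graph)    # host-specific part is rebuilt, not unpickled
result = runner(2, 3)                    # (2 + 3) * (2 + 3)
```

On a host without the fast backend, the rebuild step is the only part that fails, and the restored graph is still usable on its own.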
I've been developing this mechanism in the context of a Theano port of the nengo brain simulator [1]. Would readers of this list be interested in making this available more generally? The OpenCL simulator could be factored out as a standalone project, or included in Theano directly.
- James