I don't think this is trivial out-of-the-box right now, but I think Sharad (cc'd) is working on it. Stay tuned!
On Sat, Aug 8, 2020 at 5:52 AM 'Sebastian Khan' via TensorFlow Probability <tfprob...@tensorflow.org> wrote:
Hi,
I was just wondering if it's possible to improve the computational efficiency and reduce the wall time of an inference problem by utilising more HMC chains parallelised across multiple GPUs with TFP?
Thanks!
Hi Sebastian,
We don't have any official resources on how to run different MCMC chains on multiple GPUs just yet, but it is possible! Some of the APIs are still experimental and subject to change. The closest resource we have thus far is our Cross GPU Log Prob notebook in the discussion section, which demonstrates how to partition a GPU into multiple logical devices and distribute a computation over those devices. This notebook doesn't do exactly what you want though (it runs a single MCMC chain with a distributed log prob computation).
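For reference, here's a minimal sketch of that partitioning step, reconstructed from memory (the memory limits are placeholders; skip this entirely if you already have multiple physical GPUs):

import tensorflow as tf

# Split one physical GPU into two logical devices so a multi-GPU setup
# can be emulated on a single-GPU machine.
gpus = tf.config.list_physical_devices('GPU')
tf.config.set_logical_device_configuration(
    gpus[0],
    [tf.config.LogicalDeviceConfiguration(memory_limit=4096),
     tf.config.LogicalDeviceConfiguration(memory_limit=4096)])
print(tf.config.list_logical_devices('GPU'))  # Should now list two GPU devices.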
It turns out that running chains in parallel on multiple devices is easier than the example in the notebook. You first need to create a MirroredStrategy; the notebook shows how to do this, but if you already have multiple GPUs you won't need to do the partitioning step. Then write a function, run_chain for example, that is a tf.function to be executed by the strategy. Inside the function, you run tfp.mcmc.sample_chain. A rough sketch follows.
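Something along these lines (a sketch only; the standard-normal target, HMC step size, and chain lengths are placeholders, and I'm assuming a recent TFP where sample_chain takes a stateless seed kwarg):

import tensorflow as tf
import tensorflow_probability as tfp

# MirroredStrategy picks up all visible (physical or logical) GPUs by default.
strategy = tf.distribute.MirroredStrategy()

# Placeholder target density; substitute your model's joint log prob.
target = tfp.distributions.Normal(loc=0., scale=1.)

@tf.function
def run_chain(seed):
  kernel = tfp.mcmc.HamiltonianMonteCarlo(
      target_log_prob_fn=target.log_prob,
      step_size=0.1,
      num_leapfrog_steps=3)
  # Each replica runs its own full chain.
  return tfp.mcmc.sample_chain(
      num_results=1000,
      num_burnin_steps=500,
      current_state=tf.zeros([]),
      kernel=kernel,
      trace_fn=None,
      seed=seed)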
If we execute the run_chain function using the strategy, it will run the exact same function on each device.
What this will do is run as many chains as there are devices, each with a different random seed. Each of these chains is initialized identically (via current_state=tf.zeros([])), but if you'd like to initialize them differently, you can use tf.random.normal, or pass a distributed value into strategy.run and on to run_chain. I personally like passing in a stateless random seed to ensure reproducibility; a sketch of that is below.
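One way to do the seeding (an assumption on my part, using strategy.experimental_distribute_values_from_function to hand each replica its own value):

# Build a distributed value: a distinct [2]-shaped stateless seed per replica.
def per_replica_seed(ctx):
  return tf.constant([42, ctx.replica_id_in_sync_group], dtype=tf.int32)

seeds = strategy.experimental_distribute_values_from_function(per_replica_seed)

# Run one chain per device, each with its own seed.
per_replica_samples = strategy.run(run_chain, args=(seeds,))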
strategy.run(run_chain) returns the result of run_chain, but in place of tf.Tensors there will be PerReplicaResults, which are essentially sharded tensors (in this case, one for each device). In order to get a tf.Tensor out of a PerReplicaResults object, you can say per_replica_results.values[i] for some device index i.
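Continuing the sketch above (per_replica_samples and strategy are the names from my earlier snippets):

# Unpack the per-device chains into ordinary tensors.
chains = [per_replica_samples.values[i]
          for i in range(strategy.num_replicas_in_sync)]
# Equivalently: chains = strategy.experimental_local_results(per_replica_samples)

# Stack into a [num_chains, num_results, ...] batch for diagnostics.
all_chains = tf.stack(chains, axis=0)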