Multiple GPU Hamiltonian Monte Carlo


Sebastian Khan

Aug 8, 2020, 8:52:06 AM
to TensorFlow Probability
Hi,

I was just wondering if it's possible to improve the computational efficiency, and also reduce the wall time, of an inference problem by running more HMC chains in parallel across multiple GPUs with TFP?

Thanks!

rif

Aug 10, 2020, 1:00:29 AM
to Sebastian Khan, TensorFlow Probability, Sharad Vikram
I don't think this is trivial out-of-the-box right now, but I think Sharad (cc'd) is working on it. Stay tuned!


Sebastian Khan

Aug 10, 2020, 4:53:36 AM
to TensorFlow Probability, sebast...@googlemail.com, shar...@google.com
Hi Rif,

Thanks for the reply. This will be a really awesome addition; I will stay tuned!

Best,
Sebastian


Sharad Vikram

Aug 10, 2020, 3:35:06 PM
to Sebastian Khan, TensorFlow Probability

Hi Sebastian,


We don't have any official resources on how to run different MCMC chains on multiple GPUs just yet, but it is possible! Some of the APIs are still experimental and subject to change. The closest resource we have thus far is our Cross GPU Log Prob notebook in the discussion section, which demonstrates how to partition a GPU into multiple logical devices and distribute a computation over those devices. This notebook doesn't do exactly what you want though (it runs a single MCMC chain with a distributed log prob computation).
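To make that partitioning step concrete, here's a rough sketch (only needed if you want to simulate multiple devices on a single physical GPU; the memory limits are just placeholder values, and this has to run before the GPU is first used):

import tensorflow as tf

# Split one physical GPU into two logical devices so that MirroredStrategy
# sees "multiple GPUs". Skip this if you already have several physical GPUs.
physical_gpus = tf.config.list_physical_devices('GPU')
tf.config.set_logical_device_configuration(
    physical_gpus[0],
    [tf.config.LogicalDeviceConfiguration(memory_limit=4096),
     tf.config.LogicalDeviceConfiguration(memory_limit=4096)])
logical_gpus = tf.config.list_logical_devices('GPU')

# These device names are what you'd hand to MirroredStrategy below.
strategy = tf.distribute.MirroredStrategy([d.name for d in logical_gpus])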

It turns out that running chains in parallel on multiple devices is actually simpler than the example in the notebook. You first need to create a MirroredStrategy; the notebook shows how to do this, but if you already have multiple GPUs you won't need the partitioning step. Then write a function, run_chain for example, that is a tf.function to be executed by the strategy. In the function, you want to run tfp.mcmc.sample_chain.


import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# One replica per device; each replica will run its own chain.
# (Pass the device names here, e.g. the logical GPUs from above.)
strategy = tf.distribute.MirroredStrategy(...)


def log_prob(x):
  """Some unnormalized log_prob function."""
  return tfd.Normal(0., 1.).log_prob(x)


@tf.function(autograph=False)
def run_chain():
  kernel = tfp.mcmc.HamiltonianMonteCarlo(
      log_prob, step_size=1e-1, num_leapfrog_steps=10)
  return tfp.mcmc.sample_chain(num_results=200, num_burnin_steps=100,
                               kernel=kernel, current_state=tf.zeros([]))


If we execute the run_chain function using the strategy, it will run the exact same function on each device.


strategy.run(run_chain)


What this will do is run as many chains as there are devices, each with a different random seed. Each of these chains is initialized identically (via current_state=tf.zeros([])), but if you'd like to initialize them differently, you can use tf.random.normal, or pass a distributed value into strategy.run and run_chain. I personally like passing in a stateless random seed to ensure reproducibility.
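For example, something along these lines (a rough, untested sketch; it assumes two devices, and the exact seed plumbing may differ a little between TFP versions):

# Give each replica its own stateless seed and its own starting state.
@tf.function(autograph=False)
def run_chain(seed):
  kernel = tfp.mcmc.HamiltonianMonteCarlo(
      log_prob, step_size=1e-1, num_leapfrog_steps=10)
  # Different initial state per replica (reusing the seed just for brevity).
  init_state = tf.random.stateless_normal([], seed=seed)
  return tfp.mcmc.sample_chain(num_results=200, num_burnin_steps=100,
                               kernel=kernel, current_state=init_state,
                               trace_fn=None, seed=seed)

per_replica_seeds = tf.constant([[0, 1], [2, 3]])  # one [2]-shaped seed per device
dist_seeds = strategy.experimental_distribute_values_from_function(
    lambda ctx: per_replica_seeds[ctx.replica_id_in_sync_group])
results = strategy.run(run_chain, args=(dist_seeds,))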


strategy.run(run_chain) returns the result of run_chain, but in place of tf.Tensors there will be PerReplicaResults, which are essentially sharded tensors (in this case, one for each device). In order to get a tf.Tensor out of a PerReplicaResults object, you can say per_replica_results.values[i] for some device index i.
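Concretely (assuming results = strategy.run(run_chain, ...) as above returned just the chain states, e.g. with trace_fn=None):

# Pull the per-device chains back out and stack them into a single tensor.
per_device_states = strategy.experimental_local_results(results)  # tuple of tf.Tensors
all_chains = tf.stack(per_device_states, axis=0)  # shape: [num_devices, num_results]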


Final note: If you want to run batches of chains on each device, you can pass a batch of states into sample_chain.
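For instance (again just a sketch: 10 chains per device, so with 2 devices you'd get 20 chains in total):

# A batch of 10 scalar chains per device; log_prob broadcasts over the batch.
@tf.function(autograph=False)
def run_batched_chains():
  kernel = tfp.mcmc.HamiltonianMonteCarlo(
      log_prob, step_size=1e-1, num_leapfrog_steps=10)
  return tfp.mcmc.sample_chain(num_results=200, num_burnin_steps=100,
                               kernel=kernel, current_state=tf.zeros([10]),
                               trace_fn=None)

batched_results = strategy.run(run_batched_chains)  # [200, 10] samples per replica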

Hope this is helpful!

Sharad

Sebastian Khan

Aug 12, 2020, 2:41:51 AM
to Sharad Vikram, TensorFlow Probability
This is amazing, thank you so much!