Linear Regression: Setting different data during training and testing in JointDistributionCoroutineAutoBatched

Nipun Batra

Feb 5, 2022, 5:25:13 AM
to TensorFlow Probability
I'm trying to develop a notebook on Bayesian linear regression solved via variational inference (VI) from scratch. I realise it is fairly easy to do this using `tf.keras` and following the TFP tutorial on probabilistic regression.

I've made good progress in my post here, where I have been able to show how to compute and optimise the KL divergence between different families using sampling and reparameterisation. I'm stuck at how to 

I then looked at the PCA tutorial on the TFP website and adapted it for linear regression here. This implementation uses JointDistributionCoroutineAutoBatched, as the PCA tutorial does.

The main pieces of the code are:

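import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

def lr(x, stddv_datapoints):
    # x: [num_datapoints, data_dim] feature matrix
    num_datapoints, data_dim = x.shape
    # Priors on the intercept b and the weights w
    b = yield tfd.Normal(loc=0.0, scale=2.0, name="b")
    w = yield tfd.Normal(loc=tf.zeros([data_dim]),
                         scale=2.0 * tf.ones([data_dim]), name="w")
    # Likelihood of the observations given w and b
    y = yield tfd.Normal(loc=tf.linalg.matvec(x, w) + b,
                         scale=stddv_datapoints, name="y")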

Using VI (again, with code heavily inspired by the PCA post), I am able to get the variational parameters qw_mean, qw_stddv, qb_mean and qb_stddv.
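For readers following along, the quantity that VI maximises over those four parameters can be written out by hand. The block below is a minimal NumPy sketch of a single Monte Carlo ELBO estimate with reparameterised samples, assuming the same priors as lr() (scale 2.0 on b and w) and a mean-field Normal surrogate. It is not the TFP code from the notebook: the helper names normal_logpdf and elbo_estimate, the synthetic data and the parameter values are all hypothetical and only there for illustration.

```python
import numpy as np


def normal_logpdf(value, loc, scale):
    """Elementwise log density of Normal(loc, scale)."""
    return (-0.5 * np.log(2.0 * np.pi) - np.log(scale)
            - 0.5 * ((value - loc) / scale) ** 2)


def elbo_estimate(x, y, qw_mean, qw_stddv, qb_mean, qb_stddv,
                  stddv_datapoints, num_samples=64, seed=0):
    """Monte Carlo ELBO estimate; priors mirror lr(): Normal(0, 2) on b and w."""
    rng = np.random.default_rng(seed)
    data_dim = x.shape[1]
    # Reparameterisation: sample = mean + stddv * standard normal noise.
    w = qw_mean + qw_stddv * rng.standard_normal((num_samples, data_dim))
    b = qb_mean + qb_stddv * rng.standard_normal(num_samples)
    # log p(w) + log p(b): the prior term.
    log_prior = (normal_logpdf(w, 0.0, 2.0).sum(axis=1)
                 + normal_logpdf(b, 0.0, 2.0))
    # log p(y | x, w, b): the likelihood term, summed over datapoints.
    loc = w @ x.T + b[:, None]  # shape [num_samples, num_datapoints]
    log_lik = normal_logpdf(y[None, :], loc, stddv_datapoints).sum(axis=1)
    # log q(w) + log q(b): the surrogate term.
    log_q = (normal_logpdf(w, qw_mean, qw_stddv).sum(axis=1)
             + normal_logpdf(b, qb_mean, qb_stddv))
    return float(np.mean(log_prior + log_lik - log_q))


# Synthetic data, made up for this sketch only.
data_rng = np.random.default_rng(1)
x_train = data_rng.standard_normal((50, 2))
y_train = (x_train @ np.array([1.0, -2.0]) + 0.5
           + 0.1 * data_rng.standard_normal(50))

# A surrogate centred on the generating values versus one centred on zero.
good = elbo_estimate(x_train, y_train, np.array([1.0, -2.0]),
                     np.array([0.05, 0.05]), 0.5, 0.05, stddv_datapoints=0.1)
bad = elbo_estimate(x_train, y_train, np.zeros(2),
                    np.array([0.05, 0.05]), 0.0, 0.05, stddv_datapoints=0.1)
print(good > bad)
```

A surrogate centred near the values that generated the data scores a higher ELBO than one centred at zero, which is the signal an optimiser follows. In the TFP version the gradients come from automatic differentiation rather than from anything written by hand here.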

I wanted to ask the following:
  1. What is the easiest way to make a prediction on unseen data in my code above? Is there something akin to a predict step that we can use here?
  2. What would be the best way to do linear regression from scratch, without using tf.keras.*, along the following lines?
    1. specify a prior (p) on W -- easy 
    2. specify the likelihood (l) of the data under y ~ N(XW + b, stddv_datapoints)
    3. use a surrogate q and optimise the ELBO so that q gets as close as possible to the posterior, which is proportional to p*l
  3. Some examples use tfp.math.minimize and some use tf.GradientTape. When is one recommended over the other?
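
To make the first question concrete, here is one possible way to predict at unseen inputs once the variational parameters are available: sample w and b from the fitted surrogate, push the new inputs through the likelihood, and summarise the draws. This is a hedged NumPy sketch, not TFP API. The function name predict, the test inputs and the parameter values are hypothetical stand-ins for whatever the fitted qw_mean, qw_stddv, qb_mean and qb_stddv turn out to be.

```python
import numpy as np


def predict(x_new, qw_mean, qw_stddv, qb_mean, qb_stddv,
            stddv_datapoints, num_samples=1000, seed=0):
    """Monte Carlo posterior predictive mean and stddev at x_new."""
    rng = np.random.default_rng(seed)
    data_dim = x_new.shape[1]
    # Draw parameters from the mean-field Normal surrogate.
    w = qw_mean + qw_stddv * rng.standard_normal((num_samples, data_dim))
    b = qb_mean + qb_stddv * rng.standard_normal(num_samples)
    # Likelihood location for every (sample, new datapoint) pair.
    loc = w @ x_new.T + b[:, None]
    # Add observation noise so the spread reflects both sources of uncertainty.
    y = loc + stddv_datapoints * rng.standard_normal(loc.shape)
    return y.mean(axis=0), y.std(axis=0)


# Hypothetical unseen inputs and variational parameters.
x_test = np.array([[0.0, 0.0], [1.0, 1.0]])
pred_mean, pred_stddv = predict(
    x_test,
    qw_mean=np.array([1.0, -2.0]), qw_stddv=np.array([0.05, 0.05]),
    qb_mean=0.5, qb_stddv=0.05,
    stddv_datapoints=0.1,
)
print(pred_mean.shape, pred_stddv.shape)
```

The training inputs never appear in predict: only the surrogate parameters carry over from training, so the test inputs are free to have a different number of rows.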