Porting a multilevel model from Stan

Theo Rashid

Feb 26, 2022, 5:09:49 PM
to TensorFlow Probability
Hi all,

Nice to join a new group. I'm new to TFP (and TensorFlow generally) and thought I'd start by porting a model I wrote in Stan. I'm running into some issues, and I'm not yet used to the error messages, but I'm sure this has a quick fix. Any help is appreciated.

The model is a multilevel model, so I adapted the multilevel primer as best I could. The model predicts the results of football matches using the `attack` and `defend` strengths of each team.

For now, I've commented out the final two terms, which I'm sure have their own shape issues, but there are problems earlier in the model. I write the model as:
```
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# num_teams, home_id and away_id are defined elsewhere from the match data.


def model(home_id, away_id):
    return tfd.JointDistributionSequential(
        [
            tfd.Normal(loc=0.0, scale=1.0, name="alpha"),
            tfd.Normal(loc=0.0, scale=1.0, name="home"),
            tfd.HalfStudentT(df=3.0, loc=0.0, scale=2.5, name="sd_att"),
            tfd.HalfStudentT(df=3.0, loc=0.0, scale=2.5, name="sd_def"),
            lambda sd_att: tfd.MultivariateNormalDiag(
                loc=tf.zeros(num_teams),
                scale_identity_multiplier=sd_att,
                name="attack",
            ),
            lambda sd_def: tfd.MultivariateNormalDiag(
                loc=tf.zeros(num_teams),
                scale_identity_multiplier=sd_def,
                name="defend",
            ),
            # lambda alpha, home, attack, defend: (
            #     tfd.Independent(
            #         tfd.Poisson(
            #             rate=(
            #                 alpha
            #                 + home
            #                 + tf.gather(attack, home_id, axis=-1)
            #                 - tf.gather(defend, away_id, axis=-1)
            #             ),
            #             name="s1",
            #         ),
            #         reinterpreted_batch_ndims=1,
            #     )
            # ),
            # lambda alpha, attack, defend: (
            #     tfd.Independent(
            #         tfd.Poisson(
            #             rate=(
            #                 alpha
            #                 + tf.gather(attack, away_id, axis=-1)
            #                 - tf.gather(defend, home_id, axis=-1)
            #             ),
            #             name="s2",
            #         ),
            #         reinterpreted_batch_ndims=1,
            #     )
            # ),
        ]
    )
```
When I run `model(home_id, away_id).sample()`, the code executes but:
  • There is a warning: `WARNING:tensorflow:@custom_gradient grad_fn has 'variables' in signature, but no ResourceVariables were used on the forward pass.` This no longer occurs when I replace the `HalfStudentT` with a `HalfNormal`. I'm not sure why.
  • The 5th distribution (the first of the lambda statements) has shape (20,), while the 6th has shape (20, 20). Again, I do not know why.

The main issue is when I run `model(home_id, away_id).resolve_graph()`. It returns `Inconsistent names: component with name "sd_def" was referred to by a different name "sd_att".` I do not understand this, because I have clearly only named `sd_def` once, and the distribution below it is named `attack`. Any ideas?

Appreciate all the help. I'm sure this is a common error and a quick fix. Feel free to point me to any implementations of this model.

Best,

Theo

Christopher Suter

Feb 26, 2022, 5:35:43 PM
to Theo Rashid, TensorFlow Probability
In a sequential JD specification, the lambdas' arguments are fed in from the preceding entries in reverse order, so your second lambda is getting the output of your first lambda as input. You can reorder the entries so each lambda immediately follows its upstream distributions, or keep the order the same and add dummy arguments to "hop over" the ones you don't need.
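A minimal sketch of the reordered layout (toy distributions standing in for your model, just to show the naming mechanics):
```
import tensorflow_probability as tfp

tfd = tfp.distributions

# Each lambda's arguments are filled from the entries immediately before it,
# in reverse order, so each dependent distribution sits right after its parent.
toy = tfd.JointDistributionSequential([
    tfd.HalfNormal(scale=1.0, name="sd_att"),
    lambda sd_att: tfd.Normal(loc=0.0, scale=sd_att, name="attack"),
    tfd.HalfNormal(scale=1.0, name="sd_def"),
    lambda sd_def: tfd.Normal(loc=0.0, scale=sd_def, name="defend"),
    lambda defend, sd_def, attack: tfd.Normal(loc=attack - defend, scale=1.0),
])
print(toy.resolve_graph())
# (('sd_att', ()), ('attack', ('sd_att',)), ('sd_def', ()), ('defend', ('sd_def',)),
#  ('x', ('defend', 'sd_def', 'attack')))
```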

Theo Rashid

Feb 26, 2022, 5:54:17 PM
to TensorFlow Probability, Christopher Suter, TensorFlow Probability, Theo Rashid

Hi Christopher,

Thanks for the quick reply. That seems a strange way of doing things, since it leads to a lot of redundant arguments in the lambda functions. Anyway, I got the graph working with the expected dimensions (see below). However, I'm now getting errors when I try to sample.
```
def model(home_id, away_id):
    """Creates a joint distribution for the partial pooling model."""
    return tfd.JointDistributionSequential(
        [
            tfd.Normal(loc=0.0, scale=1.0, name="alpha"),
            tfd.Normal(loc=0.0, scale=1.0, name="home"),
            tfd.HalfStudentT(df=3.0, loc=0.0, scale=2.5, name="sd_att"),
            lambda sd_att: tfd.MultivariateNormalDiag(
                loc=tf.zeros(num_teams),
                scale_identity_multiplier=sd_att,
                name="attack",
            ),
            tfd.HalfStudentT(df=3.0, loc=0.0, scale=2.5, name="sd_def"),
            lambda sd_def: tfd.MultivariateNormalDiag(
                loc=tf.zeros(num_teams),
                scale_identity_multiplier=sd_def,
                name="defend",
            ),
            lambda defend, sd_def, attack, sd_att, home, alpha: (
                tfd.Independent(
                    tfd.Poisson(
                        rate=tf.exp(
                            alpha
                            + home
                            + tf.gather(attack, home_id, axis=-1)
                            - tf.gather(defend, away_id, axis=-1)
                        ),
                        name="s1",
                    ),
                    reinterpreted_batch_ndims=1,
                )
            ),
            lambda s1, defend, sd_def, attack, sd_att, home, alpha: (
                tfd.Independent(
                    tfd.Poisson(
                        rate=tf.exp(
                            alpha
                            + tf.gather(attack, away_id, axis=-1)
                            - tf.gather(defend, home_id, axis=-1)
                        ),
                        name="s2",
                    ),
                    reinterpreted_batch_ndims=1,
                )
            ),
        ]
    )
```
with an output on `model(home_id, away_id).resolve_graph()` of
```
(('alpha', ()), ('home', ()), ('sd_att', ()), ('attack', ('sd_att',)), ('sd_def', ()), ('defend', ('sd_def',)), ('s1', ('defend', 'sd_def', 'attack', 'sd_att', 'home', 'alpha')), ('x', ('s1', 'defend', 'sd_def', 'attack', 'sd_att', 'home', 'alpha')))
```
All seems good, although I'm not sure why the last distribution is called `x` and not `s2`.

However, I'm getting errors when I try to sample. Code for sampling:
```
@tf.function
def target_log_prob(alpha, home, sd_att, attack, sd_def, defend):
    """Computes joint log prob pinned at `s1` and `s2`."""
    return model(home_id, away_id).log_prob(
        [alpha, home, sd_att, attack, sd_def, defend, s1, s2]
    )


@tf.function(autograph=False, jit_compile=True)
def sample(num_chains, num_results, num_burnin_steps):
    """Samples from the partial pooling model."""
    hmc = tfp.mcmc.HamiltonianMonteCarlo(
        target_log_prob_fn=target_log_prob,
        num_leapfrog_steps=10,
        step_size=0.01,
    )

    initial_state = [
        tf.zeros([num_chains], name='init_alpha'),
        tf.zeros([num_chains], name='init_home'),
        tf.ones([num_chains], name='init_sd_att'),
        tf.zeros([num_chains, num_teams], name='init_attack'),
        tf.ones([num_chains], name='init_sd_def'),
        tf.zeros([num_chains, num_teams], name='init_defend'),
    ]

    unconstraining_bijectors = [
        tfp.bijectors.Identity(),  # alpha
        tfp.bijectors.Identity(),  # home
        tfp.bijectors.Exp(),       # sd_att
        tfp.bijectors.Identity(),  # attack
        tfp.bijectors.Exp(),       # sd_def
        tfp.bijectors.Identity(),  # defend
    ]

    kernel = tfp.mcmc.TransformedTransitionKernel(
        inner_kernel=hmc,
        bijector=unconstraining_bijectors,
    )
    samples, kernel_results = tfp.mcmc.sample_chain(
        num_results=num_results,
        num_burnin_steps=num_burnin_steps,
        current_state=initial_state,
        kernel=kernel,
    )

    return samples


samples = sample(num_chains=4, num_results=1000, num_burnin_steps=1000)
```

But I get the error `Dimensions must be equal, but are 4 and 330 for '{{node JointDistributionSequential/log_prob/add_1}} = AddV2[T=DT_FLOAT](JointDistributionSequential/log_prob/add, JointDistributionSequential/log_prob/GatherV2)' with input shapes: [4], [4,330].` Obviously 4 is the number of chains, and 330 is the number of games (`len(home_id)`=`len(away_id)`). I'm not sure where I introduced this error though.

Best,

Theo

Christopher Suter

Feb 26, 2022, 6:47:43 PM
to Theo Rashid, TensorFlow Probability
Hrm. Gather in the presence of extra (chain) dims can be tricky to get right. Can you try JointDistributionSequentialAutoBatched and see if it magically works? If not, we can dive into getting the batching semantics right.
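For example, a toy sketch (not your full model) of how the AutoBatched variant handles a gather-based rate under a leading chain dimension, with no `tfd.Independent` wrappers needed:
```
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

num_teams = 20
home_id = tf.constant([0, 1, 2])  # toy match data

toy = tfd.JointDistributionSequentialAutoBatched([
    tfd.Normal(loc=tf.zeros(num_teams), scale=1.0, name="attack"),
    lambda attack: tfd.Poisson(
        log_rate=tf.gather(attack, home_id, axis=-1), name="s1"
    ),
])
draws = toy.sample(4)             # a leading dim of 4, e.g. 4 chains
print(toy.log_prob(draws).shape)  # -> (4,), no broadcasting errors
```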

Jeffrey Pollock

Feb 27, 2022, 5:21:15 AM
to Theo Rashid, TensorFlow Probability
Hi Theo, if it's of interest, I coded up a very similar soccer model a while back using TFP; you can see it here:


Cheers,
Jeff

Theo Rashid

Feb 27, 2022, 4:33:54 PM
to TensorFlow Probability, jeffpo...@gmail.com, TensorFlow Probability, Theo Rashid
Hi both,

JointDistributionSequentialAutoBatched worked, but it led to some other issues down the line. Having seen Jeff's implementation (thanks) using JointDistributionCoroutineAutoBatched (quite a mouthful), I have decided to plough ahead with that, and I've got it all running and converging.

Thanks again to Jeff's blog for showing me how to use `np.swapaxes` for `arviz.from_dict`.
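(For anyone following along: `tfp.mcmc.sample_chain` returns draws shaped `(num_results, num_chains, ...)`, while `arviz.from_dict` expects `(chain, draw, ...)`, hence the axis swap. A sketch, where `alpha_draws` is a hypothetical stand-in for one returned state part:)
```
import numpy as np
import arviz as az

# sample_chain output is (num_results, num_chains, ...); arviz wants
# (chain, draw, ...), so swap the first two axes before building InferenceData.
posterior = {"alpha": np.swapaxes(np.asarray(alpha_draws), 0, 1)}
idata = az.from_dict(posterior=posterior)
```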

I have three final questions:
- I'm benchmarking probabilistic programming languages. With other languages, I separate the time it takes the model to compile and the time it takes the chain to run. Is there a way of doing this for tfp?
- Do tfp chains run in parallel by default?
- Is there a way to run mcmc inference on a GPU?

Cheers,

Theo

Christopher Suter

Feb 28, 2022, 11:42:33 AM
to Theo Rashid, TensorFlow Probability, jeffpo...@gmail.com
Thanks for sharing the example Jeff!

Regarding benchmarking, comments inline below.

On Sun, Feb 27, 2022 at 4:33 PM Theo Rashid <theoao...@gmail.com> wrote:
I have three final questions:
- I'm benchmarking probabilistic programming languages. With other languages, I separate the time it takes the model to compile and the time it takes the chain to run. Is there a way of doing this for tfp?
I would probably just run twice and time both. The first run will include compilation time, the second won't. There may be better ways (maybe some others on the list with more benchmarking prowess will chime in).
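Something like this, as a rough sketch reusing the `sample` function from your earlier message:
```
import time

t0 = time.time()
sample(num_chains=4, num_results=1000, num_burnin_steps=1000)  # traces/compiles, then runs
t1 = time.time()
sample(num_chains=4, num_results=1000, num_burnin_steps=1000)  # reuses the compiled program
t2 = time.time()

print(f"compile + sample: {t1 - t0:.1f}s; sample only: {t2 - t1:.1f}s")
```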

- Do tfp chains run in parallel by default?
Yep, this is in some sense why we have to think so much about tensor shapes :). By using batch semantics throughout, we get vectorization of the sampling process automatically (we discuss this in the TFP MCMC white paper).
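A one-line illustration of what batch semantics buys you (toy example):
```
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# A scalar-parameter distribution evaluated on a [num_chains] batch of states
# computes every chain's log-prob in a single vectorized call.
dist = tfd.Normal(loc=0.0, scale=1.0)
print(dist.log_prob(tf.zeros([4])).shape)  # -> (4,): one value per chain
```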

- Is there a way to run mcmc inference on a GPU?
Yep, if you are using TF+TFP on a machine with a GPU, this should "just work". 
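A quick way to check, using the standard TF device API:
```
import tensorflow as tf

# If this lists a GPU, TF (and hence TFP's MCMC) will place ops on it by default.
print(tf.config.list_physical_devices("GPU"))
```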

Theo Rashid

Feb 28, 2022, 2:04:05 PM
to TensorFlow Probability, Christopher Suter, TensorFlow Probability, jeffpo...@gmail.com
Thanks both for your help. I've implemented the model and made a note of the (rough) runtimes compared to other packages. If anyone reading this in the future has any ideas on how to tune the model to get the most out of TFP, please get in touch.