Are Bijectors not supported in JointDistributions?

Jun 8, 2021, 6:58:10 PM
to TensorFlow Probability
The following code:
example = tfd.JointDistributionSequential([
    tfd.Normal(1, 1),
    lambda m: tfb.Exp()(m),
])

Results in this error:

AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'sample'
The expected result is the same as this code:
m1 = tfd.Normal(1, 1)
tfb.Exp()(m1)

which produces a TransformedDistribution object.

I have checked and double checked this sample code. It's possible I have missed something, but I don't understand what, if I have.



Christopher Suter

Jun 8, 2021, 8:08:20 PM
to Lenhart, TensorFlow Probability
The lambda is receiving a sample from the previous distribution, not the distribution itself. So what's happening there is `m` is a sample from the first normal, the bijector __call__ sees a tensor and just returns bijector.forward(m) (as opposed to a TransformedDistribution instance), and then the JD internals try to call `sample` on the return value of the lambda, which is just a(n Eager)Tensor.

If all you want is tfb.Exp()(m1), as in your second (expected) code, then I think you don't need a joint distribution at all. If what you want is two distributions, m1 = tfd.Normal(1, 1) and tfb.Exp()(m1), this would do it:

tfd.JointDistributionSequential([tfd.Normal(1, 1), tfb.Exp()(tfd.Normal(1, 1))])

If what you want is a joint distribution over (x, y) where x ~ Normal(1, 1) and y deterministically equals exp(x), you could do

tfd.JointDistributionSequential([
    tfd.Normal(1, 1),
    lambda m: tfd.Deterministic(tf.math.exp(m))
])

Maybe you can clarify a bit what you're after? Hope this helps clarify what's happening, in any case!



Jun 8, 2021, 8:44:19 PM
to Christopher Suter, TensorFlow Probability
Thanks for your quick reply.
I'm trying to translate some mathematical model descriptions into a joint probability distribution, to define a collection of priors and a likelihood for conditioning a posterior. I'm really new to TFP, and the first model I tried to translate resulted in the error described above, so I was trying to figure out what was happening with a simplified model. Most of what I know about this sort of thing comes from Richard McElreath's Statistical Rethinking, a terrific book whose examples are written in R. In that book, he provides model definitions like this (p. 97, if you happen to have it):
h_i ~ Normal(mu_i, sigma)
mu_i = a + b * (x_i - x_mean)
a ~ Normal(178, 20)
b ~ Lognorm(0,1)
sigma ~ Uniform(0, 50)
And in his book, this sort of thing translates directly into R code, much like the code given in some joint distribution TFP examples. But none of those examples use bijectors, which I was thinking were the appropriate way to pass distributions through a joint distribution, along the lines of the second line of the model above. The model I'm actually working on is an attempt to model a probability that should be a monotonically increasing function of a variable, so I'm trying to devise a model that learns alpha and beta in a beta distribution given input categorical and continuous features. 

Is there a way to handle components of a model definition like the second line of the model definition above with TFP?

Thanks in advance for any advice or instruction you have time for.


Christopher Suter

Jun 9, 2021, 12:36:30 AM
to Lenhart, TensorFlow Probability
Have a look at this notebook:

It covers a simpler but similar model. The notebook is mostly focused on tensor shape subtleties in joint distributions, and how the “autobatched” variants make for easier use. So if you’re just getting started and want to connect McElreath book models to TFP this will be a valuable intro I think.

Please come back with more questions after you’ve had a read of that!

NB: the model as you’ve written it is in reverse of how our JDs would have you write it. It’s in what I call “where” style (“a is normal(b, 1) where b is normal(0, 10)”, etc.) vs. the (IMO) more “programmatic” sequential style (“to get a and b, first sample b from normal(0, 10), then sample a from normal(b, 1)”).