Can't get the Split() bijector to work

Tanmoy Sanyal


Oct 25, 2021, 3:50:56 PM

to TensorFlow Probability

Hi,

I am trying to refactor a model in tf probability, where instead of constructing two tf.Variables of shape (3,3) and (3,4) I want to start with a single tf.Variable of shape (3,7) and use Split([3,4]) bijector to get these two splits. There's not really a good reason to do this except aesthetics, but I got intrigued since using the Split() bijector in this mode produces a weird error that I didn't quite understand. Here's a MWE:

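import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
tfb = tfp.bijectors

bijector = tfb.Split([3,4])
x = tf.Variable(np.random.random((3,7)), name="x")
x1 = bijector.forward(x)
y = tfp.util.TransformedVariable(x1, bijector=bijector, name="y")
print(y)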

and error log:

So, I guess I don't really understand how this bijector is intended to work in tfp.util.TransformedVariable applications. Any help is appreciated.

Christopher Suter


Oct 25, 2021, 3:54:33 PM

to Tanmoy Sanyal, TensorFlow Probability

Ah, there's a very strong chance that DeferredTensor and TransformedVariable, which predate "multi-part bijectors" like Split, have not been updated to handle them properly. My instinct is that a fix will be somewhat involved, since MPB was a huge effort and DT/TV are a bit subtle in and of themselves.


Christopher Suter


Oct 25, 2021, 3:55:40 PM

to Tanmoy Sanyal, TensorFlow Probability

(this is not to say it shouldn't be fixed! it just might be a bit of a project (I may also be wrong about this!))

Dave Moore


Nov 7, 2021, 5:36:54 PM

to TensorFlow Probability, Christopher Suter, TensorFlow Probability, tanmo...@gmail.com

Chris is right on the main point that TransformedVariable is unfortunately Tensor-only, since it (like its parent DeferredTensor) operates by registering a custom Tensor conversion function; deferring multipart transformations would be cool but is probably nontrivial.

I just want to point out another potential confusion: tfp.util.TransformedVariable is intended as a *substitute* for tf.Variable, not a wrapper, so it's not necessary to construct your own variable prior to initializing it. That is, you generally want to just write something like this:

```
bijector = tfb.Exp()  # Bijector must have single-tensor input/output.
x1 = bijector.forward(np.random.random((3,7)))
y = tfp.util.TransformedVariable(x1, bijector=bijector, name="y")
```

rather than creating your own variable as you did in your example code. (It's understandable that you'd assume that TransformedVariable transforms an existing Variable, for example by analogy with TransformedDistribution, which transforms an existing Distribution, but for TransformedVariable the goal was to make it easy to define and initialize variables in constrained space without ever having to interact with the underlying representation. If you *do* have an existing Variable sitting around, the right way to transform it without creating a new variable would be `tfp.util.DeferredTensor(variable, bijector.forward)`.)

Best,

Dave

Dave Moore


Nov 7, 2021, 5:52:44 PM

to TensorFlow Probability, Dave Moore, Christopher Suter, TensorFlow Probability, tanmo...@gmail.com

You might also want to look at DeferredModule (https://www.tensorflow.org/probability/api_docs/python/tfp/experimental/util/DeferredModule), which lets you defer complex/multipart transformations by writing the function that determines all parameters of a distribution or bijector given one or more input variables. For example, if for some reason you wanted to parameterize the location and scale of a Normal distribution by splitting an underlying variable, you could write

```
x = tf.Variable(np.random.randn(2), name='x')

def loc_scale_fn(x):
  loc, unconstrained_scale = tf.split(x, 2, axis=-1)
  return {'loc': loc, 'scale': tf.nn.softplus(unconstrained_scale)}

dist = tfp.experimental.util.DeferredModule(tfd.Normal, loc_scale_fn, x)
```

Here the resulting `dist` object behaves like a tfd.Normal instance that re-initializes itself using the `loc_scale_fn` on every method call, so there's always a gradient path back to the underlying variable. Note that there's no need for `loc_scale_fn` to be a bijection, since you're explicitly providing the underlying variable `x`.

Dave
