Can I train a Beta Distribution with neg_log_likelihood?


Lenhart

unread,
Jun 3, 2021, 8:35:43 PM6/3/21
to tfprob...@tensorflow.org
I've been trying to train a very simple Beta distribution using a neg_log_likelihood loss, but it produces nan losses immediately.

Here's a code snippet:
neg_log_likelihood = lambda y, rv_y: -rv_y.log_prob(y)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(2),
    tfp.layers.DistributionLambda(lambda t: tfd.Beta(
        1 + tf.math.softplus(.01 * t[..., 0]), 
        1 + tf.math.softplus(.01 * t[..., 1])))
    # tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t[..., 0], scale=.1))
])
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.001), loss=neg_log_likelihood)

This model can't train. Here are some things I've tried:
  1. Commenting out the tfp.layers.DistributionLambda used above and replacing it with the Normal one that is commented out. This runs as expected and converges after about 300 epochs. (It's not intended to model my data; I just wanted to check that some tfd worked in the model.)
  2. Checking for nan values in the training and target data. There are none.
  3. Making sure the data has been scaled to zero mean and unit variance.
  4. Making sure the target data is on the interval [0, 1].
  5. Adjusting various possible values for alpha and beta (the 'concentrations') in the tfd.Beta.
  6. Setting steps_per_epoch to 1 to pin down when the loss becomes undefined. It is nan after just one step.
Anyway, I'm not sure why this doesn't work. Any help or explanations would be greatly appreciated.

Cheers,

Lenhart


Brian Patton 🚀

unread,
Jun 3, 2021, 9:15:43 PM6/3/21
to Lenhart, TensorFlow Probability
Are you certain all the labels are in [0, 1]?


Lenhart

unread,
Jun 3, 2021, 10:00:12 PM6/3/21
to Brian Patton 🚀, TensorFlow Probability
Hi Brian,

Thanks for your prompt response. Yes, the values are definitely on the interval [0, 1] inclusive. However, zero values for observations are a problem for the model.

On a lark, as I was sitting at dinner, I decided to try training on y_train + 1e-6. This worked.

I can't decide whether this is a bug or not: the density p(0 | a, b) is not typically undefined, so the way I ordinarily think about a likelihood makes this result feel problematic. Still, I need to think on it some more.
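For concreteness, the boundary behaviour can be checked against the Beta log-density formula directly. This is a pure-Python sketch (independent of TFP); the 1 + softplus parameterization in the model guarantees a, b > 1, for which the density vanishes at the endpoints, so the log-density there is -inf:

```python
import math

def beta_log_pdf(x, a, b):
    """Log-density of Beta(a, b) at x, assuming a > 1 and b > 1."""
    # log B(a, b) computed via log-gamma
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    if x <= 0.0 or x >= 1.0:
        # for a, b > 1 the density is 0 at the endpoints, so the log-density is -inf
        return float("-inf")
    return (a - 1) * math.log(x) + (b - 1) * math.log(1.0 - x) - log_norm

print(beta_log_pdf(0.0, 2.0, 3.0))   # -inf: one zero label makes the batch NLL infinite
print(beta_log_pdf(1e-6, 2.0, 3.0))  # finite, just a large negative number
```

A single -inf log-likelihood in a batch poisons the mean loss, which would explain nan after one step.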


Brian Patton 🚀

unread,
Jun 3, 2021, 10:02:11 PM6/3/21
to Lenhart, TensorFlow Probability
If you change the concentration offset to 1+1e-6+softplus(..) does it give more stable results?
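The suggested offset amounts to something like the following sketch (plain-Python softplus so it runs standalone; in the model it would be tf.math.softplus inside the DistributionLambda):

```python
import math

def concentration(raw, offset=1e-6):
    # 1 + offset + softplus(0.01 * raw): softplus keeps the term positive,
    # and the extra offset keeps the concentration strictly above 1.
    return 1.0 + offset + math.log1p(math.exp(0.01 * raw))
```

The point of the offset is to keep the concentrations bounded away from 1 even when softplus underflows toward 0.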

Brian Patton 🚀

unread,
Jun 3, 2021, 10:09:08 PM6/3/21
to Lenhart, TensorFlow Probability
Consider doing a histogram of your overall dataset to see if there are spikes toward the edges (concentration values < 1) vs. spikes toward the interior (concentrations > 1).

Also consider using the SigmoidBeta distribution in tfp nightly. You'd need to use a tfb.Sigmoid().inverse(.) of your label, and perhaps clip the -inf to -big, but that should be ok.
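The label transform described above could look roughly like this (a sketch: tfb.Sigmoid().inverse is the logit function, and big = 30 here is an arbitrary clipping choice, not a TFP default):

```python
import math

def inverse_sigmoid_label(y, big=30.0):
    # logit(y) = log(y / (1 - y)); clip the infinities at the boundaries
    # to +/- big so the transformed labels stay finite.
    if y <= 0.0:
        return -big
    if y >= 1.0:
        return big
    return math.log(y / (1.0 - y))
```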

Lenhart

unread,
Jun 4, 2021, 7:52:55 AM6/4/21
to Brian Patton 🚀, TensorFlow Probability
Actually, those offsets were arbitrary choices I made while trying to find something stable. Changing those lines to tf.math.softplus(.1 * t ...) produces a working model, as long as I shift my training data up by a billionth so that zeros become tiny positive values. For my purposes, any tiny number will serve in place of zeros just fine.
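A sketch of the label-side fix as a clamp (the guard on the upper boundary is my addition, in case any labels sit at exactly 1; the fix described above simply shifts by a billionth):

```python
def nudge_label(y, eps=1e-9):
    # push exact zeros off the lower boundary so the Beta log-likelihood
    # stays finite; also guard exact ones (an extra precaution)
    return min(max(y, eps), 1.0 - eps)
```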