Is it normal that DenseFlipout produces a different uncertainty estimate compared to concrete dropout?



Aug 25, 2022, 10:46:16 PM
to TensorFlow Probability
Hi everyone,
I am a student, and I recently started studying probabilistic deep learning with TensorFlow Probability. I tried to create two neural networks, one with DenseFlipout and one with concrete dropout, for a fairly simple regression problem. The dataset consists of a thousand points (x, y), so the function the neural network has to reconstruct is a simple one-dimensional function. Both networks seem to work well, but the uncertainty estimates differ between the two cases.

Concrete dropout gives me:
[plot: download (1).png — orange band = mean ± std]

DenseFlipout instead gives me a result
with a std that grows much larger than what I get with concrete dropout. I don't know whether this is a symptom of a poorly trained BNN or whether it's natural to obtain different results (and in that case, which one is correct?). I'd like to ask you experts whether you have seen something similar when comparing results from DenseFlipout and dropout, and what you did about it. To me the network trained with dropout seems overly optimistic, but I am not sure.
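For context on how I produce these plots: both methods estimate uncertainty the same way at prediction time — run T stochastic forward passes and take the mean and std across them. A minimal NumPy sketch (here `stochastic_forward` is a hypothetical stand-in for a model with dropout kept active at test time, or with Flipout weights resampled per call):

```python
import numpy as np

def mc_predict(stochastic_forward, x, num_samples=100):
    # Monte Carlo predictive mean/std from repeated stochastic passes.
    # stochastic_forward: function x -> y that is random on each call.
    samples = np.stack([stochastic_forward(x) for _ in range(num_samples)])
    return samples.mean(axis=0), samples.std(axis=0)
```

The orange band in my plots is then mean ± std over those samples.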

2) I have also noticed that BNNs need to be trained much longer than traditional NNs — e.g. 40,000 epochs, even when the NLL has already dropped by epoch 6,000. Is this normal?

Matias Valdenegro

Sep 1, 2022, 2:36:47 AM

Yes, it should be different: dropout and Flipout work in completely different ways, there is no requirement that they produce the same uncertainty, and there is no single true or correct uncertainty at all.


Sep 8, 2022, 12:58:52 PM
to TensorFlow Probability
But what, then, is the point of making a neural network estimate uncertainty, if the calculated uncertainty depends on the architecture used rather than on the data? How do I check which one is better?

Matias Valdenegro

Sep 13, 2022, 4:50:17 AM
to TensorFlow Probability, P.

Well, this is exactly the concept of epistemic uncertainty, also known as model uncertainty: the model structure, assumptions, and training data all contribute to the output uncertainty, so this is completely expected. The uncertainty quantification method also contributes. I have some papers that show these differences and comparisons:

Also, do not forget that all current uncertainty quantification methods are approximations: MC-Dropout and Flipout each produce an approximation to the posterior predictive distribution (which is intractable to compute), and different methods produce different approximations. I do not think any method is inherently better than another; it all depends on the application.

Also there are ways to separate/disentangle data and model uncertainty, see the second paper I linked.
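One common way to do that separation, for a network whose head predicts a Normal(mean, variance) per input, is the law of total variance over the T stochastic forward passes: the variance of the predicted means is the epistemic part, and the average of the predicted variances is the aleatoric (data) part. A minimal NumPy sketch of that decomposition (a simplification; see the linked paper for the full treatment):

```python
import numpy as np

def decompose_uncertainty(means, variances):
    # means, variances: arrays of shape (T, ...) holding the per-sample
    # predicted mean and variance from T stochastic forward passes.
    epistemic = means.var(axis=0)       # spread of the predicted means
    aleatoric = variances.mean(axis=0)  # average predicted noise level
    total = epistemic + aleatoric       # law of total variance
    return epistemic, aleatoric, total
```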

Josh Dillon

Sep 13, 2022, 12:06:20 PM
to Matias Valdenegro, Warren Morningstar, TensorFlow Probability, P.
Hi Matias. Thanks for your input and interesting research! I particularly enjoyed your second paper.

In 2204.09308, did you use a KL penalty for the flipout layer? I ask because I didn't see mention of a KL penalty in the paper (though I might have missed it). If the KL penalty was dropped, this would explain why you observed that "Flipout seems to produce zero epistemic uncertainty for all the selected examples."
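For readers following along: the KL penalty is the term that turns the training objective into a negative ELBO. A minimal NumPy sketch of that objective, assuming a diagonal-Gaussian weight posterior and a standard-normal prior (function names are illustrative, not TFP API):

```python
import numpy as np

def kl_gaussian_std_normal(mu, sigma):
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over the weight posterior.
    return np.sum(np.log(1.0 / sigma) + (sigma**2 + mu**2) / 2.0 - 0.5)

def neg_elbo(nll_sum, mu, sigma, num_train_examples):
    # Negative ELBO: data NLL plus the KL penalty, with the KL scaled by
    # 1/N so it is on the same per-example scale as the likelihood term
    # (the usual role of a kl weight / divergence scaling in practice).
    return nll_sum + kl_gaussian_std_normal(mu, sigma) / num_train_examples
```

Without the KL term, the variational posterior is free to collapse its scale toward zero, which matches the "zero epistemic uncertainty" symptom described above.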

(Also, if you used TFP, may I kindly ask you to cite





Sep 16, 2022, 5:14:59 PM
to TensorFlow Probability, P.
Hi Matias, thank you for your answer. I have read the papers you linked and found them very interesting. Deep ensembles look appealing, but with my resources they are computationally too expensive: I would need to train the same model 1,000 times to get a 1000x1000 matrix of samples, and I am using Colab. Regarding the plot, I think I can disentangle the uncertainties easily with a satellite model that extracts the output of the dense layer before it goes into the tfd.Normal layer.
I do, however, have a question about the NLL to which I have not yet found an answer, and I was wondering if you could share your thoughts on it:

In my case the output values y_true_i come with an associated measurement error y_std_i. I was wondering whether it makes sense, from a Bayesian/ML point of view, to pass the inverse values 1/y_std_i^2 to the loss_weights parameter in the Keras compile method, to give more weight to the points that were measured more precisely.
All the examples I have found on the web use the standard NLL for variational inference, which made me doubt the correctness of the idea.
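An alternative I have been considering, instead of loss weighting: fold the known measurement variance directly into the likelihood, so each point is modelled as y_i ~ Normal(mu_i, var_pred_i + y_std_i^2). A minimal NumPy sketch of that loss (my own sketch, not code from any of the linked papers):

```python
import numpy as np

def nll_with_measurement_error(y_true, mu_pred, var_pred, var_meas):
    # Gaussian NLL where the known per-point measurement variance is
    # added to the model's predicted variance. Precisely measured points
    # (small var_meas) automatically get more weight, without touching
    # Keras loss_weights.
    var = var_pred + var_meas
    return 0.5 * np.mean(np.log(2 * np.pi * var)
                         + (y_true - mu_pred) ** 2 / var)
```

When the model predicts only a mean (var_pred fixed), the data term reduces to inverse-variance weighting of the squared errors, which is the same intuition as the 1/y_std_i^2 weights.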

Also, is it possible to combine this with the β-NLL to achieve better results? I think there is no conflict between the two, but I am quite new to the world of probabilistic machine learning.
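For reference, the B-NLL I mean is the β-NLL of Seitzer et al. (2022), where each point's Gaussian NLL is scaled by its predicted variance raised to β. A minimal NumPy sketch of the loss values (in a real framework the var**beta factor would go through stop_gradient; this sketch only computes values, not gradients):

```python
import numpy as np

def beta_nll(y_true, mu, var, beta=0.5):
    # beta-NLL: per-point Gaussian NLL scaled by var**beta.
    # beta=0 recovers the standard NLL; larger beta reduces the
    # down-weighting of high-variance points.
    nll = 0.5 * (np.log(2 * np.pi * var) + (y_true - mu) ** 2 / var)
    return np.mean(var ** beta * nll)
```

Since this weighting uses the model's predicted variance while the measurement-error idea uses the known observation variance, I agree they look complementary rather than conflicting.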


