Our experience as a TFP case study


J C

Jul 26, 2020, 3:37:05 PM
to TensorFlow Probability

I'm sorry in advance if a posting like this is not welcome on this forum. I wanted to thank the developers of TensorFlow Probability for the great tool they have built. The team I'm on leaned on TFP as the workhorse behind our entry in the CMS AI Challenge, which is about predicting hospital readmission and other adverse events using Medicare billing data. Over the past year or so, the rest of the team and I have had a bit of a crash course in using TFP/TensorFlow, and I thought I would share our experience, as it might be useful as a case study.


First, some background: our team is composed mostly of people most accurately classified as applied mathematicians. We do not come from the machine learning world, though most of us had worked on data-driven projects in the past. Over a year ago, we started putting together an entry for the CMS AI Challenge based on replicating some properties of artificial neural networks in more structured Bayesian hierarchical models, with the goal of building expressive yet fully interpretable models. Our proposal was one of the 25 chosen to compete in the contest; of the 25, our team was the only non-entity (mederrata, we still only barely exist).


Our solution required a scalable and flexible modeling framework. I'm personally a big fan of Stan, but I didn't think it was flexible enough for our purposes, and Stan's implementation of ADVI is a bit of an uncustomizable black box. Finally, I wanted a solution that was fully Pythonic. So I went looking for other frameworks and came across PyMC, and eventually TFP by way of PyMC4.


So we adopted TFP for our project; there were some hiccups along the way. A lot of our difficulties related to the changeover between TF 1.x and 2.x and to the documentation for TFP/TF being sort of a mess. However, I greatly enjoy the elegance of the TFP approach (despite it being quite verbose). I also like how easy it is to customize variational approximations, by simply creating a JointDistribution object of any desired structure, as in the sketch below.
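To illustrate what I mean (a minimal sketch, not our actual model; the variable names are arbitrary), a hand-rolled mean-field surrogate is itself just a JointDistributionNamed whose parameters are trainable:

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors

# Trainable variational parameters for a single latent variable 'w'.
loc = tf.Variable(0., name='w_loc')
scale = tfp.util.TransformedVariable(1., bijector=tfb.Softplus(), name='w_scale')

# The surrogate posterior is just another JointDistribution.
surrogate_posterior = tfd.JointDistributionNamed(dict(
    w=tfd.Normal(loc=loc, scale=scale)))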


I like TFP overall, but I think it is worthwhile to comment on where things got hairy:

  1. Saving models: This was probably the biggest headache for us. At the commencement of the contest we had to send a saved model to the organizers. We thought it would be as easy as inheriting from tf.Module and using TensorFlow's SavedModel capabilities. We were wrong. We relied heavily on JointDistributionNamed objects, and these objects completely refused to be saved. To get the model object to save at all, I had to exclude every JointDistributionNamed using NoDependency (see the sketch after this list), which in essence meant that our model wasn't saved. We started developing the project before JointDistributionCoroutine was included in TFP, so I have no idea whether it has the same issue. Additionally, other non-TensorFlow attributes, such as lists and dictionaries holding model attributes, didn't get saved by SavedModel. Eventually, we developed our own serialization using pickle.

  2. Distributed programming: We were able to get this working, though we had to modify the TransformedVariable class to pass in the name scope used for MirroredStrategy. After training, we still ran into the SavedModel issue.

  3. Some basic operations, such as bucketizing: The issue here is with the TF API documentation, not with TFP. We often had to scour Stack Exchange to figure out how to do various things; many common operations are not present in the official documentation or easily accessible within the API (for instance, having to use things from math_ops).
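Regarding item 1, the NoDependency workaround looked roughly like this (a minimal sketch, not our actual model; note that NoDependency lives in a private TensorFlow module, so the import path may change between versions):

import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.python.training.tracking.data_structures import NoDependency

tfd = tfp.distributions

class Model(tf.Module):
    def __init__(self):
        super().__init__()
        # NoDependency stops tf.Module's dependency tracking from choking
        # on the JointDistributionNamed, but it also means the distribution
        # is simply not saved.
        self.joint = NoDependency(tfd.JointDistributionNamed(dict(
            w=tfd.Normal(loc=0., scale=1.))))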

Anyways, I am very thankful to the developers, and also to the user community, who very quickly answered the questions I posted on this forum. We couldn't have developed our contest entry without this tool, and I look forward to using TFP in the future.

rif

Jul 27, 2020, 11:01:02 AM
to J C, David Smalling, Paige Bailey, TensorFlow Probability, Colin Carroll
+David Smalling +Paige Bailey +Colin Carroll 

Super interesting. David or Paige or Colin, any thoughts on how we can best take advantage of Josh's awesome feedback and work?

rif



Paige Bailey

Jul 27, 2020, 11:30:48 AM
to rif, Kathy Wu, Allen Lavoie, Priya Gupta, Billy Lamberta, J C, David Smalling, TensorFlow Probability, Colin Carroll
Adding in +Kathy Wu and +Allen Lavoie for SavedModel; +Priya Gupta for tf.distribute; +Billy Lamberta for TF docs.

Thank you for your detailed product feedback, Josh, and for sharing your experiences via the TFP mailing list. For the items listed above:
  • Josh, could you share a pointer to GitHub issues describing the challenges with TFP + SavedModel, so we can migrate them to the TensorFlow GitHub repo?
  • Priya, do you know of any pointers for using the TransformedVariable class with MirroredStrategy?
  • Josh, could you list which math_ops, specifically, were not present in TF documentation?
Thank you again, and have a great week,
.pb
--
Paige Bailey
Product Manager (TensorFlow)
@DynamicWebPaige
webp...@google.com

rif

Jul 27, 2020, 11:33:31 AM
to Paige Bailey, Kathy Wu, Allen Lavoie, Priya Gupta, Billy Lamberta, J C, David Smalling, TensorFlow Probability, Colin Carroll, Jacob Burnim
+Jacob Burnim 

Paige, FWIW, my belief is that the SavedModel issues are going to be at the TFP level rather than the TF level: SavedModels save graphs, and JointDistributions are Python-level constructs. Jacob, can you comment?

rif

J C

Jul 27, 2020, 11:38:15 AM
to TensorFlow Probability, josh.colu...@gmail.com, smal...@google.com, webp...@google.com, colca...@google.com
There are a few general-purpose ADVI-related features that I think would be useful in TFP. We have hacky versions of the following:

1) Transformation of Cauchy, t, and some other difficult distributions to their auxiliary representations in JointDistribution objects, à la https://projecteuclid.org/download/pdf_1/euclid.ba/1339616546 (see the sketch after this list).
2) Automatically building Normal-InverseGamma approximations when calling build_surrogate_posterior.
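For item 1, the auxiliary representation of a Student-t variable is the usual scale mixture of normals: x ~ StudentT(nu) is equivalent to x | v ~ Normal(0, sqrt(v)) with v ~ InverseGamma(nu/2, nu/2). A minimal sketch of what that looks like as a JointDistributionNamed (not our actual code):

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

nu = 3.  # degrees of freedom
student_t_aux = tfd.JointDistributionNamed(dict(
    # Auxiliary variance; marginalizing it out recovers StudentT(nu, 0, 1).
    v=tfd.InverseGamma(concentration=nu / 2., scale=nu / 2.),
    x=lambda v: tfd.Normal(loc=0., scale=tf.sqrt(v))))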

Also some niceties that I think can help parse out model components:

Naming the variables output by build_surrogate_posterior so that one can easily relate them to the original model variables by name.

I haven't really looked through the TFP API since February, when the model-building stage of the contest ended, so I'm not sure what the current state of ADVI is. However, having more than just transformed-normal ADVI will blow Stan out of the water.

Also, this is unrelated, but I was working on a COVID project some time back: https://www.medrxiv.org/content/10.1101/2020.04.29.20083485v1

We used Stan for this project, but I tried to hack together a TFP solution (and I see that there is a similar model now in the TFP documentation). An issue I had in Stan, and which seemed to be even more of an issue in TFP, was the ODE problem, which should not be stiff, becoming stiff for certain poor parameter combinations. This was a big problem for the TFP solution because the ODE integrator integrates a batch of parameter sets at once, so chances are that one or more of the ensemble members is bad, particularly early on, and DOPRI gets stuck. It would be nice to back off from DOPRI adaptively somehow and just do something quick and dirty when the parameter combination is bad. Perhaps the way to go about this is to initially solve using Euler or something and then finish off using DOPRI; a rough sketch of that idea is below. Regardless, I was just wondering if anybody had thoughts on this issue.
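Something like this is what I have in mind (a rough, unbatched sketch; euler_warm_start and the step count are made up for illustration):

import tensorflow as tf
import tensorflow_probability as tfp

def euler_warm_start(ode_fn, t0, y0, t1, num_steps=100):
    # Crude fixed-step Euler to push past a badly-conditioned early region.
    dt = (t1 - t0) / num_steps
    t, y = t0, y0
    for _ in range(num_steps):
        y = y + dt * ode_fn(t, y)
        t = t + dt
    return y

# Hand off to the adaptive solver once past the problematic region, e.g.:
# y1 = euler_warm_start(ode_fn, 0., y0, 1.)
# results = tfp.math.ode.DormandPrince().solve(
#     ode_fn, initial_time=1., initial_state=y1, solution_times=[2., 3.])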



rif

Jul 27, 2020, 11:40:21 AM
to J C, TensorFlow Probability, David Smalling, Paige Bailey, Colin Carroll, Josh Dillon, Emily Fertig
Thanks, JC! Adding +Emily Fertig and +Josh Dillon, who are leading some efforts around improved ADVI support. They may want to discuss further.


J C

Jul 27, 2020, 11:42:55 AM
to TensorFlow Probability, r...@google.com, kat...@google.com, all...@google.com, pri...@google.com, bl...@google.com, josh.colu...@gmail.com, smal...@google.com, colca...@google.com
The quick-and-dirty hack I used was to modify TransformedVariable as follows, so that I could pass in a scope:

import tensorflow as tf
from tensorflow_probability import util as tfp_util

class TransformedVariable(tfp_util.TransformedVariable):
    def __init__(self, initial_value, bijector,
                 dtype=None, scope=None, name=None, **kwargs):
        # (Body elided in the original message; roughly, enter the given
        # scope before the underlying tf.Variable is created.)
        with tf.name_scope(scope or 'TransformedVariable'):
            super().__init__(initial_value, bijector, dtype=dtype,
                             name=name, **kwargs)


I'll get back to you guys about the specific TF issues.

For serialization, what I eventually did was, within __getstate__, dump all Tensors/Variables to numpy. Within __setstate__, I called a function that regenerated all my JointDistributionNamed objects and assigned to them the numpy values exported in __getstate__.
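Roughly, the pattern looked like this (a sketch only; _variables and _rebuild_joints are hypothetical stand-ins for our actual attributes):

import tensorflow as tf

class Model(tf.Module):
    def __getstate__(self):
        # Pickle only numpy copies of the variable values; the unpicklable
        # JointDistributionNamed objects are dropped entirely.
        return {name: var.numpy() for name, var in self._variables.items()}

    def __setstate__(self, state):
        # Regenerate the JointDistributionNamed objects from scratch, then
        # restore the saved values into the fresh variables.
        self._rebuild_joints()
        for name, value in state.items():
            self._variables[name].assign(value)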

J C

Jul 27, 2020, 11:53:40 AM
to TensorFlow Probability, rif, kat...@google.com, all...@google.com, pri...@google.com, bl...@google.com, David Smalling, colca...@google.com
With respect to SavedModel, it seems that things that aren't tf.Variable don't get saved. This was a big problem for us, mostly because we didn't know that this was the case; the documentation should be clear about this fact (maybe it is now, I haven't checked recently). I tried wrapping a lot of the model attributes in tf.Variable, but that just broke things here and there, because in places we relied on things being lists etc. So this is why it was easier just to serialize using our custom __getstate__ and __setstate__ via numpy arrays.
--
Warm regards

Josh Dillon

Jul 27, 2020, 12:14:34 PM
to J C, TensorFlow Probability, rif, Kathy Wu, Allen Lavoie, Priya Gupta, Billy Lamberta, David Smalling, Colin Carroll
Hi Josh--thanks for your feedback!

1) Re: saving models: I think it'd be both advantageous and straightforward for us to implement __setstate__ and __getstate__ in Distributions, Bijectors, and DeferredTensor.

2) Re: distributed programming: I think your change would be a great addition to TransformedVariable (and possibly DeferredTensor).

3) Re: bucketizing: this is generally a bit of a pain in TF, but I'm wondering if you've come across:
  • tfp.stats.find_bins
  • tfp.stats.histogram
  • tfp.stats.count_integers
  • tfp.stats.percentile
  • tfp.stats.quantiles
Could some of these functions be useful?
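For instance, a quick sketch of quantile-based bucketing with these (made-up data):

import tensorflow as tf
import tensorflow_probability as tfp

x = tf.random.normal([1000])
edges = tfp.stats.quantiles(x, num_quantiles=10)  # decile cut points
bins = tfp.stats.find_bins(x, edges)              # bucket index per element
counts = tfp.stats.count_integers(tf.cast(bins, tf.int32))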


J C

Jul 27, 2020, 12:29:13 PM
to Josh Dillon, TensorFlow Probability, rif, Kathy Wu, Allen Lavoie, Priya Gupta, Billy Lamberta, David Smalling, Colin Carroll
3) Thank you so much! I don't think all of those existed back in December/January, though I may be wrong; I think I did look into quantiles, actually. The specific need we had was to be able to ignore NaN/inf values when running stats. Some of the numpy versions of these functions do this; for us, it was a quick and easy way to get around zero inflation, by setting 0 to NaN and computing quantiles within numpy. So the addition of NaN handling would be useful (if it hasn't been included since about February).

For bucketization and one-hotting we used a lot of calls like tf.one_hot(gen_math_ops.bucketize(...)), as in the sketch below.

That was a bit of a pain to find the first time around!
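For reference, a minimal runnable version of that pattern (the boundary values here are made up):

import tensorflow as tf
from tensorflow.python.ops import gen_math_ops

x = tf.constant([1.2, 7.5, 3.3])
buckets = gen_math_ops.bucketize(x, boundaries=[2.0, 5.0])  # -> [0, 2, 1]
one_hot = tf.one_hot(buckets, depth=3)                      # one row per element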
--
Warm regards