Apply TFP Switch-point Analysis to Multiple (independent) Users

Mary Ruth

Aug 10, 2022, 2:38:33 PM
to TensorFlow Probability
Hi All,

I'm new to TensorFlow and TFP and am trying to learn how to apply the final example in this lesson (the text message behavior change example, which begins just under the 'What is Lambda' section) across a number of users. For example, let's assume that I have one dataset with 1000 users and their daily text message counts spanning 3 months each. How would I apply the process detailed in the example above (minus the plots) to all users in the dataset and record the 2 most likely switch-points and the days they took place (for each user)?

Thanks for any help you can give! 

Best, 
MW

Christopher Suter

Aug 10, 2022, 8:44:40 PM
to Mary Ruth, TensorFlow Probability
TFP is built from the bottom up to support parallel computation of the sort you're describing. I'd recommend starting with the shapes tutorial to get a flavor for it. It should be a pretty good (if slightly challenging) exercise to modify the Bayesian Methods for Hackers example to be "batched" (as we'd call it) across users. Maybe take a look at this example Colab and come back with questions?

There may also be some helpful background here:



Mary Ruth

Aug 16, 2022, 1:09:44 PM
to TensorFlow Probability, c...@google.com, TensorFlow Probability, Mary Ruth
Thanks so much for the help!

I have reviewed the resources you sent over and they definitely gave me more clarity, but I still have more questions than answers at this point. FYI: I have switched to using this switch point analysis because it uses TFP rather than PyMC, but the problem is essentially the same. See attached for the replicated results and my attempt at batching this problem. I've replicated the initial results (lines 95-243) and have attempted to modify the code for a batched version (lines 257 onward). The only real change I've made to the initial code is the specification of the JointDistributionNamed starting on line 300. I made each of them an Independent distribution (except the Uniform distribution) since they are each dealing with batched data.

Question 1.a) Do these distribution specifications make sense in this context (lines 299-307)? I am missing something in the Uniform/Poisson distributions because I keep getting an error about the dimensions being unequal (2 vs 30):

Dimensions must be equal, but are 30 and 2 for '{{node SelectV2}} = SelectV2[T=DT_FLOAT](Greater, IndependentExponential/sample_1/Reshape, IndependentExponential/sample/Reshape)' with input shapes: [30], [2], [2].

I think I understand why I'm getting the error (I'm passing 30 values to the disaster rate function but the other distributions have batch_shape = 2), but I'm not quite sure how to fix it. Do you have any thoughts on how to handle this?

Question 1.b) In the same vein, would this be a case where I should use JointDistributionNamedAutoBatched? It's unclear to me when you would want to use JointDistributionNamed vs JointDistributionNamedAutoBatched.

Question 2) Would you be able to provide some clarity around the reinterpreted_batch_ndims argument? There doesn't seem to be too much documentation on why you would want to convert batches to events. I believe in this particular case, we want to have batches rather than events - is my thinking off here?

Thank you for any clarity you can provide!

Best,
MW
bayesian_tfp_batched.py

Christopher Suter

Aug 30, 2022, 10:41:41 AM
to Mary Ruth, TensorFlow Probability
Hi Mary, sorry to leave you hanging for a bit. I'll try to lend some clarity below. Sorry it's a bit of a wall of text! Please feel free to send more clarification questions.

On Tue, Aug 16, 2022 at 1:09 PM Mary Ruth <marywe...@gmail.com> wrote:
Question 1.a) Do these distribution specifications make sense in this context (lines 299-307)? I am missing something in the Uniform/Poisson distributions because I keep getting an error about the dimensions being unequal (2 vs 30):

Dimensions must be equal, but are 30 and 2 for '{{node SelectV2}} = SelectV2[T=DT_FLOAT](Greater, IndependentExponential/sample_1/Reshape, IndependentExponential/sample/Reshape)' with input shapes: [30], [2], [2].

I think I understand why I'm getting the error (I'm passing 30 values to the disaster rate function but the other distributions have batch_shape = 2), but I'm not quite sure how to fix it. Do you have any thoughts on how to handle this?

I would write this as follows:

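# Assumes alpha, years, and disaster_rate_fn are defined as in the attached script,
# with tfd = tfp.distributions and numpy imported as np.
disaster_count = tfd.JointDistributionNamedAutoBatched(dict(
    e=tfd.Exponential(rate=alpha),            # prior on the earlier rate
    l=tfd.Exponential(rate=alpha),            # prior on the later rate
    s=tfd.Uniform(low=0., high=len(years)),   # prior on the change point
    # Likelihood: one Poisson per year, reinterpreted as a single vector-valued event.
    d_t=lambda s, l, e: tfd.Independent(
        tfd.Poisson(rate=disaster_rate_fn(np.arange(len(years)), s, l, e)),
        reinterpreted_batch_ndims=1)
))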

I've removed the `Independent`s on the priors, but kept it on the likelihood. Why? Forget about batching for a sec. We should think of the model structure being passed to JointDistribution as describing the generative process for a single "sample path". In this case, the generative process is
  • draw one rate each from two Exponential priors (e and l)
  • draw a change point s from a Uniform prior
  • for years 1...N, draw a count d_t from a Poisson with either the earlier or later rate (drawn above), depending on which side of s the year falls.
This expresses the factorization of the joint distribution over all the random variables:
   p(e, l, s, d_1, ..., d_N) = p(e) p(l) p(s) p(d_1 | e, l, s) p(d_2 | e, l, s) ... p(d_N | e, l, s)

Ok, why the Independent on the Poisson and not the others? When we instantiate the Poisson with a vector of rates, it imbues a "batch shape" -- in this case the shape is the number of years (I think that's 30 in your running example?). Here's the "fundamental law of Distribution batch_shape": when you ask a Distribution with a batch_shape for the log_prob of a single datum, you will get a batch_shape-shaped answer. If my (scalar, say) distribution D has batch_shape [5] and I ask it for D.log_prob(0.), I will get an answer of shape [5], which is the corresponding log_prob at 0 for each of the batch of 5 (presumably differently parameterized) distributions.

Note: there's some implicit broadcasting going on here. I'd get the same answer if I asked for D.log_prob(np.zeros(batch_shape)). I could also pass in a different datum to be evaluated for each of the distributions: D.log_prob(np.arange(5)) would pass 0 to the 0th distribution, 1 to the 1th, 2 to the 2th, etc. I'd still get an answer of shape [5]. I could also pass in something with, say, shape [30, 1] (or [30, 5]) and the inner dimension would broadcast against the batch_shape of the distribution -- in each case I'd get back a log_prob result of shape [30, 5]. If you're familiar with numpy broadcasting behavior, this should all feel pretty familiar.

The nice thing about having batch_shapes, instead of, say, a Python list of separate Distribution instances, is that the computation will be "vectorized"; modern CPUs and GPUs can perform the same computation on multiple inputs more efficiently than doing them serially, or even in parallel on separate (process) threads (see SIMD).
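The shape rule above can be checked without TFP at all. The following is a minimal NumPy sketch, not TFP's implementation: `poisson_log_prob` is a hypothetical hand-rolled helper and the rate values are made up. It only mimics how a datum broadcasts against a batch of five Poisson rates.

```python
import math
import numpy as np

def poisson_log_prob(rate, k):
    """Poisson log pmf; `k` broadcasts against `rate` like a batched Distribution."""
    rate = np.asarray(rate, dtype=np.float64)
    k = np.asarray(k, dtype=np.float64)
    log_factorial = np.vectorize(math.lgamma)(k + 1.0)  # log(k!)
    return k * np.log(rate) - rate - log_factorial

rates = np.array([0.5, 1.0, 2.0, 4.0, 8.0])         # stands in for batch_shape [5]

single = poisson_log_prob(rates, 0.0)               # one datum, five distributions
same = poisson_log_prob(rates, np.zeros(5))         # the broadcast written out explicitly
per_dist = poisson_log_prob(rates, np.arange(5))    # datum i goes to distribution i
grid = poisson_log_prob(rates, np.zeros((30, 1)))   # [30, 1] against [5]

print(single.shape, per_dist.shape, grid.shape)     # (5,) (5,) (30, 5)
```

In every case the trailing dimension of the result is the batch of five rates; only the leading dimensions depend on the shape of the data passed in.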

Independent lets us "reinterpret" a batch of distributions as a single distribution over multivariate samples. Primarily, this means instead of getting batch_shape-shaped log_probs, we'll just get a single log_prob out. The name independent is meant to evoke the factorization structure of a distribution over several independent quantities, like the Poisson bits of the factorization above:
    p(d_1 | e, l, s) p(d_2 | e, l, s) ... p(d_N | e, l, s)    <-- N independent terms, all Poisson, but with different rates

The reinterpreted_batch_ndims argument says how many batch dimensions we should "reinterpret", counting from the right. Often this is just 1, but we don't set it by default (I don't know if there's a safe and sensible way to do this, but I haven't thought about it much).


Question 1.b) In the same vein, would this be a case where I should use JointDistributionNamedAutoBatched? It's unclear to me when you would want to use JointDistributionNamed vs JointDistributionNamedAutoBatched.

For now, I'd stick with the AutoBatched variant. For some context: just as we can batch Distributions, we could have a non-batched Distribution but pass a batch of events to, say, D.log_prob, and get a batch of log_probs. Strictly speaking, when writing a JointDistribution like yours, and not using the AutoBatched variant, we need to think about this potential for batched inputs when writing everything. Usually this means stuff like, instead of just indexing into a variable -- x[i] -- we have to do some annoying tf.gather(x, ...) or, God forbid, tf.gather_nd(x, ...).

Or, in your case, maybe we want to consider whether each of 30 years is behind or ahead of a change point -- but if we have a batch of change points in play (2, say), then we end up comparing a shape [30] vector to a shape [2] vector of change points and we get an error like the one you shared. So we'd need to manually reshape things so that broadcasting can make this work, e.g. by reshaping the years to [30, 1] and then comparing against the shape [2] change points (hopefully I've not misunderstood your example here too badly...).
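The reshape described above can be sketched in NumPy, whose broadcasting rules match the ones at play here. The variable names follow the model (`s`, `e`, `l`), but the values are invented, and the `np.where` call is only a stand-in for whatever `disaster_rate_fn` does in the attached script.

```python
import numpy as np

years = np.arange(30, dtype=np.float64)   # shape [30]
s = np.array([10.0, 20.0])                # a batch of 2 change points, shape [2]
e = np.array([3.0, 5.0])                  # earlier rates, shape [2]
l = np.array([1.0, 2.0])                  # later rates, shape [2]

# Comparing [30] against [2] directly cannot broadcast.
try:
    np.greater(years, s)
    naive_ok = True
except ValueError:
    naive_ok = False

# A trailing axis on the years gives [30, 1] vs [2], which broadcasts to [30, 2].
rate = np.where(years[:, np.newaxis] > s, l, e)

# Alternative layout: a trailing axis on the batched quantities gives [2, 30].
rate_years_last = np.where(years > s[:, np.newaxis],
                           l[:, np.newaxis], e[:, np.newaxis])

print(naive_ok, rate.shape, rate_years_last.shape)   # False (30, 2) (2, 30)
```

Both layouts hold the same numbers. Which one is appropriate depends on what consumes the rates: Independent with reinterpreted_batch_ndims=1 folds the rightmost batch dimension into the event, so a hand-batched model would likely want the years last.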

AutoBatched JDs handle some of this complexity automagically. There is some corresponding subtlety in the case of calling sample(...) with a non-trivial sample_shape. AutoBatched JDs are easier to use but may be less computationally efficient. That's a can we may safely kick down the road a bit though...

Question 2) Would you be able to provide some clarity around the reinterpreted_batch_ndims argument? There doesn't seem to be too much documentation on why you would want to convert batches to events. I believe in this particular case, we want to have batches rather than events - is my thinking off here?
You want to have both. Since you have a Poisson likelihood term that takes data over many years, you need that to be an Independent(Poisson(...)). But then you want to consider a batch over *the whole* joint distribution (i.e., over many examples of changepoint detection problems, say for many individuals). The joint distribution should be able to provide separate log_probs for each parallel problem, but within each of those, the log probs of the Poisson observations should be summed. Hope this makes sense...
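To illustrate "both at once", here is a hand-written NumPy version of the log joint for a batch of users. It is a sketch under assumptions, not the TFP model: `log_joint` is a hypothetical helper, the counts are synthetic, the rate switches with a hard `days > s` comparison, and `s` is assumed to lie inside [0, num_days].

```python
import math
import numpy as np

def log_joint(e, l, s, counts, alpha):
    """Log joint density for a batch of users.

    e, l, s: shape [num_users]; counts: shape [num_users, num_days].
    Returns shape [num_users]: one value per user.
    """
    num_days = counts.shape[-1]
    days = np.arange(num_days)
    # Exponential(rate=alpha) log densities of the two rate priors.
    log_prior = (np.log(alpha) - alpha * e) + (np.log(alpha) - alpha * l)
    # Uniform(0, num_days) log density of the switch point (assumes s is in range).
    log_prior = log_prior - np.log(num_days)
    # Per-day rate: earlier rate up to the switch point, later rate after it.
    rate = np.where(days > s[:, np.newaxis], l[:, np.newaxis], e[:, np.newaxis])
    log_factorial = np.vectorize(math.lgamma)(counts + 1.0)
    log_lik = counts * np.log(rate) - rate - log_factorial   # [num_users, num_days]
    # Sum over days (the "event" axis); keep users (the "batch" axis).
    return log_prior + log_lik.sum(axis=-1)

rng = np.random.default_rng(0)
counts = rng.poisson(lam=3.0, size=(4, 30)).astype(np.float64)  # synthetic, 4 users
e = np.full(4, 3.0)
l = np.full(4, 2.0)
s = np.full(4, 15.0)
alpha = 1.0 / counts.mean()

out = log_joint(e, l, s, counts, alpha)
print(out.shape)   # (4,)
```

The day axis disappears in the sum, which is the job Independent does for the Poisson term, while the user axis survives as the batch over the whole joint distribution.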

Mary Ruth

Sep 19, 2022, 3:58:32 PM
to TensorFlow Probability, c...@google.com, TensorFlow Probability, Mary Ruth
Thank you for your help! I'm getting more familiar with this language and your explanations have been wonderful.

I have one more (possibly obvious) question: I think I understand how we get the rate distributions (create a sample based on the prior distribution and the 1/mu parameter estimate, and determine whether it is within the bounds of the data), but what is being sampled to get the switchpoint distributions/probabilities? I understand that we use a uniform prior distribution and ultimately get a distribution of days that could be the switchpoint (getting the probability by dividing the number of instances that day occurred in the sample by the total number of samples), but I do not understand what is used to create the distribution of days... Can you provide some clarity here?

Thank you,

MW

Christopher Suter

Sep 19, 2022, 4:23:23 PM
to Mary Ruth, TensorFlow Probability
Sorry, I haven't had my head in this for a few weeks and am answering without getting my head entirely back in it, but I *think* the answer to your question is the MCMC sampler?

Basically, we write a(n unnormalized) target log prob function (lines 315-316), which fixes the observations but accepts the latent parameters as input and returns the joint log density of those latents together with the fixed observations. This function is handed off to a sampler (lines 326-357), which returns a bunch of samples of the latent quantities of interest through the magic of MCMC. The switchpoint samples among those are what get counted up to form the distribution of days.
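Once the sampler has returned switchpoint draws, the counting step described in the question, and the "2 most likely switch-points per user" from the start of the thread, might look like the sketch below. The [num_samples, num_users] layout is an assumption about how the draws are stored, `top_switch_days` is a hypothetical helper, and the sample values are hand-written stand-ins rather than real MCMC output.

```python
import numpy as np

def top_switch_days(s_samples, num_days, k=2):
    """s_samples: [num_samples, num_users] posterior draws of the switch point.

    Returns (days, probs), each of shape [num_users, k].
    """
    num_samples, num_users = s_samples.shape
    # Map each continuous draw to the day it falls in.
    day_index = np.clip(np.floor(s_samples).astype(int), 0, num_days - 1)
    # Per-user histogram over days, normalized to probabilities.
    probs = np.stack([
        np.bincount(day_index[:, u], minlength=num_days) / num_samples
        for u in range(num_users)
    ])                                                    # [num_users, num_days]
    # Highest-probability days first; ties go to the earlier day.
    top_days = np.argsort(-probs, axis=1, kind="stable")[:, :k]
    top_probs = np.take_along_axis(probs, top_days, axis=1)
    return top_days, top_probs

# Stand-in draws for two users; real draws would come from the MCMC sampler.
s_samples = np.array([
    [44.2, 10.1],
    [44.9, 10.7],
    [45.3, 11.5],
    [44.5, 10.2],
    [43.8, 11.9],
])
days, probs = top_switch_days(s_samples, num_days=90)
print(days)    # [[44 43] [10 11]]
print(probs)   # [[0.6 0.2] [0.6 0.4]]
```

With only five draws per user the probabilities are crude; with a real chain the same counting is applied to thousands of draws after burn-in.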