Feature Inclusion Probabilities for Structural Time Series models

Matthew Parker
Apr 9, 2020, 3:56:19 PM
to TensorFlow Probability
Hi everyone,

I'm just learning about structural time series and would appreciate any help you can give. 

I have a time series dataset with hundreds of potential input features, each containing daily measurements spanning almost a year. I need to build a structural time series model and perform feature selection based on inclusion probability, returning both the beta coefficients and the inclusion probabilities for every feature.

I have tried to follow the example from a TensorFlow Probability blog post (post, GitHub repo), adapting the `temperature_effect` portion of the `build_model()` function to reflect the structure suggested by the documentation for `tfp.sts.SparseLinearRegression()`.

    def build_model(observed_time_series, df_of_inputs, list_of_input_cols):
        ...
        features_effect = sts.SparseLinearRegression(  ### made this Sparse b/c expect most weights to be 0
            design_matrix=tf.stack([df_of_inputs[k] for k in list_of_input_cols], axis=-1),
            weights_prior_scale=0.1,  ### probability a feature has non-zero weight??
            name='features_effect')
        ...
        model = sts.Sum([day_of_week_effect, features_effect, autoregressive],
                        observed_time_series=observed_time_series)
        return model

I then build the model, make variational surrogate posteriors, train the model to minimize the variational loss, and draw samples from the variational posterior. I did all of these steps exactly as the GitHub repo's Jupyter notebook does (only the variable names changed). What I can't figure out is how to calculate the inclusion probability for each feature column. It seems to be fairly simple with the BSTS package for R (see this post), but I'm having trouble adapting it to Python.

So, two questions:
  1. Am I correct in thinking that the "weights" in the `features_effect/_weights_noncentered` output, found in the samples from the variational surrogate posteriors (`q_samples_demand_`), are equivalent to "beta coefficients"? I'm new, so forgive my ignorance.
  2. How do I calculate/generate/output inclusion probabilities for all of the input features? (Let's assume we expect to keep only 10% of the features)
Thanks in advance for your help!

Dave Moore
Apr 13, 2020, 3:00:58 PM
to Matthew Parker, TensorFlow Probability
Hi Matthew, that's a great question. The STS sparse regression uses a slightly different formulation from BSTS: a horseshoe prior [1] rather than a spike-and-slab prior. For any sampled value of the model parameters, each feature is included with some fraction that the horseshoe prior encourages to be *close* to either zero or one, though rarely exactly either of those values. These fractions are related to the 'shrinkage coefficients' κ_i described in sec. 2.1 of [1]; specifically, they are 1 - κ_i. In terms of the STS parameters 'local_scales_noncentered' and 'local_scale_variances', you would compute

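```python
import tensorflow as tf

def shrinkage_coefficients(local_scales_noncentered, local_scale_variances):
  local_scales = local_scales_noncentered * tf.sqrt(local_scale_variances)  # λ in [1].
  return 1. / (1. + local_scales ** 2)
# Average the inclusion fraction 1 - κ over the posterior samples (axis 0).
feature_inclusion_probabilities = tf.reduce_mean(
    1. - shrinkage_coefficients(local_scales_noncentered, local_scale_variances), axis=0)
```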

where the last line, taking the average fraction of inclusion over the posterior parameter samples, should compute a rough equivalent to an inclusion probability.
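The same computation in NumPy, run on fake samples, shows the shapes involved and one way to keep the top 10% of features, as asked in the original question. The sample arrays below are random stand-ins, not real posterior output, and the `top_fraction` helper is hypothetical. As background on why 1 - κ_i reads as an inclusion fraction: in the simplest normal-means setting (unit noise, unit global scale), the posterior mean of β_i given λ_i is (1 - κ_i) times the observation, so κ_i near 1 means the feature is shrunk away and κ_i near 0 means it is left essentially untouched.

```python
import numpy as np

def inclusion_probabilities(local_scales_noncentered, local_scale_variances):
    """Average inclusion fraction 1 - kappa per feature.

    Inputs have shape [num_samples, num_features], with samples on axis 0.
    """
    local_scales = local_scales_noncentered * np.sqrt(local_scale_variances)  # lambda
    kappa = 1.0 / (1.0 + local_scales ** 2)  # shrinkage coefficients
    return np.mean(1.0 - kappa, axis=0)

def top_fraction(probs, fraction=0.1):
    """Hypothetical helper: indices of the top `fraction` of features, best first."""
    k = max(1, int(round(fraction * probs.shape[-1])))
    return np.argsort(probs)[::-1][:k]

# Random stand-ins for posterior samples: 200 draws for 600 features.
rng = np.random.default_rng(0)
local_scales_noncentered = np.abs(rng.normal(size=(200, 600)))
local_scale_variances = rng.uniform(0.1, 10.0, size=(200, 600))

probs = inclusion_probabilities(local_scales_noncentered, local_scale_variances)
keep = top_fraction(probs, 0.1)  # 60 feature indices
```

Ranking by the averaged fraction rather than thresholding it keeps the "10% of features" assumption explicit; a fixed cutoff such as 0.5 would be an equally reasonable choice.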

The regression weights themselves (equivalent to 'beta coefficients' in your link) are computed by the `SparseLinearRegression.params_to_weights` method here: https://github.com/tensorflow/probability/blob/v0.9.0/tensorflow_probability/python/sts/regression.py#L475
You could think of the 'weights_noncentered' parameter as a normalized and likely non-sparse version of the weights, but you need to do the computation in that method to get the weights themselves.
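To make that computation concrete, here is a NumPy sketch that mirrors, as we read it, the arithmetic in `params_to_weights` at the v0.9.0 source linked above: a global scale built from `global_scale_noncentered`, `global_scale_variance` and `weights_prior_scale`; per-feature local scales built as in the shrinkage snippet; and the product of both with `weights_noncentered`. The argument order and formula should be checked against that source; this port is only an illustration, and the all-ones inputs are made up.

```python
import numpy as np

def params_to_weights(global_scale_variance, global_scale_noncentered,
                      local_scale_variances, local_scales_noncentered,
                      weights_noncentered, weights_prior_scale=0.1):
    """NumPy sketch of SparseLinearRegression.params_to_weights (our reading).

    Global parameters have shape [num_samples]; per-feature parameters have
    shape [num_samples, num_features]. `weights_prior_scale` must match the
    value passed to the model (0.1 in the question).
    """
    # Global (shared) scale, one value per posterior sample.
    global_scale = (global_scale_noncentered * np.sqrt(global_scale_variance)
                    * weights_prior_scale)
    # Per-feature local scales (lambda), one row per posterior sample.
    local_scales = local_scales_noncentered * np.sqrt(local_scale_variances)
    # Broadcast the global scale across the feature axis.
    return weights_noncentered * local_scales * global_scale[..., np.newaxis]

# Made-up inputs: 3 posterior samples, 4 features, every parameter set to 1.
s, f = 3, 4
weights = params_to_weights(np.ones(s), np.ones(s), np.ones((s, f)),
                            np.ones((s, f)), np.ones((s, f)))
beta_posterior_mean = weights.mean(axis=0)  # one "beta coefficient" per feature
```

With real output, each argument would be the matching array of posterior samples. Going by the `features_effect/_weights_noncentered` name in the question, the other keys would presumably follow the same pattern (e.g. `features_effect/_global_scale_variance`), but that naming is an assumption.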

A caveat to be aware of is that variational inference in general tends to underestimate uncertainty. For regressions where several different sparsity patterns are possible, I'd expect the variational posterior to represent only one (or at least not all) of them, so I'd take any feature inclusion probabilities from VI with a grain of salt. HMC inference ought to do a better job of mixing between possible solutions if tuned well (which the default `tfp.sts.fit_with_hmc` method may or may not manage to do automatically :).

Dave


--
You received this message because you are subscribed to the Google Groups "TensorFlow Probability" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tfprobabilit...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/tfprobability/9734bd81-ac22-4d24-bedd-1c9c6d75b7ab%40tensorflow.org.

Matthew Parker
Apr 13, 2020, 5:30:48 PM
to TensorFlow Probability, matthew...@gmail.com
Hi Dave,

Thanks so much for your help! I've got one more clarification question for you if you're up for it.

I incorporated your block of code and the `.params_to_weights` method, but together they generate weights and feature inclusion probabilities that are normally distributed. Unless I am mistaken (a distinct possibility), since I set `tfp.sts.SparseLinearRegression(..., weights_prior_scale=0.1)`, shouldn't this produce weights that are mostly (90%) close to zero, with only a few (10%) significantly non-zero?

I understand why the mean of the weights for each feature should be normally distributed across all samples (i.e., due to the Central Limit Theorem), but why would the weights across all features (600 features in my case) be normally distributed instead of following a horseshoe distribution?

Thanks in advance,
Matt 
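For intuition about what the prior alone implies, the NumPy sketch below draws 10,000 weights from a standard horseshoe prior with the global scale fixed at 0.1. Fixing the global scale, and reusing the `weights_prior_scale` value for it, are simplifying assumptions for illustration; TFP's `SparseLinearRegression` builds its global scale from further parameters, as in `params_to_weights`. Two things show up in the output: the weights pile up near zero with very heavy tails rather than splitting into a fixed 90%/10% pattern, and the U-shaped "horseshoe" belongs to the shrinkage coefficients κ_i (distributed Beta(1/2, 1/2) when λ_i is half-Cauchy), not to the weights themselves. This describes prior draws only and does not by itself explain the posterior histograms described above.

```python
import numpy as np

rng = np.random.default_rng(0)
num_features = 10_000
weights_prior_scale = 0.1  # used here as a fixed global scale tau (simplification)

# Horseshoe prior draws: lambda_i ~ HalfCauchy(0, 1), beta_i ~ N(0, (tau * lambda_i)^2).
local_scales = np.abs(rng.standard_cauchy(num_features))
weights = rng.normal(0.0, weights_prior_scale * local_scales)

# Shrinkage coefficients kappa_i = 1 / (1 + lambda_i^2), as in Dave's snippet.
kappa = 1.0 / (1.0 + local_scales ** 2)

abs_w = np.abs(weights)
print("median |beta|:", np.median(abs_w), " max |beta|:", abs_w.max())
print("share of kappa near 0 or 1:", np.mean((kappa < 0.1) | (kappa > 0.9)))
print("share of kappa in [0.4, 0.6]:", np.mean((kappa >= 0.4) & (kappa <= 0.6)))
```

Under Beta(1/2, 1/2), roughly 41% of the κ mass lies within 0.1 of either end versus roughly 13% in the middle band, which is the horseshoe shape the prior is named for.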