Hi Matthew, that's a great question. The STS sparse regression uses a slightly different formulation from BSTS: a horseshoe prior [1] rather than a spike-and-slab prior. For any sampled value of the model parameters, each feature is included with some fraction between zero and one, which the horseshoe prior encourages to be *close* to either zero or one, though rarely exactly either of those values. These fractions are related to the 'shrinkage coefficients' κ_i described in sec. 2.1 of [1] (specifically, they are 1 - κ_i). In terms of the STS parameters 'local_scales_noncentered' and 'local_scale_variances', you would compute
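```python
import tensorflow as tf

def shrinkage_coefficients(local_scales_noncentered, local_scale_variances):
  local_scales = local_scales_noncentered * tf.sqrt(local_scale_variances)  # λ in [1].
  return 1. / (1. + local_scales ** 2)  # κ in [1].

# Inputs are posterior samples, with the sample dimension along axis 0.
feature_inclusion_probabilities = tf.reduce_mean(
    1. - shrinkage_coefficients(local_scales_noncentered, local_scale_variances), axis=0)
```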
where the last line, which averages the inclusion fraction over the posterior parameter samples, should give a rough equivalent of an inclusion probability.
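To sanity-check the behaviour without a fitted model, here is a NumPy version of the same computation run on made-up draws (the `inclusion_probabilities` name and the synthetic samples are hypothetical, not TFP outputs). Since 1 - κ_i = λ_i² / (1 + λ_i²), a feature whose local scale λ_i is large across samples should get an inclusion probability near one, and a feature whose λ_i stays near zero should get one near zero.

```python
import numpy as np

def inclusion_probabilities(local_scales_noncentered, local_scale_variances):
    """Posterior-mean inclusion fraction (1 - kappa) for each feature.

    Assumed layout: both inputs have shape [num_samples, num_features],
    with posterior samples along axis 0.
    """
    local_scales = local_scales_noncentered * np.sqrt(local_scale_variances)  # lambda
    kappa = 1.0 / (1.0 + local_scales ** 2)  # shrinkage coefficients
    return np.mean(1.0 - kappa, axis=0)

# Hypothetical draws for two features: feature 0 has large local scales,
# feature 1 has tiny ones.
rng = np.random.default_rng(0)
noncentered = np.abs(rng.normal(size=(1000, 2)))
variances = np.stack([np.full(1000, 100.0), np.full(1000, 1e-4)], axis=1)
probs = inclusion_probabilities(noncentered, variances)
# probs[0] is close to one, probs[1] is close to zero.
```

Note that this only summarizes whatever samples you feed it; it can't tell you whether those samples cover all the plausible sparsity patterns.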
You could think of the 'weights_noncentered' parameter as a normalized and likely non-sparse version of the weights, but you need to do the computation in that method to get the weights themselves.
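For concreteness, here is a NumPy sketch of rebuilding the weights from the non-centered parameters. It assumes the parameterization I believe `tfp.sts.SparseLinearRegression.params_to_weights` implements: weights = weights_noncentered × λ_i × τ, with τ = global_scale_noncentered × √global_scale_variance × weights_prior_scale. The `global_scale_*` and `weights_prior_scale` names are assumptions about that class, so please check them against your TFP version before relying on this.

```python
import numpy as np

def weights_from_params(global_scale_variance, global_scale_noncentered,
                        local_scale_variances, local_scales_noncentered,
                        weights_noncentered, weights_prior_scale):
    """Rebuild regression weights from non-centered horseshoe parameters.

    Assumed shapes: global terms [num_samples]; per-feature terms
    [num_samples, num_features]. Parameter names mirror (by assumption)
    those of tfp.sts.SparseLinearRegression.
    """
    global_scale = (global_scale_noncentered * np.sqrt(global_scale_variance)
                    * weights_prior_scale)  # tau
    local_scales = local_scales_noncentered * np.sqrt(local_scale_variances)  # lambda_i
    # Broadcast the per-sample global scale across the feature axis.
    return weights_noncentered * local_scales * global_scale[..., np.newaxis]

w = weights_from_params(
    global_scale_variance=np.array([4.0]),
    global_scale_noncentered=np.array([0.5]),
    local_scale_variances=np.array([[1.0, 0.25]]),
    local_scales_noncentered=np.array([[2.0, 2.0]]),
    weights_noncentered=np.array([[1.0, -3.0]]),
    weights_prior_scale=0.1)
# w ≈ [[0.2, -0.3]]
```

The example shows why 'weights_noncentered' alone is misleading: the second feature has the larger non-centered value but ends up with a weight of similar size, because its local scale is half as large.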
A caveat to be aware of is that variational inference in general tends to underestimate uncertainty; for regressions where several different sparsity patterns are plausible, I'd expect the variational posterior to represent only one of them (or at least, not all of them), so I'd take any feature inclusion probabilities from VI with a grain of salt. HMC inference ought to do a better job of mixing between possible solutions if it's tuned well (which the default `tfp.sts.fit_with_hmc` method may or may not manage to do automatically :).
Dave