Hi everyone,
I’m here to ask for some guidance on the design of a model that detects latent temporal change points in regression parameters. I’ve been dabbling with RStan for a few months and am really impressed with the documentation and support, but as a novice Bayesian I’ve run into a sticking point that, with any luck, will be trivial for someone more experienced.
tl;dr: looking for a Stan approach to infer a step change in regression parameters before and after a latent temporal change point.
The simple case I’ll use for illustration is univariate linear regression of a response variable y on a predictor x, where x is a random variable drawn once per time step from a normal distribution N(.). There is also a Gaussian noise term in the response, with standard deviation sig_y.
y = a + b*x + noise -> y ~ N(a + b*x, sig_y)
x ~ N(.)
It could hardly be simpler. In the system under study, there is reason to believe that the functional response, and thus the regression parameters, undergo a step change at some change-point time Tcp, such that:
y ~ N(a[1] + b[1]*x, sig_y) for t <= Tcp
y ~ N(a[2] + b[2]*x, sig_y) for t > Tcp
My objective is to infer Tcp, a[1:2], and b[1:2].
As posed, Tcp is discrete and thus can’t be represented directly in Stan. I implemented a hack in Stan that simply pretends Tcp is continuous, which effectively treats any value of Tcp between time steps as equally likely. This actually works sometimes (meaning it converges and correctly recovers the synthetic-data parameters), but other times it fails badly (no convergence, incorrect inference). I imagine this abuse of parameters has any number of undesirable consequences.
I've attached the Stan code for this hack, as well as synthetic data (n=80, true parameters: Tcp=48, a[1] = 0, b[1] = 1, a[2] = 1, b[2] = 0.5, sig_y = 0.5) and the R code for generating random data.
The manual and other discussions recommend “summing out” latent discrete variables for mixture models, but this isn’t really a mixture problem, because the grouping is a deterministic function of Tcp, and Tcp is the very parameter I’m after. For this phase of the analysis I would be happy to estimate Tcp alone, and perhaps if I were more clever it would be obvious how to sum out the change point and end up with a continuous representation of it. But so far I’m not that clever. (A sketch of the summing-out idea follows below.)
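For concreteness, here is a minimal sketch of the summing-out approach applied to this toy problem, written in current Stan syntax (the 2.2-era syntax of the attached code would use <- and increment_log_prob instead). The change point is given a uniform prior over the candidate time steps 1..N-1 and marginalized with log_sum_exp; the specific priors on a, b, and sig_y are placeholder assumptions, not taken from the attached code.

data {
  int<lower=2> N;          // number of time steps
  vector[N] x;             // predictor, one value per time step
  vector[N] y;             // response
}
parameters {
  vector[2] a;             // intercepts before/after the change point
  vector[2] b;             // slopes before/after the change point
  real<lower=0> sig_y;     // residual standard deviation
}
transformed parameters {
  vector[N - 1] lp = rep_vector(0, N - 1);  // lp[cp] = log p(y | Tcp = cp, a, b, sig_y)
  for (cp in 1:(N - 1)) {
    for (t in 1:N) {
      int s = t <= cp ? 1 : 2;              // regime 1 up to Tcp, regime 2 after
      lp[cp] += normal_lpdf(y[t] | a[s] + b[s] * x[t], sig_y);
    }
  }
}
model {
  a ~ normal(0, 5);        // placeholder priors, not from the original model
  b ~ normal(0, 5);
  sig_y ~ normal(0, 2);
  // the uniform prior over cp in 1..N-1 is a constant, so it drops out of:
  target += log_sum_exp(lp);
}
generated quantities {
  simplex[N - 1] cp_prob = softmax(lp);     // posterior over the change point
  int<lower=1, upper=N - 1> Tcp = categorical_rng(cp_prob);
}

Because the uniform prior on the change point is constant, it cancels in the marginalization, and the generated quantities block recovers a draw of the discrete Tcp per iteration from its implied posterior. For n = 80 the quadratic loop is cheap; for long series the inner sums could instead be built from cumulative per-regime log-likelihoods.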
Many Thanks,
-Thomas
Note:
My actual model is considerably more complicated, accounting for multiple error sources following the hierarchical approach described in
Kelly, B. C. (2007), Some Aspects of Measurement Error in Linear Regression of Astronomical Data, Astrophys. J., 665(2), 1489, doi:10.1086/519947.
and extended to a constrained segmented linear model applied to hydrologic data with threshold behavior. The approach works well for lumped data (no temporal change points), but the limited success I’ve had estimating Tcp in the simplified example above disappears entirely for the more complex model.
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rstan_2.2.0 inline_0.3.13 Rcpp_0.11.1
loaded via a namespace (and not attached):
[1] codetools_0.2-8 KernSmooth_2.23-12 stats4_3.0.2 tools_3.0.2
Hi Emmanuel, thanks for your remarks. I was perhaps not explicit enough that my change point Tcp is in the time variable t, not in x. That is, the functional response y = f(x) changes at some time t = Tcp.
Since the potential energy is negative log density, it's actually falling "down" toward the mode, not climbing "up" a hill.
So my question is how do we deal with this? In IRT models, we usually just constrain delta > 0, which gets rid of the multimodality. Is that always the right thing to do? What's the equivalent for a mixture model?
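For what it's worth, one common device for mixtures (not necessarily always the right one, which is exactly the question above) is to give the component locations Stan's ordered type, which enforces mu[1] < mu[2] and breaks the label-switching symmetry, much as delta > 0 does in the IRT case. A minimal two-component Gaussian mixture sketch, with placeholder priors:

data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  ordered[2] mu;                  // mu[1] < mu[2] enforced by the type
  real<lower=0> sigma;            // shared component scale
  real<lower=0, upper=1> theta;   // mixing weight of component 1
}
model {
  mu ~ normal(0, 10);             // placeholder priors
  sigma ~ normal(0, 2);
  theta ~ beta(2, 2);
  for (n in 1:N)
    target += log_mix(theta,
                      normal_lpdf(y[n] | mu[1], sigma),
                      normal_lpdf(y[n] | mu[2], sigma));
}

Note the ordering only removes the relabeling mode; any genuine multimodality in the likelihood itself can of course remain.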
So maybe lasso is not really the best word here. Inverse lasso? In my previous code, the tuning parameter lambda is not specified and is implicitly assumed to be one there (this goes beyond your question).
Apologies for the wild thoughts.
Luc