Comparison Operators in Stan do not take a vector, but only integers?

538 views
Skip to first unread message

Michael Andreae

unread,
Jul 14, 2014, 10:12:42 AM7/14/14
to stan-...@googlegroups.com
Dear Stan experts

I am trying to vectorize a non-linear Bayesian continuous(!) change point model using an indicator function for a linear spline, but it seems the "Comparison Operators" in Stan do not take a vector, (but only integers/reals):

In my model, y is the outcome, x1 are time points, tau are change points, all are vectors (continuous), beta is a scalar to be estimated:

y~ normal ( intercept...
   + ( (x1 <= tau ) * ( beta  * x1 ) ) 
                                                 ^-- here

DIAGNOSTIC(S) FROM PARSER:
binary infix operator < with functional interpretation logical_lte requires arguments or primitive type (int or real), found left type=vector, right arg type=vector; no matches for function name="logical_lte"

Is there a smart way around this, please?

Thank you very much.
Michael

Bob Carpenter

unread,
Jul 14, 2014, 1:17:46 PM7/14/14
to stan-...@googlegroups.com
If by smart way around, you mean a vectorized version of <=, then I'm
afraid the answer is no. No reason in principle not to vectorize these,
but we haven't gotten around to it.

So what you need to do is define the intercept in a loop, assign
it to an vector mu, and then call y ~ normal(mu, ...); That'll be
faster than calling y[n] ~ normal(...[n]..., ); in the loop.

- Bob

Luc Coffeng

unread,
Jul 14, 2014, 4:41:51 PM7/14/14
to stan-...@googlegroups.com
Hi Michael,

A more general point that you might find useful to consider regarding change point models in Stan. Recently, there has been a discussion on the stan user list about change point models and I think the following were the main conclusions:
  1. If the change point is a parameter to be estimated (tau in your case), the derivative of the posterior will be discontinuous wherever tau == [any value of x1 in the data]. This will break the Hamiltonian dynamics and the algorithm will degenerate to a random walk, driven by the random momentum of the Hamiltonian particle. Hamiltonian MC requires the posterior and its first derivative to be continuous everywhere to be able to simulate draws from the posterior.
  2. Prior issue can be theoretically overcome by marginalizing over all possible values of the change point, using a mixture model (see manual for example). So if the data are X = (x1, x2, x3, x4, ...), marginalize over [x1 < tau < x2], [x2 < tau < x3],[x3 < tau < x4], ... Unfortunately, as Michael Betancourt pointed out, the posterior for this model will be multimodal (unless you have relatively precise information about were the change point is), and will prove hard to sample from.
  3. Alternatively, Emmanuel Charpentier proposed a continuous approximation of the step function x1 <= tau, e.g. by means of a logistic function (which has a derivative). However, the issue here is that although the first derivative of the posterior is now always continuous, its curvature will still be relatively extreme wherever tau == [any value of x1 in the data]. This may cause HMC to only sample a region most proximal to a mode of the posterior, and not explore the whole posterior appropriate (i.e. the Hamiltonian particle would need a lot of random momentum to be able to climb the steep curvature around tau == [any value of x1 in the data]).

I'm sure that all the people involved in the aforementioned discussion would be very interested to hear about your experiences with your change point model!

Best,

Luc

Bob Carpenter

unread,
Jul 14, 2014, 4:56:48 PM7/14/14
to stan-...@googlegroups.com
Presumably there won't be too many modes, so this would probably be
a good candidate for some samplers that could handle limited
amounts of multimodality (compared, to say, something that can deal
with clustering models like k-means or LDA, which as far as I know,
don't exist).

The marginalization is just over one value --- Luc's enumerating
the terms that will go into it.

If someone has an example of such a model that's working, I'd love
to include it in the manual (with attribution, of course). There's
still time for the 2.4 release, though we do hope to get it out this
week.

- Bob
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Luc Coffeng

unread,
Jul 15, 2014, 5:05:21 AM7/15/14
to stan-...@googlegroups.com
I cooked up some code for a toy example.

Because Michael said he was using a linear spline, I assumed that the two linear splines should intersect exactly over the change point (i.e. the function is continous at the change point, although the first derivative is not), even though the code in the Michael's error message seems to suggest otherwise. So I'm assuming you want a linear spline that is continuous over the change point. Under this assumption, for any two values of X in the data, there are infinitely many positions between these two x's where the spline knot may be. Using the mixture approach in the Stan manual, it's impossible to marginalize over all these values (at least, I'm not sure how to). So I propose an adaptation.

Specify the change point itself as a latent unknown variable cp_x with unknown mean mu_cp and standard deviation sigma_cp, and then for M mixture components (for each consecutive pair of x's in the data) specify a parameter for the position of a potential change point cp_x[m] between the consecutive two values of x, such that:

cp_x[m] ~ normal(mu_cg, sigma_cg) T[x1, x2];  // x1 and x2 are two consecutive observed values of x

Finally, weight each mixture component representing the interval (x1,x2) by the normal CDF(x2, mu_cp, sigma_cp) - CDF(x1, mu_cp, sigma_cp).

The attached r-script simulates data based on the aforementioned assumption of a continuous regression line (but discontinuous derivative) at the change point. I didn't constrain the support for the change point for each mixture component as the support varies per mixture component, and defining the support would mean creating separate parameters for each mixture component and losing the ability to loop through the components of a vector or array. Bad form! As a result, the model will not initialize if you don't supply it with initial values. In a version of stan before 2.3.0, you would probably see lots and lots of metropolis rejections being spewed out because the support is not defined for cp_x.

As it turns out, the model is currently rather ineffecient (N_eff < 30 when producing 2500 samples in about 1-4 minutes with 40 data points and 21 mixture components, depending on what mode the sampler is). The posterior for the point estimate of the changing point is rather multimodal, though I found that on several occasions, the model actually sampled from a mode very close to the parameter values that were used in the data-generating process. Still, I don't really understand why the sampling of the parameters for the regression line of each mixture component is so horrible...

-Luc
linear_splines.r

Michael Andreae

unread,
Jul 15, 2014, 4:55:43 PM7/15/14
to stan-...@googlegroups.com
Dear Luc,
thank you for your work. I just read this after I posted a preliminary model following a different route...?


Will play with your toy... 
;-)
Thank you very much...
Cheers
Michael
Reply all
Reply to author
Forward
0 new messages