data {
  int<lower=1> N;
  vector<lower=0>[N] C;
}
parameters {
  vector<lower=0>[N] R;
}
transformed parameters {
  vector<lower=0>[N] S;
  S = R - C;  // can go negative when R < C, violating the declared lower bound
}
. . .
data {
  int<lower=1> N;
  vector<lower=0,upper=1>[N] h;  // harvest rate estimated from observed data
}
parameters {
  vector<lower=0>[N] R;
}
transformed parameters {
  vector<lower=0>[N] S;
  S = R .* (1 - h);  // ensures S > 0, but not a satisfactory solution because in reality C, not h, is known
}
. . .
So at this point I have a few specific/algorithmic questions and a more general one:
(1) I'm trying to understand the relationship between constraint violations and divergent transitions. Partly this reflects my fuzziness on what exactly causes divergences at a computational level, but suppose that somewhere along the NUTS tree, the integrator encounters a divergence. Which point gets retained for that iteration of the HMC chain? The last point (i.e., the chain doesn't move)? Or a slice sample from the points along the path before the divergence occurred? Or...? Ditto for the case when one of the points on the tree violates parameter constraints -- which point gets retained then?
(2) Does a constraint violation necessarily (or usually) produce a divergence? If I know the divergences are just telling me about the infeasible holes in parameter space, should I feel reassured about treating them as innocuous?
(3) Do you have any general guidance or best practices for situations like this, where reparameterizing to a dense set is either analytically cumbersome or computationally intractable?
(2) Does a constraint violation necessarily (or usually) produce a divergence? If I know the divergences are just telling me about the infeasible holes in parameter space, should I feel reassured about treating them as innocuous?
No. A divergence happens while the constraints are being satisfied, albeit via a transformation from the constrained space to an unconstrained space.
(3) Do you have any general guidance or best practices for situations like this, where reparameterizing to a dense set is either analytically cumbersome or computationally intractable?
You pretty much have to write a cumbersome Stan program. Can you make h be a vector of primitive parameters between 0 and 1 and define R as a transformed parameter that is equal to C ./ h?
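A minimal Stan sketch of this reparameterization, collapsing the real year x age structure down to a single vector (N and the variable names are assumptions, not the poster's actual program):

```stan
data {
  int<lower=1> N;
  vector<lower=0>[N] C;  // known catch
}
parameters {
  vector<lower=0, upper=1>[N] h;  // harvest rate, now a primitive parameter
}
transformed parameters {
  vector<lower=0>[N] R = C ./ h;  // R >= C by construction
  vector<lower=0>[N] S = R - C;   // equivalently R .* (1 - h); never negative
}
model {
  // priors belong on h here; a prior stated on R instead would need a
  // Jacobian adjustment for the h -> C ./ h change of variables
}
```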
Hi Ben,
Thanks for the clarifications.
On Monday, June 13, 2016 at 1:11:19 PM UTC-7, Ben Goodrich wrote:
(2) Does a constraint violation necessarily (or usually) produce a divergence? If I know the divergences are just telling me about the infeasible holes in parameter space, should I feel reassured about treating them as innocuous?
No. A divergence happens while the constraints are being satisfied, albeit via a transformation from the constrained space to an unconstrained space.
Interesting. This leaves me wondering why the divergences disappeared in my second cartoon example. I realize the posterior geometry in the two versions will differ in ways apart from the infeasible "holes", but I wouldn't have thought it would differ so much. Could it be that part of the problem with the "holes" is that they hinder adaptation and lead to a sub-optimal step size and mass matrix, and that leads to lots of divergences?
Yeah, I should have said "more cumbersome". This is a clever idea, though, and it might work. Things are more complicated than in my toy example because R and S are actually year x age arrays, but I think the same logic applies. I've actually spent the last few days trying something similar but not quite the same: declare both R and h as primitive parameters and estimate h by matching C (which in this case is essentially known without error) as closely as possible. To do this, you have to define a likelihood with a very high precision, say
C ~ lognormal(log(R .* h), 0.001);
This works and gives sensible answers, but it is excruciatingly slow (compared to the original version) and still has some divergences. I'll think about your idea of deterministically solving for R and see if I can get it to work.
--
You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
I do effectively know C with certainty in this case, although that may not be true in more general situations. I can't see a way to incorporate that knowledge into bounds on declared parameters, however, because (1) while I've been trying to spare everyone the tedious arithmetic, what I've been calling R is actually a function of two lower-level parameters (let's call them R_tot, a vector, and p_age, an array of simplexes), so the constraint amounts to the solution set of a system of linear inequalities, which makes my head hurt; (2) even if I could work out the solution, it would give me constraints on one primitive parameter that depend on another primitive parameter; and (3) even in the trivial case of my toy example, you would need different lower bounds on different elements of R, which Stan's declaration syntax doesn't support.
In the more general case, you're right that it would make sense to assign C_meas a likelihood component and estimate its observation error variance instead of fixing it like I described above. I haven't tried that, mainly because then the model wouldn't reflect what I know about the problem.
To get around (2) and (3), it's possible to declare a vector of unconstrained parameters and do the transforms yourself to a constrained vector. These constraints can depend on each other (as in a simplex) as long as you can compute the transform and its Jacobian. That can be a pain, though, even when it is possible.
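As a sketch of that manual-transform approach in the setting of the toy example (names hypothetical), an element-wise lower bound of C[n] on R[n] can be implemented with a shifted-exponential transform plus its log Jacobian, mirroring what Stan's built-in <lower=...> does but with a data-dependent, per-element bound:

```stan
data {
  int<lower=1> N;
  vector<lower=0>[N] C;
}
parameters {
  vector[N] R_raw;  // unconstrained
}
transformed parameters {
  vector<lower=0>[N] R = C + exp(R_raw);  // element-wise lower bound C[n]
  vector<lower=0>[N] S = R - C;           // = exp(R_raw), strictly positive
}
model {
  // log |dR/dR_raw| = sum(R_raw); needed whenever a prior is placed on R
  target += sum(R_raw);
}
```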
If you're dealing with an underlying simplex, you might want to check whether some of the components are collapsing to zero, and you should definitely check that the result is identified; if there are non-identifiabilities, Stan can try really hard to explore a very large space of possibilities, sometimes triggering numerical issues.
It's possible I'm being dense, in which case thanks for bearing with me. But the idea of posing a generative model makes perfect sense to me iff C is measured with error. In that case, I could assign it a stochastic observation model and estimate any associated parameters (e.g., observation error variance). In my case, C is known so precisely (in an absolute sense and relative to other observed quantities) that it doesn't really make sense to treat it as uncertain. Given that, I can only think of the following alternatives to the naive subtraction:
(1) Parameterize with S (defined as R - C, which must be positive) and solve deterministically for R as R = S + C. I tried this, so far without success. Again, I'm glossing over a lot of complexity in this cartoon version.
(2) Parameterize with h (defined as C ./ R, bounded on [0,1]) and solve deterministically for R as R = C ./ h and for S as S = R .* (1 - h). This was Ben's suggestion, and I'm still wrapping my head around it, but it seems promising.
(3) Adopt the generative approach, but only as a kludge to force "C_hat" to be very close to C, essentially putting a point mass prior on the observation error variance. I've been playing around with this, but Ben pointed out it wasn't a great idea.
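In toy form, alternative (1) needs no Jacobian adjustment, because S itself is the declared parameter (sketch only; the real year x age structure and the rest of the model are omitted):

```stan
data {
  int<lower=1> N;
  vector<lower=0>[N] C;
}
parameters {
  vector<lower=0>[N] S;  // surplus, positive by declaration
}
transformed parameters {
  vector<lower=0>[N] R = S + C;  // R > C automatically
}
```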
It's possible I'm being dense, in which case thanks for bearing with me. But the idea of posing a generative model makes perfect sense to me iff C is measured with error. In that case, I could assign it a stochastic observation model and estimate any associated parameters (e.g., observation error variance). In my case, C is known so precisely (in an absolute sense and relative to other observed quantities) that it doesn't really make sense to treat it as uncertain. Given that, I can only think of the following alternatives to the naive subtraction:

You say this, but then why were you having constraint problems in the beginning?
I agree that if C is known exactly, or close enough to exactly, then it wouldn't have to be modeled. And in that case you could write a model for the counts above the baseline, S = R - C, knowing that S is always positive. I am being insistent on a generative model only because you had originally said that you were having problems keeping S positive! If I am incorrect then please correct me.
I see this all of the time in physics, where background subtraction is the norm, and the only way to model everything correctly is to actually model the background subtraction process explicitly.
(1) Parameterize with S (defined as R - C, which must be positive) and solve deterministically for R as R = S + C. I tried this, so far without success. Again, I'm glossing over a lot of complexity in this cartoon version.
(2) Parameterize with h (defined as C ./ R, bounded on [0,1]) and solve deterministically for R as R = C ./ h and for S as S = R .* (1 - h). This was Ben's suggestion, and I'm still wrapping my head around it, but it seems promising.
(3) Adopt the generative approach, but only as a kludge to force "C_hat" to be very close to C, essentially putting a point mass prior on the observation error variance. I've been playing around with this, but Ben pointed out it wasn't a great idea.

Actually these are all generative models, just with different assumptions about the measurement process.
1) There's a difference between expected background and actual background: even if you know the expected background exactly, you can get substantial variation from measurement to measurement. In other words, there's a huge difference between

R_meas ~ Poisson(C + S)

and

C_meas ~ Poisson(C)
S_meas ~ Poisson(S)
R_meas = C_meas + S_meas

Conflicts such as R_meas - C > S_meas almost always come from assuming the former when it's really the latter that is appropriate.
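A toy Stan version of the second (explicitly generative) formulation, assuming for illustration that both count streams are observed separately (all names hypothetical):

```stan
data {
  int<lower=1> N;
  array[N] int<lower=0> C_meas;  // measured background counts
  array[N] int<lower=0> S_meas;  // measured signal counts; nonnegative by construction
}
parameters {
  real<lower=0> C;  // expected background rate
  real<lower=0> S;  // expected signal rate
}
model {
  C_meas ~ poisson(C);
  S_meas ~ poisson(S);
}
// the total R_meas = C_meas + S_meas needs no separate model,
// and the implied "signal" can never be pushed negative
```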