> On Jul 31, 2016, at 11:56 AM, Jesse Wolfhagen <jl.wol...@gmail.com> wrote:
>
> Hello,
>
> Thank you for responding so quickly!
>
> I understand the issue about defining AgeClass[i], but I guess I'm stumped as to how to address it directly. I am trying to fit an ordered logistic regression to a series of mandibles that can be assigned to ranked age stages (1-9). This works fine when all of the mandibles are assigned to a specific stage, but in reality several are broken and so cannot be assigned to one stage exclusively (for example, a mandible could be in either stage 1 or 2, thus given a probability of 0.5 for each stage and 0 for the others). I was trying to find a way to incorporate that uncertainty into the process of estimating the intercepts (which currently would give an idea of uncertainty due to sample size), rather than having to assign stages and run the model, then repeat to get an idea of how much of it changes based on the individual assignments.
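A typical way to do that would be to average over both outcomes with
the probability as weight. The averaging happens on the probability
scale, not the log scale. So if you have foo_lpmf(1 | theta) and
foo_lpmf(2 | theta), and they're weighted 50% each, then you want

  target += log_mix(0.5, foo_lpmf(1 | theta), foo_lpmf(2 | theta));

which is the log of
0.5 * exp(foo_lpmf(1 | theta)) + 0.5 * exp(foo_lpmf(2 | theta)).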
> Ideally, I'd like to make AgeClass a parameter (as it gets estimated via the categorical distribution call), but I cannot make an int parameter.
It's better to do the above for sampling efficiency. It just
marginalizes out what would otherwise be the integer parameter.
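
Applied to the model in this thread, marginalizing might look roughly like
the sketch below. Treat it as a sketch under assumptions, not a tested
program: the variable names are taken from the quoted model, the array
declarations follow the style used in the thread (newer Stan releases write
array[N] simplex[9] ageprobs instead), and the stage weights are combined on
the probability scale with log_sum_exp.

```stan
data {
  int<lower=1> N;           // number of mandibles
  simplex[9] ageprobs[N];   // probability of each age stage per mandible
}
transformed data {
  vector[N] phi;
  phi = rep_vector(0, N);   // constant linear predictor, as in the thread
}
parameters {
  ordered[8] agebreaks;
}
model {
  agebreaks ~ normal(0, 10);
  for (i in 1:N) {
    vector[9] lp;
    // log of (stage weight * ordered-logistic probability) for each stage
    for (k in 1:9)
      lp[k] = log(ageprobs[i][k])
              + ordered_logistic_lpmf(k | phi[i], agebreaks);
    // sum over stages on the probability scale
    target += log_sum_exp(lp);
  }
}
```

Stages with weight zero contribute log(0), which is negative infinity, so
they drop out of the sum. A mandible assigned to a single stage with
probability 1 reduces to the ordinary ordered_logistic term.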
> I've tried using it as a local variable within the model block, which is what creates the error about it not being specified, even when I include initial values. Placing it in the data or transformed data blocks requires initial values, which fixes those values for the model rather than allowing the observed age state to vary.
>
> Maybe I'm still just confusing the issue of the sampling statement vs. "values being drawn FROM a distribution" (the random number generators): there doesn't appear to be a way to draw from a distribution before the modeling step (that is, all transformations are deterministic), is that correct?
>
> Is this something that requires a Gibbs sampler and so can't be done in Stan?
Gibbs is only one approach to discrete sampling, but the answer
is "no": this doesn't require Gibbs, and it can be done in Stan.
You can just marginalize the discrete variable out like any other variable.
> Here is my updated model (still has the same issue/error as previously, but this time SHOULD be less ... confused, I hope):
>
> data{
> int<lower=1> N; //# of mandibles
> simplex[9] ageprobs[N]; //probability of a mandible being in each age stage
> }
> transformed data{
> vector[N] phi;
> phi = rep_vector(0, N);
> }
> parameters{
> ordered[8] agebreaks;
> }
> model{
> int AgeClass[N];
> agebreaks ~ normal( 0 , 10 );
> for ( i in 1:N )
> AgeClass[i] ~ categorical(ageprobs[i]);
> for ( i in 1:N )
> AgeClass[i] ~ ordered_logistic( phi[i] , agebreaks );
> }
You almost never want the same variable, e.g., AgeClass[i], on
the left-hand side of ~ more than once. At least not for
a typical generative Bayesian model.
I'm not sure what you're trying to infer here. The cutpoints
in agebreaks aren't doing anything for you without a real predictor
for phi. Fixing phi at zero essentially just gives every mandible the
same linear predictor, and the agebreaks variable won't be identified.
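
For illustration only, a version with a real predictor might look like the
sketch below. The covariate x and the slope beta are hypothetical names that
don't appear in the original model, the prior on beta is an arbitrary
placeholder, and AgeClass is treated as fully observed data so that it
appears on the left-hand side of ~ exactly once.

```stan
data {
  int<lower=1> N;                     // number of mandibles
  int<lower=1, upper=9> AgeClass[N];  // observed age stage (assumed known here)
  vector[N] x;                        // hypothetical covariate, not in the thread
}
parameters {
  real beta;                          // hypothetical slope on x
  ordered[8] agebreaks;               // cutpoints between the 9 stages
}
model {
  beta ~ normal(0, 5);                // placeholder prior
  agebreaks ~ normal(0, 10);
  for (i in 1:N)
    AgeClass[i] ~ ordered_logistic(x[i] * beta, agebreaks);
}
```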
- Bob