You received this message because you are subscribed to a topic in the Google Groups "Stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/y6PPWzu7xfw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.
OR, and this is an OR, not an AND, if you have multiple outcomes
observed per combination of predictors, you can instead define y as
int y[C,K];
where K is the number of possible outcomes. Then you just need:
for (c in 1:C) {
  vector[K] theta;
  theta <- ...;
  y[c] ~ categorical_logit(theta);
}
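For intuition, categorical_logit is just a categorical distribution on softmax-transformed logits. A quick numerical check of that equivalence (plain Python, not Stan code):

```python
import numpy as np

def softmax(theta):
    # subtract the max before exponentiating for numerical stability
    z = np.exp(theta - np.max(theta))
    return z / z.sum()

def categorical_logit_lpmf(y, theta):
    # log P(Y = y) for a categorical distribution with logits theta;
    # equals log(softmax(theta)[y]) without forming the softmax explicitly
    return theta[y] - np.logaddexp.reduce(theta)

theta = np.array([0.0, -1.5, 2.0, 0.5])
for y in range(len(theta)):
    assert np.isclose(categorical_logit_lpmf(y, theta), np.log(softmax(theta)[y]))
```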
--
When I run the model and re-generate the number of counts per outcome in the generated quantities block, it matches very closely what I am actually observing, so it seems like it is working. The coefficient estimates are, however, still very odd: in many cases the 95% HDIs range from -50 ± 10 to 50 ± 10, and I am not quite sure why. It seems like the model has converged (all Rhat < 1.1), though some variables have a low number of effective samples (< 20). But even for variables with a large number of effective samples (> 150) and Rhat < 1.01, the estimates are very broad. I tried putting uniform hyperpriors on the priors of the coefficients, but even that did not help much. I am, however, wondering whether the uniform is appropriate for that, or whether there are other go-to hyperpriors. Any advice on this one?
When I run the model, it seems to run quite fast at the start (~50 samples per day). However, when it reaches about ~200 samples, it slows down dramatically and only draws about 1 sample per day. I'm wondering if I've unintentionally come up with a problematic posterior.
I fixed one of the classes, so that K[6] = 0, which should fix the identifiability problem with softmax. I have tried to set wide priors, to give Stan some room to explore. As I understand it, multinomial regression coefficients can be interpreted as relative risks. The priors I have specified now should give each coefficient a value of about ±3, which would give risks in the interval exp(±3) ≈ [0.05, 20], perfectly reasonable relative-risk values.
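As a quick sanity check on that prior scale, mirroring the arithmetic above (plain Python):

```python
import math

# a coefficient of +-3 on the log scale maps to relative risks of
# exp(-3) ~= 0.05 and exp(3) ~= 20
lo, hi = math.exp(-3), math.exp(3)
assert 0.04 < lo < 0.06
assert 20 < hi < 21
```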
I do have a rather large number of data points, 7,493,720 to be exact. They are, however, split over 187,343 combinations. This is because there are 40 combinations of discrete covariates per data point (2 sex and 10 age). There are 6 different outcomes. Could there be any other problems, or bad specifications?
I have uploaded the model to https://gist.github.com/demodw/84e46c6bac8eb1e55d99. Any pointers on how to improve the efficiency would also be much appreciated! I could not find a multi_logit function, and softmax only takes vectors, so I had to do the log-likelihood increment per data point. Also, when I am running diagnostics, should I look at Rhat values for the raw (K-1) coefficients, or for the K coefficients?
And can I somehow exclude the (K-1) coefficients from the output?
Hi Daniel,

Thanks for the reply! I have some additional questions inline.

On 08 Feb 2016, at 16:11, Daniel Lee <bea...@alum.mit.edu> wrote:

> Are you using the default algorithm? (NUTS with a diagonal mass matrix) How many warmup iterations? What you may be seeing is a slowdown in warmup with a change in adaptation. After warmup, you typically won't see drastic differences between samples unless you run into problems in the posterior distribution that weren't discovered in warmup.

I'm doing 400 warmup and 400 samples, so it was only about halfway through the warmup. I'm using CmdStan with the default parameters for sampling.
> Try a second level of non-centered parameterization for the hyperparameters.

How so? Parameterize hyper_* so that they are also pulled from a normal(0,1) distribution, and then multiply by the hyper_hyper value afterwards?
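For intuition, the two-level non-centering being discussed can be sketched numerically. This is a hedged illustration in plain Python with made-up names (hyper_mu, hyper_sigma; not the actual model's parameters): each level draws a standard normal and shifts/scales it, which leaves the marginal distribution unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# level 2 (hypothetical names): hyper_mu ~ normal(0, 5), written non-centered
hyper_mu_raw = rng.standard_normal(n)
hyper_mu = 0.0 + 5.0 * hyper_mu_raw          # loc + scale * raw

# level 1: beta ~ normal(hyper_mu, hyper_sigma), also non-centered
hyper_sigma = 2.0
beta_raw = rng.standard_normal(n)
beta = hyper_mu + hyper_sigma * beta_raw     # shift/scale a standard normal

# marginally, beta ~ normal(0, sqrt(5^2 + 2^2))
assert abs(beta.mean()) < 0.1
assert abs(beta.std() - np.sqrt(25 + 4)) < 0.05
```

The sampler then only ever sees the standard-normal `*_raw` parameters, which is what makes the geometry easier.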
I have uploaded the model to https://gist.github.com/demodw/84e46c6bac8eb1e55d99. Any pointers on how to improve the efficiency would also be much appreciated! I could not find a multi_logit function, and softmax only takes vectors, so I had to do the log likelihood increment per data point.Also, when I am running diagnostics, should I look at Rhat values for the raw (K-1) coefficients, or for the K coefficients?Just look at the K-1 coefficients. If you know the 0th one is always 0, then Rhat will be NaN (or some other non-nonsensical number).And can I somehow exclude the (K-1) coefficients from the output?Depends on the interface.I’m using CmdStan2.9 and I can’t see any way of specifying which parameters to save. Moving the *_raw parameters to the model block does not seem to be a possibility, since I am using them for assignment in the transformed parameters block.
<multinomial_model.stan>
<log_posterior.png>
<stan_diag.png>
<treedepth.png>
<pairs.png>
<mixed_traces.png>
OK, a bit more experimenting, and I noticed that there is (apparently) a lot of correlation between some variables. I removed the Age^2 term, since it was fully correlated with the Age term.
In the simple model, with no shared parameters, the pairs plot shows a near-perfect correlation between the coefficients for the intercept (beta_0) and age (beta_A) (fig1.png). Age is a continuous variable, taken as the mean age of the persons in the particular group and scaled to have mean = 0 and sd = 1. I tried switching to another configuration, with an age parameter for each age interval. This had no real effect and led to a lot of iterations hitting the maximum tree depth. After reading some user posts on the forum, I tried switching to a formulation with a multi_normal prior on the coefficients, but this did not really help (fig3.png). I've also tried adding priors to the coefficient scales, but this did not really help, and the Neff for a lot of the scale priors was quite low (0.3-0.4). Switching to a non-centered parameterisation did not help either.
On a small subset of the data the models converge very fast (within 50 warmup iterations), but on larger sets the sampler hits the maximum tree depth a lot during warmup and takes very small step sizes.
On 29 Mar 2016, at 17:21, Krzysztof Sakrejda <krzysztof...@gmail.com> wrote:

> On a small subset of the data the models are converging very fast (within 50 warmup iterations), but on larger sets it hits tree depth a lot during warmup, and is taking very small step sizes.

Is this true of all small subsets, or only some of them? Breaking up the data into a bunch of small subsets and running them all should identify a problematic subset. I imagine there's a batch of data that conflicts with the priors (or with all the rest of the data). If the munging isn't too heavy, it would be worth going this way rather than trying to find a solution that works with conflicting/difficult data.

Also, what does a N(-6,3) prior look like when you combine it with the rest and plug it through the softmax and the multinomial? I have some intuition about that, but I would simulate from that mess before spending a lot of time on it.

Is category 1 the non-disease/most-common category?
If I discard the age and sex covariates, 5,000 combinations take about 1 hour to run. However, if I include the covariates, 100 combinations take overnight (6-8 hours) to finish.

So, to sum up the questions:

1. Does this correlation between input variables affect the sampler? Is there an identifiability issue? And if so, how can I improve things?

2. Is there a clever way of including the Age^2 term while breaking the correlation with Age?
If it helps, I can provide a synthetic data set. Also, the models are right now written for clarity and not computational efficiency. :-)
Best,
David
On 29 Mar 2016, at 17:21, Krzysztof Sakrejda <krzysztof...@gmail.com> wrote:

> Is this true of all small subsets, or only some of them? Breaking up the data into a bunch of small subsets and running them all should identify a problematic subset. I imagine there's a batch of data that conflicts with the priors (or with all the rest of the data).

So far I've tried ~15 different subsets, each consisting of 10 combinations (Di = 10). I had some problems before adding a pseudo-count of +1 to all observations, since there were quite a few groups with 0 individuals. I'm not really sure how I would identify the batch of data conflicting with the priors, since all the traces and pairs plots look OK. Some of the parameters take the value of the prior due to a low amount of data, but that should be OK.

> ...Also, what does a N(-6,3) prior look like when you combine it with the rest, plug it through a softmax and the multinomial?
> I have some intuition about that but I would simulate from that mess before I spent a lot of time on it.

I set the center at -6 since this corresponds to a probability of 0.25% of having any disease (irrespective of gender and age), which seems to fit the data when I simply calculate #people with disease A / #people without disease A for each disease.

Some simple simulation in R:

softmax <- function(row) {
  # Rescale logits to probabilities between 0 and 1.
  exp_row <- exp(row)
  return(exp_row / sum(exp_row))
}

P <- softmax(c(0, rnorm(1, -6, 3), rnorm(1, -6, 3),
               rnorm(1, -11, 3), rnorm(1, -11, 3), rnorm(1, -11, 3)))
rmultinom(1, size = 5.5 * 10^6, prob = P)

This indicates that the low probabilities could be a problem with a low number of data points, but the number of individuals (~5.5 million) is quite large, so I'm not sure whether there is a problem.
f <- function(N) softmax(cbind(0, rnorm(N, -6, 3), rnorm(N, -6, 3), rnorm(N, -11, 3), rnorm(N, -11, 3), rnorm(N, -11, 3)))
tp <- f(1000)
colnames(tp) <- letters[1:6]
tpm <- reshape2::melt(tp)
ggplot(data=tpm, aes(x=value)) + geom_histogram() + facet_wrap( ~ Var2) + scale_y_log10()
> Is category 1 the non-disease/most-common category?

Yes. I'm working on data from a full population, so there are always about ~5 million people (divided over the different groups) in category 1, depending on how rare a condition is. Could the large number of individuals in the reference group be a problem? And if so, is there a way to control for it?

David
To fix the problem, you would want to fix the prior so that the sampler doesn't have to travel so far along a flat surface towards zero. Naively (on my end, I just have experience from log/logit transforms), a standard deviation of three is too big for the low end of the softmax transform at a mean of -6.
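To see that point numerically, here is a rough prior-predictive check in Python: a two-class toy (reference class fixed at 0, not the full six-class model), showing how widely a N(-6, 3) intercept spreads the implied probability.

```python
import numpy as np

rng = np.random.default_rng(1)

# reference class fixed at logit 0; second class gets an intercept ~ N(-6, 3)
draws = rng.normal(-6.0, 3.0, size=100_000)
p = 1.0 / (1.0 + np.exp(-draws))  # two-class softmax reduces to a sigmoid

# sd = 3 spreads the implied probability over many orders of magnitude
lo, hi = np.quantile(p, [0.025, 0.975])
assert lo < 1e-4   # 2.5% quantile of the logit is near -11.9, so p ~ 7e-6
assert hi > 0.05   # 97.5% quantile of the logit is near -0.1, so p ~ 0.47
```

So the central 95% of the prior on the probability scale runs from roughly one in a hundred thousand to roughly one in two, which is what makes the low end of the softmax so flat for the sampler.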
I feel like a broken record on this particular issue with priors, because it usually comes up with logistic regressions, so it's really common; but this is a good softmax example of it.
> To fix the problem you would want to fix the prior so that the sampler doesn't have to go so far along a flat surface towards zero. Naively (on my end, I just have experience from log/logit transforms) the standard deviation of three is too big for the low end of the softmax transform at a mean of -6.

Right. I guess my approach of just entering the base probability as the point estimate was too naive. Now the question is, how do I incorporate the fact that I actually know this information? From the point estimates I know that:

- Classes 2 and 3 should have an intercept in the range -11 to -1
- Classes 4, 5, and 6 should have an intercept in the range -12 to -4

My first intuition would be to use a student_t with a low degree of freedom, e.g. 3, to support this wide range. For instance:

- student_t(3, -6, 1) has about 98% of its mass in the -10 to -1 range
- student_t(3, -8, 1) has about 98% of its mass in the -12 to -3 range
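Those tail-mass figures are easy to check by Monte Carlo (Python, using numpy's standard_t; the location/scale shift is applied by hand since numpy's t is standardized):

```python
import numpy as np

rng = np.random.default_rng(2)

# draws from a student_t(df=3, loc=-6, scale=1) prior
draws = -6.0 + rng.standard_t(3, size=1_000_000)

# fraction of the mass in the stated (-10, -1) range; analytically ~98%
mass = np.mean((draws > -10.0) & (draws < -1.0))
assert 0.96 < mass < 0.99
```

The heavy t3 tails are what keep a percent or two of the mass outside even a 9-unit-wide interval.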
On 30 Mar 2016, at 00:32, Bob Carpenter <ca...@alias-i.com> wrote:

If variables are getting pushed up against a lower bound of zero, it may just be where the probability mass is. In such cases, it might make sense to manually replace them with zero (and if they are hierarchical scale terms, replace their density with some kind of complete pooling).
I have no experience with alternatives to softmax / multinomial logistic.
Are you running into the same issue with simulated data? These problems can arise in well-formulated and well-coded models if the model is highly misspecified for the data.
On another note, I think you are absolutely right that the problem is that many of the classes are bumping up against zero. I tried two other kinds of softmax, called spherical and Taylor softmax (http://arxiv.org/pdf/1511.05042v3.pdf), and now the chains are moving a lot faster, at a much lower tree depth, and converging. The downside is that I don't think this kind of softmax makes much sense in a probabilistic setting, and it loses all interpretability, unless I can somehow calculate back, but I don't think that's possible. :-)
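For reference, the Taylor softmax from that paper replaces exp(z) with its second-order Taylor expansion, which is strictly positive for all real z, so the normalized values still form a probability vector. A minimal Python sketch:

```python
import numpy as np

def taylor_softmax(z):
    # 1 + z + z^2/2 > 0 for every real z (the quadratic has no real roots),
    # so the normalized vector is always a valid probability vector
    e = 1.0 + z + 0.5 * z ** 2
    return e / e.sum()

p = taylor_softmax(np.array([0.0, -6.0, -11.0, 3.0]))
assert np.all(p > 0)
assert np.isclose(p.sum(), 1.0)
```

Because the tails grow only quadratically rather than exponentially, very negative logits are not squashed nearly as hard, which fits the observation that the chains stop grinding against zero.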
David