Modelling Multinomial data


Niloufar

Oct 29, 2021, 10:37:01 AM
to R-inla discussion group
Hi all,

I ran the first example on page 6 of this vignette (https://github.com/inbo/INLA/blob/master/vignettes/multinomial.pdf), and, as shown in the vignette, it does a good job of estimating the true parameters. But when I try to validate the model by comparing its predictions against the true values, it gives nonsensical predictions. I want to know whether I am doing something wrong or whether the model is no good for predicting new values. This is how I test it:

# ============================== Testing======================================

library(INLA)
library(ggplot2)
library(dplyr)

# Hold out the last responses by setting them to NA
df_test <- Data
df_test[1490:1500, 1] <- NA

# Refit the model on the data with the held-out responses
model <- inla(formula, data = df_test, family = 'poisson',
              control.predictor = list(link = 1))

# Compare the predictions for the held-out rows against the true values
real  <- Data[1490:1500, 1]
preds <- model$summary.fitted.values$mean[1490:1500]

df_plot <- data.frame(real = real, preds = preds)
df_plot %>% ggplot() + geom_point(aes(real, preds))

# ================================================================================

Could you please check whether the model in the vignette can also do predictions properly? I am a bit suspicious about using the Poisson trick to model multinomial data in INLA.

Thanks in advance,
Niloufar

Helpdesk

Oct 30, 2021, 2:33:15 AM
to Niloufar, R-inla discussion group

It looks like you got your predictions wrong, yes. You have to do the predictions in the 'poisson' setting and then convert them into the multinomial setting.

Here is a toy example (attached).

--
Håvard Rue
he...@r-inla.org
runme3.R
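A minimal base-R sketch of the conversion step the reply describes, with hypothetical fitted Poisson means (`lambda` standing in for `model$summary.fitted.values$mean`) and a hypothetical trial index `idx` that groups the categories of each multinomial observation:

```r
# Hypothetical fitted Poisson means for 2 multinomial trials, 3 categories each
lambda <- c(5, 2, 3,
            8, 1, 1)
idx    <- c(1, 1, 1,   # trial id of each Poisson cell
            2, 2, 2)

# Convert to multinomial probabilities: normalise the means within each trial
probs <- ave(lambda, idx, FUN = function(l) l / sum(l))
# probs is 0.5 0.2 0.3 for trial 1 and 0.8 0.1 0.1 for trial 2
```

Each block of probabilities sums to one by construction, which is the multinomial constraint the Poisson fit does not enforce on its own.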

manjagoc

Apr 14, 2022, 2:32:23 AM
to R-inla discussion group
Hi to all,

Following the example in the FAQ section of the INLA website, and thanks to the advice of Francesco Serafini and his tutorial, I could estimate a multinomial model like this one (using the variable names from the INLA website example):

head(data)
     y     idx  x1     x2   x3     jdx1  jdx2  jdx3
1   508     1  28.7  4.74  48.3     1     1     1
2   153     1  28.7  4.74  48.3     2     2     2
3   122     1  28.7  4.74  48.3     3     3     3
4   744     2  38.7  2.57  57.0     1     1     1
5   267     2  38.7  2.57  57.0     2     2     2
6   131     2  38.7  2.57  57.0     3     3     3

formula = y ~ -1 + 
f(idx, initial = -10, fixed = T) +
f(jdx1, x1, fixed = T, constr = T) +
f(jdx2, x2, fixed = T, constr = T) + 
f(jdx3, x3, fixed = T, constr = T) + ...

r = inla(formula,
family = "poisson",
data = data)

The issue is that, if my reference category is 1, SOME of the differences of the coefficients make no sense at all and are very different from what I get if I fit independent binomial models for categories 2 and 3 (using INLA) or if I run a multinomial model with other packages. With these other options I obtain an intercept for categories 2 and 3 with respect to the reference, while in the Poisson model the intercept is a random effect. I want to get coefficients consistent with the other methods for interpretation purposes, like Håvard did in the runme3.R file, but with aggregated data.

Do you know how I can do this? 
I will appreciate your help.

Helpdesk

Apr 15, 2022, 4:51:21 AM
to manjagoc, R-inla discussion group
Do you run into issues here because your covariates are not scaled? If you do

x1 = scale(x1)

etc., would it be better?

Although one can do multinomial via Poisson, it is not as easy to define the model itself, as the normalization over the 3 groups makes it harder to interpret.

Also, one needs to convert the estimates into probability predictions for each group, which is not done automatically.

manjagoc

Apr 15, 2022, 6:25:48 PM
to R-inla discussion group
Hi, thank you for the answer. 

Scaling the covariates does not change the situation. 

For example, in the following table I'm showing the coefficients of the complete model (8 covariates):
Bb: estimated coefficients of the binomial model using the counts of category 2 as "y" and category 1 as reference.
B1: estimated coefficients of the Poisson model for category 1.
B2: estimated coefficients of the Poisson model for category 2.
B3: estimated coefficients of the Poisson model for category 3.
B1.B2: difference between B2 and B1 (B2 - B1).

                Bb        B1        B2        B3    B1.B2
Intercept -1.94195   5.43225   5.43225   5.43225  0.00000
1          0.00015  -0.00087   0.00008   0.00079  0.00095
2         -0.02148   0.00508  -0.02062   0.01555 -0.02569
3         -0.00104   0.00561  -0.00117  -0.00444 -0.00678
4         -0.00393   0.00820  -0.00034  -0.00786 -0.00854
5         -0.00222   0.00236  -0.00016  -0.00220 -0.00252
6          0.04084   0.06314   0.05919  -0.12228 -0.00396
7          0.12816   0.04282  -0.08897   0.04610 -0.13179
8         -0.00618  -0.00034   0.00495  -0.00460  0.00529
ft         0.18417 694.29214 157.51790 109.32593  0.22688


In the Poisson model, the intercept is a random variable, so I'm showing the one for the first observation. I'm also showing the fitted values (ft), calculated "manually", for this first observation.

Using the binomial model, the probability for the first observation is 0.18. Using the Poisson model, the expected value for category 2 is 157.52 and the real sample size is 783, so the proportion is 0.2. The expected value for category 3 is also accurate, but that is not the case for category 1; however, the predicted value according to the model is 516.47, which satisfies the restriction, so I suppose a correction is applied afterwards. Then I have the probability according to B2 - B1, which is not bad either (0.23). Actually, this is a spatial model, and the resulting spatial patterns look very similar no matter which model I use. My problem is when I try to explain each effect. For example, coefficient 7 is negative for B1.B2 (which makes no sense at all) but positive for Bb. The thing is that I'm getting very similar results (fitted values) with both methods, but with very different coefficients and intercepts. Clearly, a difference in magnitude makes a lot of sense (logits vs logs, counts vs probabilities), but how can a difference in sign be explained? Is there a way to fix this? Or can the coefficients simply not be interpreted properly if you use the multinomial-Poisson transformation?

Helpdesk

Apr 16, 2022, 5:44:35 AM
to manjagoc, R-inla discussion group


It's a little hard without the actual code, but I noticed you do not have an intercept term, so if all covariates are zero, then the probabilities are all equal. Is this what you want?

intercept = rep(1:3, n)

Here I set the intercept to sum to zero, but you can also fix one of them to zero without the constraint (although it's not *strictly* correct).




formula = y ~ -1 + f(intercept, initial=-10, fixed=T, constr=T) +
f(idx, initial = -10, fixed = T) +
f(jdx1, x1, fixed = T, constr = T) +
f(jdx2, x2, fixed = T, constr = T) +
f(jdx3, x3, fixed = T, constr = T) + ...
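For concreteness, the category-intercept index can be built together with the rest of the long-format data; a sketch with hypothetical counts for n = 2 trials and K = 3 categories:

```r
n <- 2   # number of multinomial trials (hypothetical)
K <- 3   # number of categories

data <- data.frame(
  y         = c(508, 153, 122, 744, 267, 131),  # one count per category
  idx       = rep(1:n, each = K),  # trial id; f(idx, ...) absorbs each total
  intercept = rep(1:K, n)          # category index for the intercept effect
)
```

The covariate index columns (jdx1, jdx2, ...) follow the same rep(1:K, n) pattern as the intercept column.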


manjagoc

Apr 16, 2022, 6:00:26 PM
to R-inla discussion group
That works! Thank you very much for your help.

manjagoc

Jun 21, 2022, 7:19:04 PM
to R-inla discussion group
Hello,

Sorry to bother you again. The last piece of code works great when I only have fixed effects, but I have noticed some inconsistencies when I add a random component, in my case, a spatial component. Is there something that I have to adjust in the code or in the data? Again, using the names in the INLA website example, I have:

head(data)
     y     idx  x1     x2   x3  intercept  jdx1  jdx2   rand  
1   508     1  28.7  4.74  48.3     1        1     1     1       
2   153     1  28.7  4.74  48.3     2        2     2     NA
3   122     1  28.7  4.74  48.3     3        3     3     NA
4   744     2  38.7  2.57  57.0     1        1     1     2
5   267     2  38.7  2.57  57.0     2        2     2     NA
6   131     2  38.7  2.57  57.0     3        3     3     NA

formula = y ~ -1 + 
f(idx, initial = -10, fixed = T) +
f(intercept, initial = -10, fixed = T, constr = T) +
f(jdx1, x1, fixed = T, constr = T) +
f(jdx2, x2, fixed = T, constr = T) +
f(rand, model = "besag", graph = adjmat) 

r = inla(formula,
family = "poisson",
data = data)

The IDs in adjmat match the rand values.

Helpdesk

Jun 24, 2022, 8:25:36 AM
to manjagoc, R-inla discussion group
It looks fine, as you only add it to one of the components. Multinomial models are a bit tricky, as the interpretation is not that clear due to the renormalization into probabilities. So this will only change the ratio of p[1] with respect to p[2] and p[3], right?

You may try a simpler example without the intercept, to see if you can make more sense of it, if this is not what you want.
Håvard Rue
he...@r-inla.org

manjagoc

Jun 24, 2022, 5:15:09 PM
to R-inla discussion group
Thank you. I can't remove the intercepts because that would lead me back to my original problem. Let's say I combine categories 2 and 3, leaving just two categories. Then I can build a binomial model and a Poisson model, and their results should match, right? Like the example in the runme3.R file in this thread. This is true when I use just the covariates. If I add the spatial component, they do not match. If I estimate the coefficients with another package, the results match the ones from the binomial model, so I think I'm doing something wrong with the Poisson model. What should I do in order to get the same results with these two models? If I can achieve this, I assume it will work for the multinomial version too.

head(data1)
     y     idx  x1     x2  intercept jdx1  jdx2   rand  
1   508     1  28.7  4.74     1        1     1     1       
2   275     1  28.7  4.74     2        2     2     NA
3   744     2  38.7  2.57     1        1     1     2
4   398     2  38.7  2.57     2        2     2     NA
5   317     3  7.76  1.96     1        1     1     3
6   115     3  7.76  1.96     2        2     2     NA

formula1 = y ~ -1 + 
f(idx, initial = -10, fixed = T) +
f(intercept, initial = -10, fixed = T, constr = T) +
f(jdx1, x1, fixed = T, constr = T) +
f(jdx2, x2, fixed = T, constr = T) +
f(rand, model = "besag", graph = adjmat) 

r1 = inla(formula1,
          family = "poisson",
          data = data1)

head(data2,3)
     y     N     x1     x2  rand  
1   275    783  28.7  4.74     1       
2   398   1142  38.7  2.57     2
3   115    432  7.76  1.96     3

formula2 = y ~ 1 + x1 + x2 +
f(rand, model = "besag", graph = adjmat) 

r2 <- inla(formula2,
           family = 'binomial',
           Ntrials = N,
           data = data2)

Jonathan Judge

Jun 26, 2022, 9:47:58 AM
to manjagoc, R-inla discussion group
I am not a spatial statistician and will defer to those who are, but the validity of the Poisson equivalence when using random effects has been questioned. See, for example, https://arxiv.org/pdf/1707.08538. Your experience of having the equivalent model work well with fixed effects only seems consistent with this point.  (I am assuming the spatial component here is a type of random effect, or at least not a traditional fixed effect).

When I fit logistic multinomial models with random effects of some kind, I run a separate logistic binomial model for separate datasets that in turn each code the most numerous overall category (the "pivot") as 0 and some other category as 1, take the predictions from each model on the logit scale, and then put them back together if desired on the softmax scale. Logistic multinomial regression assumes the categories are completely independent from each other anyway, so you don't lose much by running separate models. That approach is described here and generally works well with random effects:  https://en.wikipedia.org/wiki/Multinomial_logistic_regression#As_a_set_of_independent_binary_regressions.   
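The recombination step described above can be sketched in base R; the logit values below are hypothetical, with category 1 as the pivot:

```r
# Hypothetical logit-scale predictions from two separate binomial models,
# each fitted with the pivot category coded as 0
eta2 <- 0.4    # log-odds of category 2 vs the pivot
eta3 <- -0.7   # log-odds of category 3 vs the pivot

# Put them back together on the softmax scale; the pivot's logit is 0
eta   <- c(0, eta2, eta3)
probs <- exp(eta) / sum(exp(eta))
```

The resulting probabilities sum to one, and the first entry recovers the pivot's share, 1 / (1 + exp(eta2) + exp(eta3)).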

I cannot vouch for its application here, but I would have more confidence in it than the Poisson equivalence. 

Jonathan

manjagoc

Jun 27, 2022, 4:39:16 PM
to R-inla discussion group
Thank you, Jonathan. Yes, I already tried two separate binomial models; that was one of the methods I used to compare results and look for inconsistencies. I'm also familiar with this article; however, I think it was not peer-reviewed. In the multinomial tutorial, they got satisfactory results when a spatial random effect was included. For these reasons, I think it is probably just me making a silly mistake in the code, but I can't detect it. Or maybe there is a problem with the method, as the article claims; I don't know. I think that, no matter the cause, the outcome will be useful for the community: either a clean example to help others avoid my mistake, or a warning about the use of the method in certain circumstances. Thank you again.

manjagoc

Jul 15, 2022, 1:24:43 AM
to R-inla discussion group
Hi Jonathan. I have an additional question. Have you compared models with this baseline-category approach? I mean, when you have two separate models you obtain pairs of fit indicators like DIC or WAIC. How do you combine your results to determine which is the best model if you have several alternatives?
