estimating correlated joint likelihoods


Alok Bohara

May 27, 2022, 11:06:41 AM
to R-inla discussion group
Hi:   I managed to estimate a set of joint likelihoods/models -- poisson-poisson or even gaussian-gaussian -- using iidkd, and retrieved the correlation values between the two random effects.  

But I am having trouble getting sensible results (e.g., correlation) using a mixed family of distributions like "binomial and gaussian", or even the bivariate "binomial-binomial".  

Is the use of the "binomial" family a problem in INLA for such correlated joint models?  

P.S.:  I searched the group and found an old response from Håvard answering "no" to a question about INLA and bivariate correlated binomial likelihoods.  

Any tips? 

Best
Alok

Helpdesk

May 27, 2022, 11:19:21 AM
to Alok Bohara, R-inla discussion group

you can model correlation between the linear predictors, not the data
(which are conditionally independent).

I guess you need to scale one of the components (like with copy), as the
linear predictors for binomial and gaussian, say, are on different scales.

--
Håvard Rue
he...@r-inla.org

Alok Bohara

May 28, 2022, 6:50:25 PM
to R-inla discussion group
I am assuming the components you are referring to are u and v (random effects -- one for the binomial part and the other for the gaussian part).  This is what I did:
formCorr = Y2 ~ 0 + mu.y + cov.y.CAX + cov.y.CAXSQ + cov.y.CWE + cov.y.CCIT +
    mu.z + cov.z.CAX + cov.z.CAXSQ + cov.z.CWE + cov.z.CCIT + cov.z.CKL6 +
    f(u, model = "iidkd", order = 2, n = 2*n) + f(v, copy = "u")

heckman <- inla(formCorr, data = ldat2,
    family = c("binomial", "gaussian"),
    control.family = list(list(control.link = list(model = "probit")),
                          list(control.link = list(model = "identity"))))


Alok 

Helpdesk

May 29, 2022, 11:56:06 AM
to Alok Bohara, R-inla discussion group

doing

f(v, copy="u")

will not do any scaling, which you need as binomial and Gaussian are on
different scales. Do

f(v, copy="u", hyper = list(beta = list(prior = "normal",
                                        param = c(1, 10),
                                        fixed = FALSE)))

or with some other prior, but 'fixed = FALSE' is required as 'TRUE' is the
default.

H

Alok Bohara

May 29, 2022, 1:21:58 PM
to R-inla discussion group
Håvard:

Tried:
for.Heck.inla2 = Y ~ 0+mu.y+ cov.y.edu +
              mu.z + cov.z.edu + cov.z.huswage +

                f(u, model="iidkd", order=2, n=2*n) +  
        f(v, copy="u", hyper = list(beta = list(prior = "normal",
        param = c(1,10),
        fixed = FALSE)))

heckman.inla <- inla(for.Heck.inla2, data=ldat,

    family=c("binomial", "gaussian"),
 #   control.inla = list(int.strategy = "eb"),

      control.family=list(list(control.link = list(model = "probit")),
                           list(control.link = list(model = "identity"))) )

summary(heckman.inla)

Results did not change from what I got from the simple copy, f(v, copy="u").  I have posted the full code (without the scaling) and data in another thread to Denis.  Thanks for your help.

Helpdesk

May 30, 2022, 1:38:22 AM
to Alok Bohara, R-inla discussion group
then there must be something wrong with your vectors 'u' and/or 'v'

Alok Bohara

May 30, 2022, 9:02:12 AM
to R-inla discussion group
H: Here is my code. Thanks.  A

Data and code:
######################################
library(sampleSelection)
data("Mroz87")
names(Mroz87)
data<-Mroz87
data$lwage = log(data$wage+1)

##R's sample selection Heckman model (matches with Stata's heckman mle estimation with correlation )
## A simple example
R.HeckMle <- heckit(
selection = lfp ~  educ + huswage,
outcome = lwage ~ educ,
method = "ml",
data = data)
summary(R.HeckMle)

###############INLA ####################################
library(INLA)
n <-nrow(data)

### covariates
# outcome eq
cov.y.edu <- data$educ
# selection eq
cov.z.edu <- data$educ
cov.z.huswage <- data$huswage

# Two r.e. (one for the binomial part, one for the gaussian part)
u <- c(1:n, rep(NA, n))
v <- c(rep(NA, n), n + (1:n))

# two-likelihood data columns
data$lwageNA <- ifelse(data$wage > 0, data$lwage, NA)  # NA for missing wages
Y <- matrix(NA, n + n, 2)
Y[1:n, 1] <- data$lfp
Y[n + (1:n), 2] <- data$lwageNA


#constants
mu.z <- c(rep(1,n),rep(NA,n))
mu.y <- c(rep(NA,n),rep(1,n))

#data preparation
ldat = list(
Y=Y,
mu.z=mu.z,
mu.y=mu.y,
# selection
cov.z.edu=  c(cov.z.edu, rep(NA,n)),
cov.z.huswage=  c(cov.z.huswage, rep(NA,n)),
#outcome
cov.y.edu = c(rep(NA, n), cov.y.edu),
u=u,
v=v)

#Heckman using INLA (joint with correlated errors)
for.Heck.inla = Y ~ 0+mu.y+ cov.y.edu +

              mu.z + cov.z.edu + cov.z.huswage +
                f(u, model="iidkd", order=2, n=2*n) +  f(v, copy="u", hyper = list(beta = list(prior = "normal",    param = c(1,10),fixed = FALSE)))

heckman.inla <- inla(for.Heck.inla, data=ldat,

    family=c("binomial", "gaussian"),
      control.family=list(list(control.link = list(model = "probit")),
                           list(control.link = list(model = "identity"))) )

summary(heckman.inla)
Model <-heckman.inla

# random effects summary and correlation 
MC_samples <- inla.iidkd.sample(10^4, Model, "u", return.cov=T)
VarCov <- matrix(unlist(MC_samples), nrow = 2^2)
VarCovMeans = matrix(rowMeans(VarCov),2,2);round(VarCovMeans,6) # mean of variance-covariance

rho <- VarCovMeans[1,2]/sqrt(VarCovMeans[1,1]*VarCovMeans[2,2])
rho

Alok Bohara

May 30, 2022, 3:19:10 PM
to R-inla discussion group
Further, for whatever it's worth:

some properties of the Heckman selection model are as follows:
1. the inclusion of latent variable L in both the continuous-outcome (wage) equation and the censored-outcome selection equation; 
2. constraining the selected <- L path coefficient to be 1; 
3. constraining the variance of L to be 1; and 
4. constraining the error variances to be equal.  

A generalized SEM model (gsem, in Stata) estimates the Heckman model (binomial selection / gaussian outcome) by adding the above features as follows:

** Joint model with mixed distributions in gsem
gsem (lwageCen <- educ  L) ///    // L = latent variable
           (selected2 <- educ huswage L@1, ///  // scaled?
          family(gaussian, udepvar(notselected2))), ///
         var(L@1 e.lwageCen@a e.selected2@a)    // @a makes the two variances equal --PHI(xb/sigma)  of probit and phi(xb/sigma) in continuous gaussian
where:
gen selected2 = 0 if lfp == 0
gen notselected2 = 0 if lfp == 1

The outcome equation from this model comes out fine, whereas the selection equation coefficients and correlation require some transformation to match what I get from R's selection program.  I thought I would share it to show how this gsem also adds an r.e. variable (latent) in both equations to facilitate the correlation between the two families of distributions. 

Alok Bohara

Jun 2, 2022, 12:34:33 PM
to R-inla discussion group
Håvard:  Any tips on my u and/or v in the code?  Thanks for your help.  AB

Denis Rustand

Jun 2, 2022, 3:22:02 PM
to R-inla discussion group
The reason for the difference between your INLA code and the Heckman model seems to be the constraint on the variance of one of the two correlated random effects. This is not directly available in INLA; we are looking into it and may develop it soon. Then we can see if we match the results you provided or if something else is missing.

Alok Bohara

Jun 3, 2022, 11:40:47 AM
to R-inla discussion group
1. Thanks Denis for your clarification.  At least I now know what the issues are.  BTW, the sigma and  correlation from the above gsem model are retrieved as follows:

nlcom (sigma: sqrt(_b[/var(e.lwageCen)] +_b[wage:L]^2))  (rho: _b[lwageCen:L]/(sqrt((_b[/var(e.lwageCen)]+1)*(_b[/var(e.lwageCen)]  + _b[lwageCen:L]^2))))

Likewise, the coefficients of the probit part are retrieved by dividing the probit coefficients by sqrt(_b[/var(e.lwageCen)]+1).

Both sigma and rho after the transformation match R's heckman selection estimates. 
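In R, the same transformations can be written as a small helper (a sketch only; the function name and arguments are illustrative, with a = _b[/var(e.lwageCen)] and lambda the loading on L in the outcome equation):

heckman.rescale <- function(a, lambda, probit_coef) {
  sigma <- sqrt(a + lambda^2)                      # sd of the outcome-equation error
  rho   <- lambda / sqrt((a + 1) * (a + lambda^2)) # correlation induced by the shared L
  probit_rescaled <- probit_coef / sqrt(a + 1)     # selection eq. back on the unit-variance probit scale
  list(sigma = sigma, rho = rho, probit = probit_rescaled)
}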

2. As for the simultaneous model we discussed earlier, GSEM also allows simultaneous modeling (for mixed distributions) by introducing latent variables into the equations and rescaling them if necessary.  An example of an instrumental-variables model with a binary r.h.s. variable may look something like this:
#Endogenous treatment model
y1 = a0+a1*x1 + a2*y2+u (y1 is gaussian, e.g., wage  where y2 is endogenous)
y2 = b0 + b1*z1 + v            (y2 is endogenous probit, e.g., union participation).

gsem (wage <- age grade i.smsa i.black tenure 1.union  L)
  (llunion <- i.black  tenure  i.south   L@1,
family(gaussian, udepvar(ulunion))),
 var(L@1 e.wage@a e.llunion@a)

 generate llunion = 0 if union == 1 // missing otherwise
 generate ulunion = 0 if union == 0 // missing otherwise

Note: unlike in Heckman, here the  continuous variable wage is observed for both the treated and untreated samples (union and non-union members). The model allows correlation through the latent variables. 

The output of a model like this may look like this:

Generalized structural equation model             Number of obs = 1,210

Response:  wage                                   Family: Gaussian,  Link: Identity
Responses: llunion (lower), ulunion (upper)       Family: Gaussian,  Link: Identity
Censoring of obs: Uncensored = 0, Left-censored = 957,
                  Right-censored = 253, Interval-cens. = 0
Log likelihood = -3051.575
 ( 1)  [llunion]L = 1
 ( 2)  - [/]var(e.wage) + [/]var(e.llunion) = 0
 ( 3)  [/]var(L) = 1

                 Coefficient   Std. err.       z    P>|z|     [95% conf. interval]
wage (outcome)
  age               .1487409    .0193291    7.70    0.000      .1108566    .1866252
  grade             .4205658    .0293577   14.33    0.000      .3630258    .4781057
  1.smsa            .9117044    .1249041    7.30    0.000      .6668969    1.156512
  1.black          -.7882472    .1367077   -5.77    0.000     -1.056189    -.520305
  tenure            .1524015    .0369595    4.12    0.000      .0799621    .2248408
  1.union           2.945816    .2749549   10.71    0.000      2.406914    3.484718
  L                -1.706795    .1288024  -13.25    0.000     -1.959243   -1.454347
  _cons            -4.351572    .5283952   -8.24    0.000     -5.387207   -3.315936

llunion (treatment)
  1.black           .6704049     .148057    4.53    0.000      .3802185    .9605913
  tenure            .1282024    .0357986    3.58    0.000      .0580384    .1983664
  1.south          -.8542673     .136439   -6.26    0.000     -1.121683   -.5868518
  L                        1   (constrained)
  _cons            -1.302676    .1407538   -9.25    0.000     -1.578548   -1.026804

  var(L)                   1   (constrained)
  var(e.wage)       1.163821    .2433321                       .7725324    1.753298
  var(e.llun~n)     1.163821    .2433321                       .7725324    1.753298

########################################

It would be wonderful to have these options in INLA through its u's and v's.   

Alok Bohara

Jun 3, 2022, 12:03:26 PM
to R-inla discussion group

I think this is the structure, with a latent variable added, for the endogenous treatment effect model:

y1 = a0+a1*x1 + a2*y2+u +lambda*L  (y1 is gaussian, e.g., wage  where y2 is endogenous)
y2 = b0 + b1*z1 + v  + L          (y2 is endogenous probit, e.g., union participation).

where L ~ N(0,1)  u ~ N(0,s) v ~ N(0,1) cor(u,v) = rho.  The second equation consists of a rescaled probit model. 

The Heckman model may look like this:

y1 = a0+a1*x1 + u +  L  (y1 is gaussian/censored/NAs , e.g., wage)
y2 = b0 + b1*z1 + v + lambda*L          (y2 is, e.g., union participation).  The second equation consists of a rescaled probit model, with
variance 1 + lambda^2 instead of one.

Denis Rustand

Jun 5, 2022, 6:17:07 AM
to R-inla discussion group
Hi Alok,

Please find attached code with the new model formulation (two correlated random effects with one of the variances fixed to 1). The results are similar to what you get from the sampleSelection package and Stata's MLE fit (all the values are within the credible intervals), although not exactly matching. Maybe there is still something different that I don't see from your description, or maybe the difference is only related to the estimation strategy...

For the model with a latent variable, it is possible to add an iid random effect with a fixed variance of 1, which can then be shared and scaled with a lambda parameter in the second equation. You can do it by adding the following to the formula:
f(idL.y, model="iid", hyper=list(prec=list(initial=0, fixed=TRUE))) +
f(idL.z, copy="idL.y", fixed=FALSE)
where idL.y is the id for the first equation and idL.z is the same id but for the second equation.
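For concreteness, a minimal sketch of how those index vectors could be built, assuming the same stacked layout as the earlier Heckman example (which equation comes first depends on how the two likelihoods are stacked in Y):

## rows 1:n hold the first likelihood, rows (n+1):(2*n) the second;
## each individual i shares one latent value L_i across the two equations
idL.y <- c(1:n, rep(NA, n))   # L enters the first equation here...
idL.z <- c(rep(NA, n), 1:n)   # ...and is copied (scaled by lambda) into the second
ldat$idL.y <- idL.y
ldat$idL.z <- idL.z
## In the formula, 'initial = 0, fixed = TRUE' pins the log-precision at 0, i.e. Var(L) = 1,
## and 'fixed = FALSE' on the copy lets the scaling lambda be estimated.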

Best,
Denis
SampleSelection_INLA.R

Jonathan Judge

Jun 5, 2022, 10:15:56 AM
to Denis Rustand, R-inla discussion group
Denis:

If there are only slight differences, perhaps that could be a function of the prior(s)?

Also, is what you are proposing similar to the state-space model proposed here, without the weights? I notice the initial starting value is quite different, but otherwise it seems like the same general idea. https://becarioprecario.bitbucket.io/inla-gitbook/ch-temporal.html#sec:spacestate

Thanks,
Jonathan 


Alok Bohara

Jun 5, 2022, 12:05:10 PM
to R-inla discussion group
Thanks Denis,

1. I like the latent var structure approach and your suggestion to add:  f(idL.y, model="iid", hyper=list(prec=list(initial=0, fixed=T))) +
f(idL.z, copy="idL.y", fixed=F).  This will allow me to consider many other mixed correlated distributions (in INLA), just like Stata's GSEM.  The only minor glitch is that the coefficients from the scaled distribution need to be divided by XXXX (a mix of lambda and sigma) to retrieve the original coefficients.  

[For example, with your script the "slopes" of the outcome eq came out similar to what I get from R's selection results, but the coefficients of the INLA selection equation have to be divided by something -- e.g., sqrt(_b[/var(e.wage)]+1), if I am thinking about it correctly -- or something similar, depending on how L is added and how the coefficients are scaled. Sigma is similar (.48), but the rho's are a bit different: .58 versus .29 (INLA).]

I had presented two models with latent vars: 1) the endogenous treatment model and 2) Heckman selection.  Which one does your suggestion apply to, Model 1 or Model 2? I am assuming that your "model=HeckVar" option and the script you wrote will not be needed in this latent var approach. I will be curious to see if the latent var suggestion (for Model 2) will give the same results as R's sample selection results.

Model 1:  Endogenous treatment (notice y2 in the first eq.)
y1 = a0+a1*x1 + a2*y2+u +lambda*L  (y1 is gaussian, e.g., wage  where y2 is endogenous)
y2 = b0 + b1*z1 + v  + L          (y2 is endogenous probit, e.g., union participation).

where L ~ N(0,1)  u ~ N(0,s) v ~ N(0,1) cor(u,v) = rho.  The second equation consists of a rescaled probit model. 

Model 2:     The Heckman model with latent var approach may look like this:
y1 = a0+a1*x1 + u +  L                           (y1 is gaussian/censored/NAs , e.g., wage)
y2 = b0 + b1*z1 + v  +lambda* L          (y2 is e.g., union participation).  The second equation consists of a rescaled probit model, with variance 1 + lambda^2 instead of one.
where L ~ N(0,1),  v ~ N(0,1) u ~ N(0,s)  cor(u,v) = rho

Yet another version can be as follows:
Model 3. 
y1 = a0+a1*x1 + u +  L@a   (i.e.,  lambda1*L )   
y2 = b0 + b1*z1 + v  + L@a (i.e.,  lambda1*L)
where L ~ N(0,1),  cor(u,v) = rho  Note:  u and v can have  unit variances for the bivariate probits or can be relaxed for other mixes.  

Thanks for your help in addressing some of my queries about INLA.  It would be wonderful if you could spell out the f(...) + f(...) functions for each of these cases. I will report back the results.  

Thanks. 

Alok


Denis Rustand

Jun 5, 2022, 1:14:36 PM
to R-inla discussion group
@Jonathan
That's right, the priors could explain the difference, but since they are kept at their defaults (non-informative), they shouldn't have an impact and the posterior distribution should match the MLE.

For the state-space model, I think it is a little bit different. We could have a single error term that is shared and scaled to capture the correlation; then the model should have a similar fit. The difference is that the correlation of interest here would be captured by the scaling parameter.

@Alok
Attached are the codes that should correspond to the three models you described, i.e.:
Model 1: As I told you, sharing the linear predictor from one equation into the other gets a little bit tricky. In the summary of the results, the "Beta for eta1" is your "a2" and the "Beta for idL.y" is your lambda.

Model 2: Just added the L components here.

Model 3 (relaxed constraints version): I am not sure what you mean by "L@a" in both equations. It seems that you want to scale the unit-variance random variable with the same scaling parameter in both equations; then you can just remove the "unit variance" constraint and copy the same random variable without a scaling parameter, and it should be the same.

(Note that I did not change the names of all the components, so don't be confused by the "Heckman" object names...)

About the required division to retrieve the original coefficients, I guess you can do it post fit? If you want to account for uncertainty, you can just sample from the posterior.
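For example, a rough sketch of that post-fit rescaling via posterior sampling (this assumes the model is refitted with control.compute = list(config = TRUE); the fit object, component names, and the sqrt(a + 1) divisor come from the discussion above and are assumptions to be checked against the actual fit, not part of the attached scripts):

nsamp <- 1000
post  <- inla.posterior.sample(nsamp, heckman.inla)
rescaled <- sapply(post, function(s) {
  a     <- 1 / s$hyperpar["Precision for the Gaussian observations"]  # Gaussian error variance
  b.sel <- s$latent["cov.z.edu:1", 1]     # one selection-equation coefficient
  b.sel / sqrt(a + 1)                     # the division discussed above, per posterior draw
})
mean(rescaled); quantile(rescaled, c(0.025, 0.975))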

Denis

Model3_INLA.R
Model1_INLA.R
Model2_INLA.R

Alok Bohara

Jun 6, 2022, 2:13:06 PM
to R-inla discussion group
Thanks Denis.  

1. I ran all the models.  For whatever it's worth, I have attached a comparison of INLA with three other packages -- R, and Stata's conditional mixed process (cmp) and GSEM.
My main question is: what is the impact coefficient of the treatment effect in INLA? (For the other packages it comes out to be around .7. For some reason, R flips the sign of the treatment effect.) For INLA, is it the "Beta for idL.y" value of 0.429?

2. Next, I will share the comparison for Model 3.  I have tried two versions -- binomial-gaussian and binomial-binomial -- and I get identical results from CMP and GSEM.  My trouble is with INLA (I am sure it's my problem).

One thing I notice is that the model where I have used L@a in both equations, like below, tends to restrict the correlation to be positive (Model 3).  So I have to be careful about what my y1 and y2 are.  (Lambda becomes 0 or it iterates forever.)

SUR type system:
#Model 3.  (gaussian-binomial)
#y1 = a0+a1*x1 + u +  L@a   (i.e.,  lambda1*L )  # continuous
#y2 = b0 + b1*z1 + v  + L@a (i.e.,  lambda1*L)   # binomial
#where L ~ N(0,1),  u ~ N(0,s^2), v ~ N(0,1); cor(u,v) = rho  Note:  u and v can have  unit variances for the bivariate probits or can be relaxed for other mixes.  

* Model 3b  (gaussian-binomial)  I could also do  in case of a convergence problem 
#y1 = a0+a1*x1 + u +  L   (i.e.,  lambda coeff is free)  # continuous
#y2 = b0 + b1*z1 + v  + L@1 (i.e.,  lambda is 1)   # binomial  
#where L ~ N(0,1),  u ~ N(0,s^2), v ~ N(0,1); cor(u,v) = rho  Note:  u and v can have  unit variances for the bivariate probits or can be relaxed for other mixes.  

#Model 3c bivariate probit: 
#y1 = a0+a1*x1 + u +  L   (i.e.,  lambda coeff is free)  # binomial
#y2 = b0 + b1*z1 + v  + L@1 (i.e.,  lambda is 1)   # binomial  
#where L ~ N(0,1),  u ~ N(0,1), v ~ N(0,1); cor(u,v) = rho  Note:  u and v can have  unit variances for the bivariate probits or can be relaxed for other mixes.  

#Model 3d bivariate probit: 
#y1 = a0+a1*x1 + u +  L@a   (i.e., lambda coeff restricted)  # binomial
#y2 = b0 + b1*z1 + v  + L@a (i.e., lambda restricted)   # binomial  
#where L ~ N(0,1),  u ~ N(0,1), v ~ N(0,1); cor(u,v) = rho  Note:  u and v can have  unit variances for the bivariate probits or can be relaxed for other mixes.  

(I have tried all of these within GSEM and got results that match CMP, including the correlations.)

I don't know if I am going about the latent variable approach the wrong way.  I am trying to see how I could use your Model 3 script to accommodate all of these. 

P.S.: I ran your binomial-gaussian script (Model 3), but could not get the correlation.  My attempt to use binomial-binomial crashed; I am sure I have some issues with the way I have set up my hyperparameters.  I am ploughing through gradually and your help has been great.  Thanks again. 


Alok

ForInlaDisGrp-Model1-EndogenousTreatment-Comparison.txt

Denis Rustand

Jun 7, 2022, 12:08:56 PM
to R-inla discussion group
Hi Alok,

I think we missed the "tobit" part of the model. In the previous code, we defined the likelihood for the observed wages (working hours > 0) but did not account for the censored observations (working hours = 0). From my understanding, these censored observations should have a likelihood contribution that corresponds to the probability of observing a wage value less than or equal to zero. In the attached code, I added this likelihood contribution using the inla.surv() function to account for left-censored observations.
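For illustration, the left-censored outcome could be encoded along these lines (a sketch only; the censoring bound of 0 on the log(wage + 1) scale and the variable names are assumptions, and the attached script is the reference):

obs    <- data$lfp == 1                 # wage observed only for participants
y.out  <- ifelse(obs, data$lwage, 0)    # bound 0 when the wage is unobserved
ev     <- ifelse(obs, 1, 2)             # inla.surv coding: 1 = observed, 2 = left-censored
y.surv <- inla.surv(time = y.out, event = ev)
## y.surv would then replace the Gaussian part of the stacked response for the outcome block.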

The results seem to match the treatment effect you are looking for (0.7), but with slightly higher variability. It is given as "Beta for eta1" and the value is 0.585 [credible interval: 0.388; 0.783]. Moreover, the variance of the error seems to match the one you have from the "hybrid" estimation in R (0.258 with hybrid, 0.258 with INLA). Same for the correlation (0.91 with hybrid, 0.918 with INLA).

Denis
TRT_effect_INLA.R