RI CLPM model with binary endogenous variables


Eric Shuman

Jun 14, 2021, 11:45:03 AM
to lavaan

I'm trying to fit a random-intercept cross-lagged panel model (RI-CLPM) with lavaan in R. I'm using great code published by Mulder & Hamaker, 2020 (see https://jeroendmulder.github.io/RI-CLPM/), as a starting point, and in general everything works well. The problem is that if I try to run the model with a binary endogenous variable, I get tons of errors (see below). This is true even if I use the "ordered" function in lavaan which is supposed to handle binary variables. There is no missing data in the dataset, so this can't be the issue. Does anyone know if there are simply additional specifications, constraints, etc., that I need?


Here are the errors I get:

Warning messages:
1: In lav_model_estimate(lavmodel = lavmodel, lavpartable = lavpartable,  :
  lavaan WARNING: the optimizer (NLMINB) claimed the model converged,
                  but not all elements of the gradient are (near) zero;
                  the optimizer may not have found a local solution
                  use check.gradient = FALSE to skip this check.
2: In sqrt(A1[[g]]) : NaNs produced
3: In lavaan(RICLPM, data = dwr_nm, meanstructure = T, int.ov.free = T) :
  lavaan WARNING: estimation of the baseline model failed.


I wasn't able to reproduce these specific errors with fully reproducible code.

However, I'm attaching reproducible code that generates other errors. The model setup and syntax are completely identical to mine (it's just based on randomly generated variables rather than my data).

Eric Shuman

Jun 14, 2021, 11:46:06 AM
to lavaan
For some reason, it wouldn't let me post it with the code attached so here is the code:
#Attempting with Binary Outcome - Reproducible code
library(tidyverse)
library(lavaan)

#Continuous Predictor Variable
#6 time points, n=300 to match my data with some variation across time
ProtestBelong_t1 <- sample(1:7, 300, replace = T)
ProtestBelong_t2 <- sample(1:7, 300, replace = T)
ProtestBelong_t3 <- sample(1:7, 300, replace = T)
ProtestBelong_t4 <- sample(1:7, 300, replace = T)
ProtestBelong_t5 <- sample(1:7, 300, replace = T)
ProtestBelong_t6 <- sample(1:7, 300, replace = T)

#Binary Outcome Variable
#6 time points, n=300 to match my data with some variation across time
ParticipationBi_t1 <- rbinom(n=300, size=1, prob=0.6)
ParticipationBi_t2 <- rbinom(n=300, size=1, prob=0.55)
ParticipationBi_t3 <- rbinom(n=300, size=1, prob=0.5)
ParticipationBi_t4 <- rbinom(n=300, size=1, prob=0.47)
ParticipationBi_t5 <- rbinom(n=300, size=1, prob=0.45)
ParticipationBi_t6 <- rbinom(n=300, size=1, prob=0.40)

d <- list(ProtestBelong_t1, ProtestBelong_t2, ProtestBelong_t3,
                     ProtestBelong_t4, ProtestBelong_t5, ProtestBelong_t6,
                     ParticipationBi_t1, ParticipationBi_t2, ParticipationBi_t3,
                     ParticipationBi_t4, ParticipationBi_t5, ParticipationBi_t6)

d <- as.data.frame(do.call(cbind, d))
names(d) <- c("ProtestBelong_t1", "ProtestBelong_t2", "ProtestBelong_t3",
              "ProtestBelong_t4", "ProtestBelong_t5", "ProtestBelong_t6",
              "ParticipationBi_t1", "ParticipationBi_t2", "ParticipationBi_t3",
              "ParticipationBi_t4", "ParticipationBi_t5", "ParticipationBi_t6")
#clean up ws
rm(list=setdiff(ls(), "d"))


#setting binary variables as type ordered for lavaan
d[,c("ParticipationBi_t2",
       "ParticipationBi_t3",
       "ParticipationBi_t4",
       "ParticipationBi_t5",
       "ParticipationBi_t6")] <-
  lapply(d[,c("ParticipationBi_t2",
                "ParticipationBi_t3",
                "ParticipationBi_t4",
                "ParticipationBi_t5",
                "ParticipationBi_t6")], ordered)

class(d$ParticipationBi_t2)


#Defining random intercepts cross-lagged model
RICLPM <- '
  # Create between components (random intercepts)
  RI_Participation =~ 1*ParticipationBi_t1 + 1*ParticipationBi_t2 + 
                      1*ParticipationBi_t3 + 1*ParticipationBi_t4 + 
                      1*ParticipationBi_t5 + 1*ParticipationBi_t6
                      
  RI_Belong =~ 1*ProtestBelong_t1 + 1*ProtestBelong_t2 + 
               1*ProtestBelong_t3 + 1*ProtestBelong_t4 + 
               1*ProtestBelong_t5 + 1*ProtestBelong_t6  
                      
  # Create within-person centered variables
  wParticipation1 =~ 1*ParticipationBi_t1
  wParticipation2 =~ 1*ParticipationBi_t2
  wParticipation3 =~ 1*ParticipationBi_t3 
  wParticipation4 =~ 1*ParticipationBi_t4
  wParticipation5 =~ 1*ParticipationBi_t5
  wParticipation6 =~ 1*ParticipationBi_t6
  
  wBelong1 =~ 1*ProtestBelong_t1
  wBelong2 =~ 1*ProtestBelong_t2
  wBelong3 =~ 1*ProtestBelong_t3
  wBelong4 =~ 1*ProtestBelong_t4
  wBelong5 =~ 1*ProtestBelong_t5
  wBelong6 =~ 1*ProtestBelong_t6


  # Estimate the lagged effects between the within-person centered variables.
  wParticipation2 + wBelong2 ~ wParticipation1 + wBelong1
  wParticipation3 + wBelong3 ~ wParticipation2 + wBelong2
  wParticipation4 + wBelong4 ~ wParticipation3 + wBelong3
  wParticipation5 + wBelong5 ~ wParticipation4 + wBelong4
  wParticipation6 + wBelong6 ~ wParticipation5 + wBelong5

  # Estimate the covariance between the within-person centered variables at the first wave. 
  wParticipation1 ~~ wBelong1 # Covariance
  
  # Estimate the covariances between the residuals of the within-person centered variables (the innovations).
  wParticipation2 ~~ wBelong2
  wParticipation3 ~~ wBelong3
  wParticipation4 ~~ wBelong4
  wParticipation5 ~~ wBelong5
  wParticipation6 ~~ wBelong6
  
  # Estimate the variance and covariance of the random intercepts. 
  RI_Participation ~~ RI_Participation
  RI_Belong ~~ RI_Belong
  RI_Participation ~~ RI_Belong

  # Estimate the (residual) variance of the within-person centered variables.
  wParticipation1 ~~ wParticipation1 # Variances
  wBelong1 ~~ wBelong1 
  wParticipation2 ~~ wParticipation2 # Residual variances
  wBelong2 ~~ wBelong2 
  wParticipation3 ~~ wParticipation3
  wBelong3 ~~ wBelong3 
  wParticipation4 ~~ wParticipation4
  wBelong4 ~~ wBelong4 
  wParticipation5 ~~ wParticipation5
  wBelong5 ~~ wBelong5
  wParticipation6 ~~ wParticipation6
  wBelong6 ~~ wBelong6
'
RICLPM.fit <- lavaan(RICLPM, data = d, meanstructure = TRUE, int.ov.free = TRUE)
summary(RICLPM.fit, standardized = TRUE)

Terrence Jorgensen

Jun 14, 2021, 3:39:46 PM
to lavaan

> if I try to run the model with a binary endogenous variable, I get tons of errors (see below).

You posted warnings, not error messages. 

> This is true even if I use the "ordered" function in lavaan which is supposed to handle binary variables. Does anyone know if there are simply additional specifications, constraints, etc., that I need?


The default "delta" parameterization for ordinal endogenous variables fixes their marginal variances to 1 for identification (alternatively, you could set their residual variances to 1 using parameterization="theta").  You are creating single-indicator constructs for each in order to add the AR and CL paths between adjacent timepoints, but you are only fixing the factor loadings to 1.  You also need to fix the residual variances to 0, so that all their (residual) variance is captured in their single-indicator factors.  But because the residual variance was not an estimated parameter, you need to fix the w* factor variances too.  You could fix them to 1 (consistent with theta parameterization) or impose some quite complex constraints to make their marginal variances == 1 (consistent with delta parameterization).  
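
As a rough sketch of the theta-style option described above (not a tested solution for the posted data), the extra syntax lines could be generated like this. The object names `waves`, `obs`, `w`, `resid_fixed`, `wvar_fixed`, and `extra_syntax` are made up for illustration; the variable and factor names are the ones from the posted model.

```r
# Sketch only: build the extra lavaan syntax lines for the binary outcome.
waves <- 1:6
obs <- paste0("ParticipationBi_t", waves)  # observed binary indicators
w   <- paste0("wParticipation", waves)     # single-indicator within-person factors

# Fix each indicator's residual variance to 0, so that all of its
# (residual) variance is carried by its single-indicator factor.
resid_fixed <- paste0(obs, " ~~ 0*", obs)

# Fix each w* factor (residual) variance to 1 instead of estimating it.
wvar_fixed <- paste0(w, " ~~ 1*", w)

extra_syntax <- paste(c(resid_fixed, wvar_fixed), collapse = "\n")
cat(extra_syntax, "\n")
```

These lines would replace the freely estimated `wParticipation* ~~ wParticipation*` lines in the posted syntax, and the model would then presumably be fitted with `ordered = obs` and `parameterization = "theta"` passed to `lavaan()`.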

Note that their intercepts are also fixed to zero for identification, so I'm not sure it makes sense to free them using the argument int.ov.free=TRUE.  In order for those parameters to be estimable with ordinal variables, you need to impose equality constraints on thresholds across time, or simply fix all the thresholds to zero so that the intercepts capture that information instead (they will be the same absolute magnitude but opposite sign).
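
The two threshold options mentioned above might be written along these lines. This is only a sketch: the label `thr` and the object names are hypothetical, and each binary variable has a single threshold, which lavaan calls `t1`.

```r
# Sketch only: two alternative ways to handle the thresholds of the binary items.
waves <- 1:6
obs <- paste0("ParticipationBi_t", waves)

# Option A: hold the threshold equal across waves by giving it one shared label.
thr_equal <- paste0(obs, " | thr*t1")

# Option B: fix every threshold to 0 and let the intercepts carry that
# information instead (same absolute magnitude, opposite sign).
thr_zero <- paste0(obs, " | 0*t1")
int_free <- paste0(obs, " ~ 1")

cat(thr_equal, sep = "\n")
cat(c(thr_zero, int_free), sep = "\n")
```

Only one of the two options would go into the model syntax. Under Option B the `int.ov.free` argument is no longer needed for the binary items, because their intercepts are listed explicitly.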

Not sure what exactly the problem is when you pretend the binary variables are continuous, but perhaps the issue is that your data-generating code is simulating independent variables (i.e., you simulate each occasion without reference to any other occasion).  If the variables are not associated over time, then there is nothing for your AR or CL paths to capture except sampling error.  You could try simulating data from the actual model you are fitting by setting population values.
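
One way to simulate data that do have structure over time is to generate them from an RI-CLPM-like process by hand. The sketch below uses base R only; every population value in it (`ar`, `cl`, the random-intercept correlation, the mean of 4, the threshold of 0) is an arbitrary assumption chosen for illustration, and the predictor is continuous rather than a 1-7 integer scale.

```r
# Sketch only: simulate two variables with random intercepts plus
# autoregressive (ar) and cross-lagged (cl) within-person effects.
set.seed(123)                  # arbitrary seed
n <- 300
n_waves <- 6
ar <- 0.3                      # assumed autoregressive effect
cl <- 0.2                      # assumed cross-lagged effect

# Between-person part: two correlated random intercepts (each variance 1)
ri_b <- rnorm(n)
ri_p <- 0.5 * ri_b + rnorm(n, sd = sqrt(0.75))

# Within-person part: each wave depends on the previous wave
w_b <- matrix(NA_real_, n, n_waves)
w_p <- matrix(NA_real_, n, n_waves)
w_b[, 1] <- rnorm(n)
w_p[, 1] <- rnorm(n)
for (k in 2:n_waves) {
  w_b[, k] <- ar * w_b[, k - 1] + cl * w_p[, k - 1] + rnorm(n)
  w_p[, k] <- ar * w_p[, k - 1] + cl * w_b[, k - 1] + rnorm(n)
}

# Observed scores: continuous predictor; binary outcome obtained by
# cutting the latent response at an assumed threshold of 0
belong   <- 4 + ri_b + w_b
particip <- ((ri_p + w_p) > 0) * 1L

d <- as.data.frame(cbind(belong, particip))
names(d) <- c(paste0("ProtestBelong_t", 1:n_waves),
              paste0("ParticipationBi_t", 1:n_waves))
str(d)
```

Because the waves now share a random intercept and lagged effects, adjacent occasions are correlated, so the AR and CL paths have something other than sampling error to pick up.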

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
