Bias in latent mean and variance in longitudinal data


Berend

May 22, 2021, 9:19:37 AM
to lavaan
I have simulated (using mirt) a longitudinal dataset consisting of 2 measurements with a 10 item scale, using the same item parameters at each measurement. The latent mean at Time 1 was set at 0 and the variance at 1. The latent mean at Time 2 was set at 0.5, its variance at 2, and the correlation between Time 1 and Time 2 was set at 0.70. 
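For reference, these population values pin down the latent Time 1–Time 2 covariance; a few lines of R (a sketch added for clarity, with illustrative variable names, not part of the original simulation code) show the implied value:

```r
# Population values from the simulation design
var_t1 <- 1     # latent variance at Time 1
var_t2 <- 2     # latent variance at Time 2
r_t1t2 <- 0.70  # latent correlation between Time 1 and Time 2

# Implied latent covariance: cov = r * sqrt(var1 * var2)
cov_t1t2 <- r_t1t2 * sqrt(var_t1 * var_t2)
round(cov_t1t2, 2)  # 0.99
```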
I subjected this dataset to a 2-factor analysis in lavaan, with factors T1 and T2 and the appropriate items loading on these factors. The loadings and thresholds of the same items were constrained to be equal over time. The latent mean and variance of T1 were fixed at 0 and 1. The latent mean and variance of T2 were freely estimated, as was the covariance between T1 and T2. Residual correlations between the same items across time were allowed. I used the following command:
fit <- lavaan(model, data=datw, ordered=names(datw)) 
I got the following:
Warning message:
In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING:
    The variance-covariance matrix of the estimated parameters (vcov)
    does not appear to be positive definite! The smallest eigenvalue
    (= 1.988158e-22) is close to zero. This may be a symptom that the
    model is not identified.
However, the parameters and their SEs seem OK (except that there are no SEs reported for the item variances).
The latent mean of T2 is estimated to be 0.20 (instead of the simulated 0.50) and the latent variance of T2 is estimated to be 1.32 (instead of the simulated 2). 
My questions are:
- Does the warning message indicate there is something wrong with the data or the model?
- If not, what explains that lavaan does not recover the simulated latent mean and variance of T2?
Thanks, Berend



kma...@aol.com

May 22, 2021, 11:02:53 PM
to lavaan
Did you fix the scale of T2?

Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/

Berend

May 23, 2021, 4:01:40 AM
to lavaan
Hi Keith,

Thanks for responding. I guess the scale of T2 was indirectly determined by the constraints on the loadings and thresholds. For your information, I paste the model syntax below.

Thanks, Berend

model <- ' 
   # Factors
   F1 =~ a1*Item_1+a2*Item_2+a3*Item_3+a4*Item_4+a5*Item_5+a6*Item_6+
         a7*Item_7+a8*Item_8+a9*Item_9+a10*Item_10
   F2 =~ a1*Item_1.1+a2*Item_2.1+a3*Item_3.1+a4*Item_4.1+a5*Item_5.1+
         a6*Item_6.1+a7*Item_7.1+a8*Item_8.1+a9*Item_9.1+a10*Item_10.1

   # Correlated errors over time
   Item_1 ~~ Item_1.1
   Item_2 ~~ Item_2.1
   Item_3 ~~ Item_3.1
   Item_4 ~~ Item_4.1
   Item_5 ~~ Item_5.1
   Item_6 ~~ Item_6.1
   Item_7 ~~ Item_7.1
   Item_8 ~~ Item_8.1
   Item_9 ~~ Item_9.1
   Item_10 ~~ Item_10.1

   # Thresholds
   Item_1 + Item_1.1 | b11*t1+b12*t2+b13*t3
   Item_2 + Item_2.1 | b21*t1+b22*t2+b23*t3
   Item_3 + Item_3.1 | b31*t1+b32*t2+b33*t3
   Item_4 + Item_4.1 | b41*t1+b42*t2+b43*t3
   Item_5 + Item_5.1 | b51*t1+b52*t2+b53*t3
   Item_6 + Item_6.1 | b61*t1+b62*t2+b63*t3
   Item_7 + Item_7.1 | b71*t1+b72*t2+b73*t3
   Item_8 + Item_8.1 | b81*t1+b82*t2+b83*t3
   Item_9 + Item_9.1 | b91*t1+b92*t2+b93*t3
   Item_10 + Item_10.1 | b101*t1+b102*t2+b103*t3

   # Variances
   Item_1 ~~ Item_1
   Item_2 ~~ Item_2
   Item_3 ~~ Item_3
   Item_4 ~~ Item_4
   Item_5 ~~ Item_5
   Item_6 ~~ Item_6
   Item_7 ~~ Item_7
   Item_8 ~~ Item_8
   Item_9 ~~ Item_9
   Item_10 ~~ Item_10
   Item_1.1 ~~ Item_1.1
   Item_2.1 ~~ Item_2.1
   Item_3.1 ~~ Item_3.1
   Item_4.1 ~~ Item_4.1
   Item_5.1 ~~ Item_5.1
   Item_6.1 ~~ Item_6.1
   Item_7.1 ~~ Item_7.1
   Item_8.1 ~~ Item_8.1
   Item_9.1 ~~ Item_9.1
   Item_10.1 ~~ Item_10.1
   F1 ~~ 1*F1
   F2 ~~ F2
   F1 ~~ F2

   # Intercepts
   Item_1 ~1
   Item_2 ~1
   Item_3 ~1
   Item_4 ~1
   Item_5 ~1
   Item_6 ~1
   Item_7 ~1
   Item_8 ~1
   Item_9 ~1
   Item_10 ~1
   Item_1.1 ~1
   Item_2.1 ~1
   Item_3.1 ~1
   Item_4.1 ~1
   Item_5.1 ~1
   Item_6.1 ~1
   Item_7.1 ~1
   Item_8.1 ~1
   Item_9.1 ~1
   Item_10.1 ~1
   F1 ~ 0*1
   F2 ~1
   '

On Sunday, May 23, 2021 at 05:02:53 UTC+2, kma...@aol.com wrote:

Mauricio Garnier-Villarreal

May 23, 2021, 7:45:59 PM
to lavaan
Berend

From what I understand, it seems that you didn't include the across-time indicator covariances in the data simulation model. If that is the case, the addition of these parameters in the analysis model can lead to the difference in results.
So, try excluding this section:
# Correlated errors over time
   Item_1 ~~ Item_1.1
   Item_2 ~~ Item_2.1
   Item_3 ~~ Item_3.1
   Item_4 ~~ Item_4.1
   Item_5 ~~ Item_5.1
   Item_6 ~~ Item_6.1
   Item_7 ~~ Item_7.1
   Item_8 ~~ Item_8.1
   Item_9 ~~ Item_9.1
   Item_10 ~~ Item_10.1

Finally, simulations only recover the population value in the long run: the single dataset you simulated is not going to have a mean of exactly 0.5, but across many replications the average estimate will converge to 0.5, as long as the analysis model is correct.
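That long-run behavior can be illustrated with a toy Monte Carlo in R (a sketch with arbitrary replication counts and sample sizes, not code from this thread):

```r
set.seed(1)
# Each replication draws one dataset and records its sample mean.
# Any single replication deviates from the population mean of 0.5,
# but the average across replications converges to it.
reps  <- 500
n     <- 200
means <- replicate(reps, mean(rnorm(n, mean = 0.5, sd = sqrt(2))))
range(means)  # individual replications scatter around 0.5
mean(means)   # close to the population value of 0.5
```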

Berend

May 24, 2021, 8:52:15 AM
to lavaan
Hi Mauricio,

Thanks for your thoughts. 
You have a point that I had not included across-time indicator covariances in the simulation model. In the fitted model, these covariances were all small and none was highly significant:

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .Item_1 ~~                                           
   .Item_1.1          0.013    0.005    2.619    0.009
 .Item_2 ~~                                           
   .Item_2.1         -0.017    0.007   -2.485    0.013
 .Item_3 ~~                                           
   .Item_3.1         -0.007    0.007   -1.049    0.294
 .Item_4 ~~                                           
   .Item_4.1          0.006    0.005    1.249    0.212
 .Item_5 ~~                                           
   .Item_5.1         -0.001    0.005   -0.189    0.850
 .Item_6 ~~                                           
   .Item_6.1          0.001    0.005    0.133    0.894
 .Item_7 ~~                                           
   .Item_7.1          0.011    0.004    2.349    0.019
 .Item_8 ~~                                           
   .Item_8.1         -0.004    0.006   -0.666    0.505
 .Item_9 ~~                                           
   .Item_9.1         -0.006    0.005   -1.188    0.235
 .Item_10 ~~                                          
   .Item_10.1        -0.007    0.007   -0.959    0.337

I repeated the analysis excluding the across-time indicator covariances from the model. This did not change the latent mean (0.200) or variance (1.326) for T2.

I don't think chance fluctuation can explain the departure of my results from the simulated population values. Note that I had simulated 20,000 simulees.
I repeated the simulations 10 times. The results were similar:

> round(T2.mean,3)
 [1] 0.205 0.201 0.202 0.199 0.200 0.198 0.198 0.199 0.207 0.197
> round(T2.variance,3)
 [1] 1.312 1.301 1.317 1.304 1.309 1.308 1.301 1.299 1.320 1.308

The question is still open: Why does lavaan fail to estimate the simulated latent mean (0.5) and variance (2.0) of the Time 2 factor?
Berend

On Monday, May 24, 2021 at 01:45:59 UTC+2, mauga...@gmail.com wrote:

Mauricio Garnier-Villarreal

May 24, 2021, 9:08:14 AM
to lavaan
Berend

OK, good, your analysis model now matches the simulation model. I agree: with that large a sample and consistent behavior across the 10 replications, this does look like a systematic issue.

You said you used mirt for the data simulation. Can you share the simulation code, so I can run a couple of tests and see whether there is a difference between the programs?

Berend

May 24, 2021, 9:33:39 AM
to lavaan
Sure. I'm glad you are prepared to help.
Berend

library(mirt)

set.seed(12345)

nitems <- 10          # number of items per scale

b2  <- c(-0.8, -0.8, -0.4, -0.4, 0, 0, 0.4, 0.4, 0.8, 0.8)
bc <- b2/4
b1 <- b2 - 1 + sample(bc)
b3 <- b2 + 1 + sample(bc)
a1 <- sample( 1.7+b2/2 )

cf.simb <- as.matrix( data.frame(a1,b1,b2,b3) )
cf.simb <- as.data.frame(cf.simb)

# Transform b-parameters to d-parameters (mirt works with d-parameters)
# difficulty (b) = easiness (d) / -a

cf.sim <- cf.simb

colnames(cf.sim) <- c("a1","d1","d2","d3")

cf.sim$d1 <- -cf.simb$b1*cf.sim$a1
cf.sim$d2 <- -cf.simb$b2*cf.sim$a1
cf.sim$d3 <- -cf.simb$b3*cf.sim$a1

### Simulate dataset using 'mirt'

a1 <- as.matrix(cf.sim[ , 1])
d1 <- as.matrix(cf.sim[ , -1])

# Create theta T1 and dat1 (baseline)

N <- 20000         # sample size

set.seed( sample(10000:20000,1) )

tet1s <- rnorm(N, 0, 1)          # simulate theta T1
tet1s <- as.matrix(tet1s)

dat1 <- simdata(a1, d1, N, itemtype="graded", Theta=tet1s)
dat1 <- as.data.frame(dat1)

# Create theta change

tetchs <- rnorm(N, 0.5, 1)

# Create theta T2 and dat2  (follow-up)

tet2s <- as.matrix(tet1s + tetchs)     # theta T2
# mean(tet2s)
# var(tet2s)

set.seed( sample(30000:40000,1) )

dat2 <- simdata(a1, d1, N, itemtype="graded", Theta=tet2s)
dat2 <- as.data.frame(dat2)

datw <- data.frame(dat1, dat2)


On Monday, May 24, 2021 at 15:08:14 UTC+2, mauga...@gmail.com wrote:

Mauricio Garnier-Villarreal

May 24, 2021, 11:34:05 AM
to lavaan
Berend

First, about the data simulation: I would recommend drawing the thetas from a multivariate normal distribution, like this:

# Check: a covariance of 0.99 with variances 1 and 2 implies the
# intended correlation of 0.70
cov2cor(rbind(c(1, 0.99),
              c(0.99, 2)) )

thetas <- rockchalk::mvrnorm(n=N, mu=c(0, 0.5), Sigma = rbind(c(1, 0.99),
                                                              c(0.99, 2)) )


Next, by simulating them separately
dat1 <- simdata(a1, d1, N, itemtype="graded", Theta=as.matrix(thetas[,1]) )
dat1 <- as.data.frame(dat1)

## data 2

dat2 <- simdata(a1, d1, N, itemtype="graded", Theta=as.matrix(thetas[,2]) )

dat2 <- as.data.frame(dat2)

datw <- data.frame(dat1, dat2)

Yes, this keeps all the item parameters equal across time, but the factor correlation is not included in the simulation. I am not sure how to do this simulation in mirt with multiple correlated factors. I might be wrong, but I do feel that both time points should be simulated together, including the factor correlation.

Last, about the model: because you were using the lavaan() function, you were missing some fixed parameters. If I use a simplified syntax with the cfa() function, the estimated mean at Time 2 is closer to 0.4:



model <- '
   # Factors
   F1 =~ a1*Item_1+a2*Item_2+a3*Item_3+a4*Item_4+a5*Item_5+a6*Item_6+
         a7*Item_7+a8*Item_8+a9*Item_9+a10*Item_10
   F2 =~ a1*Item_1.1+a2*Item_2.1+a3*Item_3.1+a4*Item_4.1+a5*Item_5.1+
         a6*Item_6.1+a7*Item_7.1+a8*Item_8.1+a9*Item_9.1+a10*Item_10.1

   # Thresholds
   Item_1 + Item_1.1 | b11*t1+b12*t2+b13*t3
   Item_2 + Item_2.1 | b21*t1+b22*t2+b23*t3
   Item_3 + Item_3.1 | b31*t1+b32*t2+b33*t3
   Item_4 + Item_4.1 | b41*t1+b42*t2+b43*t3
   Item_5 + Item_5.1 | b51*t1+b52*t2+b53*t3
   Item_6 + Item_6.1 | b61*t1+b62*t2+b63*t3
   Item_7 + Item_7.1 | b71*t1+b72*t2+b73*t3
   Item_8 + Item_8.1 | b81*t1+b82*t2+b83*t3
   Item_9 + Item_9.1 | b91*t1+b92*t2+b93*t3
   Item_10 + Item_10.1 | b101*t1+b102*t2+b103*t3

   F1 ~~ 1*F1
   F2 ~~ NA*F2
   F1 ~~ NA*F2


   F1 ~ 0*1
   F2 ~NA*1
   '

fit <- cfa(model, data=datw, std.lv=T, ordered=T)
summary(fit, standardized=T)


Mauricio Garnier-Villarreal

May 24, 2021, 3:32:10 PM
to lavaan
Berend

I would also recommend running the model in mirt (I haven't worked with those types of constraints in mirt), to check that estimation behaves well in the same program you used to simulate the data.

Berend

May 25, 2021, 12:44:19 PM
to lavaan
Mauricio,

Thanks again for your suggestions. 

The multivariate simulation is a good idea. Unfortunately, it did not solve the problem: the latent mean and variance of T2 are still underestimated.

The cfa function fixes the item intercepts to zero. I thought that it was not a good idea to fix the intercepts while the thresholds and loadings are constrained equal over time. Shouldn't the item intercepts move along with the latent factor mean?
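To illustrate the question: under a graded-response model with thresholds held equal over time, a shift in the latent mean alone already changes the expected item responses, so it is transmitted through fixed thresholds (a minimal sketch; the discrimination value 1.7 and threshold 0 are just example numbers):

```r
# Probability of responding above a threshold b for an item with
# discrimination a, as a function of the latent score theta
p_above <- function(theta, a = 1.7, b = 0) plogis(a * (theta - b))

# With thresholds equal across time, a higher latent mean alone
# raises the endorsement probability; no free intercepts are needed
# to carry the mean shift.
p_above(theta = 0)    # Time 1 latent mean: 0.5
p_above(theta = 0.5)  # Time 2 latent mean: ~0.70
```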

On Monday, May 24, 2021 at 17:34:05 UTC+2, mauga...@gmail.com wrote:

Berend

May 25, 2021, 12:49:14 PM
to lavaan
Mauricio,

I analyzed the simulated dataset (and also a newly simulated dataset based on multivariate theta simulation, with both time points simulated together) with mirt, constraining the discrimination and difficulty parameters to be equal over time. mirt has no problem recovering the simulated latent mean and variance.

Berend

On Monday, May 24, 2021 at 21:32:10 UTC+2, mauga...@gmail.com wrote:

Mauricio Garnier-Villarreal

May 25, 2021, 8:20:24 PM
to lavaan
Berend

OK, I found out how to make it work.
For invariance constraints with categorical items, you want to fix the intercepts of all the items to 0. This way the latent mean at Time 2 can be estimated.
Also, you want to fix the indicator variances to 1 at Time 1 and estimate them at Time 2.
Notice that these intercepts and variances do not belong to the observed indicators themselves, but to the latent response distribution, the underlying continuous distribution that gives rise to the categorical responses.
Finally, you want to use the theta parameterization. With these changes I get a variance of 2 and a mean of 0.5 at Time 2, and a correlation of 0.7:


model <- '
   # Factors
   F1 =~ a1*Item_1+a2*Item_2+a3*Item_3+a4*Item_4+a5*Item_5+a6*Item_6+
         a7*Item_7+a8*Item_8+a9*Item_9+a10*Item_10
   F2 =~ a1*Item_1.1+a2*Item_2.1+a3*Item_3.1+a4*Item_4.1+a5*Item_5.1+
         a6*Item_6.1+a7*Item_7.1+a8*Item_8.1+a9*Item_9.1+a10*Item_10.1

   # Thresholds
   Item_1 + Item_1.1 | b11*t1+b12*t2+b13*t3
   Item_2 + Item_2.1 | b21*t1+b22*t2+b23*t3
   Item_3 + Item_3.1 | b31*t1+b32*t2+b33*t3
   Item_4 + Item_4.1 | b41*t1+b42*t2+b43*t3
   Item_5 + Item_5.1 | b51*t1+b52*t2+b53*t3
   Item_6 + Item_6.1 | b61*t1+b62*t2+b63*t3
   Item_7 + Item_7.1 | b71*t1+b72*t2+b73*t3
   Item_8 + Item_8.1 | b81*t1+b82*t2+b83*t3
   Item_9 + Item_9.1 | b91*t1+b92*t2+b93*t3
   Item_10 + Item_10.1 | b101*t1+b102*t2+b103*t3

   F1 ~~ 1*F1
   F2 ~~ NA*F2
   F1 ~~ NA*F2


   F1 ~ 0*1
   F2 ~NA*1
   
   Item_1.1 ~~ NA*Item_1.1
   Item_2.1 ~~ NA*Item_2.1
   Item_3.1 ~~ NA*Item_3.1
   Item_4.1 ~~ NA*Item_4.1
   Item_5.1 ~~ NA*Item_5.1
   Item_6.1 ~~ NA*Item_6.1
   Item_7.1 ~~ NA*Item_7.1
   Item_8.1 ~~ NA*Item_8.1
   Item_9.1 ~~ NA*Item_9.1
   Item_10.1 ~~ NA*Item_10.1
   '

fit <- cfa(model, data=datw, std.lv=T, ordered=T,
           parameterization="theta")
summary(fit, standardized=T)

Berend

May 26, 2021, 6:03:14 AM
to lavaan
Hi Mauricio,

Yes, it works! Great help! Thanks a lot for helping me out.

Berend

On Wednesday, May 26, 2021 at 02:20:24 UTC+2, mauga...@gmail.com wrote: