Running a model with single indicator factors


Mark Relyea

Mar 14, 2019, 2:43:18 PM
to lavaan

I am trying to specify a lavaan CFA that has some factors with three indicators, one factor with two indicators, and two factors with one indicator each. For the single-indicator factors, I followed advice to fix the residual variance of the observed variable to 0 so that the latent variable accounts for all of the variance in the observed variable. I saw other posts in this group saying that setting the variance to 0 may no longer be necessary because this is now the default behavior. Either way I run it, I still get a covariance matrix of the latent variables that is not positive definite and a model where the latent variance does not equal the variance of the single items. What is the correct way to model single-indicator factors? I've created a reproducible example below.


Simulated data with a positive definite covariance matrix.

library(MASS)
mu <- c(2.5,3,3.1,3.5,2.1,1.5,2,4,4.2,4.5)
Sigma <- matrix(.5, nrow=10, ncol=10) + diag(10)*.5
set.seed(1527)
rawvars <- mvrnorm(n=200, mu=mu, Sigma=Sigma)

Convert latent responses to positive ordered categories (to create Likert-type items).

i1 = findInterval(rawvars[,1], vec=c(-Inf,2,2.75,3.5,4.25,Inf))
i2 = findInterval(rawvars[,2], vec=c(-Inf,2,2.75,3.5,4.25,Inf))
i3 = findInterval(rawvars[,3], vec=c(-Inf,2,2.75,3.5,4.25,Inf))
i4 = findInterval(rawvars[,4], vec=c(-Inf,2,2.75,3.5,4.25,Inf))
i5 = findInterval(rawvars[,5], vec=c(-Inf,2,2.75,3.5,4.25,Inf))
i6 = findInterval(rawvars[,6], vec=c(-Inf,2,2.75,3.5,4.25,Inf))
i7 = findInterval(rawvars[,7], vec=c(-Inf,2,2.75,3.5,4.25,Inf))
i8 = findInterval(rawvars[,8], vec=c(-Inf,2,2.75,3.5,4.25,Inf))
i9 = findInterval(rawvars[,9], vec=c(-Inf,2,2.75,3.5,4.25,Inf))
i10 = findInterval(rawvars[,10], vec=c(-Inf,2,2.75,3.5,4.25,Inf))
df <- data.frame(cbind(i1,i2,i3,i4,i5,i6,i7,i8,i9,i10))

Confirm all items have small to moderate positive correlations.

> round(cor(df),2)
      i1   i2   i3   i4   i5   i6   i7   i8   i9  i10
i1  1.00 0.53 0.55 0.50 0.50 0.42 0.51 0.48 0.51 0.38
i2  0.53 1.00 0.51 0.48 0.39 0.45 0.42 0.50 0.55 0.40
i3  0.55 0.51 1.00 0.46 0.49 0.37 0.43 0.52 0.48 0.42
i4  0.50 0.48 0.46 1.00 0.34 0.43 0.35 0.50 0.42 0.42
i5  0.50 0.39 0.49 0.34 1.00 0.37 0.43 0.36 0.35 0.39
i6  0.42 0.45 0.37 0.43 0.37 1.00 0.34 0.38 0.39 0.32
i7  0.51 0.42 0.43 0.35 0.43 0.34 1.00 0.40 0.46 0.37
i8  0.48 0.50 0.52 0.50 0.36 0.38 0.40 1.00 0.39 0.35
i9  0.51 0.55 0.48 0.42 0.35 0.39 0.46 0.39 1.00 0.43
i10 0.38 0.40 0.42 0.42 0.39 0.32 0.37 0.35 0.43 1.00

Built the model, constraining the loadings of the two-indicator factor to be equal and fixing the residual variances of the single-indicator variables (i1 and i10) to 0.

> library(lavaan)
> fa.mod <- '
f1=~ i1
f2=~ i2 + i3 + i4
f3=~ i5 + i6 + i7
f4=~ a*i8 + a*i9
f5=~ i10
i1~~ 0*i1
i10~~ 0*i10'

The covariance matrix of the latent variables was not positive definite, and the latent variance does not equal the variance of the single items.

> fa.fit<- sem(fa.mod,data=df)
Warning message:
In lav_object_post_check(object) :
  lavaan WARNING: covariance matrix of latent variables is not positive definite; use inspect(fit,"cov.lv") to investigate.

> inspect(fa.fit,"cov.lv")
   f1    f2    f3    f4    f5   
f1 1.312                        
f2 0.755 0.766                  
f3 0.595 0.555 0.451            
f4 0.565 0.614 0.419 0.383      
f5 0.394 0.465 0.360 0.356 0.828
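
The warning can be checked by hand from the matrix printed above. The following is a minimal base-R sketch that retypes the rounded values from the `inspect(fa.fit, "cov.lv")` output (the object names `cov_lv`, `eig`, and `cor_lv` are only illustrative) and looks at the eigenvalues and the implied latent correlations.

```r
# Latent covariance matrix, retyped from the rounded inspect() output above.
lv <- c("f1", "f2", "f3", "f4", "f5")
cov_lv <- matrix(0, 5, 5, dimnames = list(lv, lv))

# Fill the lower triangle column by column (f1 column first), then mirror it.
cov_lv[lower.tri(cov_lv, diag = TRUE)] <- c(
  1.312, 0.755, 0.595, 0.565, 0.394,
         0.766, 0.555, 0.614, 0.465,
                0.451, 0.419, 0.360,
                       0.383, 0.356,
                              0.828
)
cov_lv[upper.tri(cov_lv)] <- t(cov_lv)[upper.tri(cov_lv)]

# A positive definite matrix has only positive eigenvalues.
eig <- eigen(cov_lv, symmetric = TRUE, only.values = TRUE)$values
print(round(eig, 3))

# Convert to correlations and list the pairs that fall outside [-1, 1].
cor_lv <- cov2cor(cov_lv)
bad <- which(abs(cor_lv) > 1 & upper.tri(cor_lv), arr.ind = TRUE)
print(data.frame(row = lv[bad[, "row"]], col = lv[bad[, "col"]],
                 cor = round(cor_lv[bad], 3)))
```

The out-of-range pairs are the same ones that show Std.lv values above 1 in the Covariances section of the summary (f2 with f4, and f3 with f4), which suggests the problem sits among the multi-indicator factors rather than with f1 or f5.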

> summary(fa.fit,fit.measures=T,standardized=T)
lavaan (0.5-23.1097) converged normally after  39 iterations

  Number of observations                           200

  Estimator                                         ML
  Minimum Function Test Statistic               26.478
  Degrees of freedom                                28
  P-value (Chi-square)                           0.547

Model test baseline model:

  Minimum Function Test Statistic              760.907
  Degrees of freedom                                45
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.003

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -2560.036
  Loglikelihood unrestricted model (H1)      -2546.797

  Number of free parameters                         27
  Akaike (AIC)                                5174.071
  Bayesian (BIC)                              5263.126
  Sample-size adjusted Bayesian (BIC)         5177.587

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent Confidence Interval          0.000  0.051
  P-value RMSEA <= 0.05                            [...]

[...]

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  f1 =~                                                                 
    i1                1.000                               1.146    1.000
  f2 =~                                                                 
    i2                1.000                               0.875    0.722
    i3                0.971    0.100    9.709    0.000    0.850    0.716
    i4                0.911    0.103    8.856    0.000    0.798    0.653
  f3 =~                                                                 
    i5                1.000                               0.671    0.633
    i6                0.736    0.106    6.947    0.000    0.494    0.579
    i7                0.911    0.121    7.551    0.000    0.612    0.641
  f4 =~                                                                 
    i8         (a)    1.000                               0.619    0.601
    i9         (a)    1.000                               0.619    0.646
  f5 =~                                                                 
    i10               1.000                               0.910    1.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  f1 ~~                                                                 
    f2                0.755    0.107    7.053    0.000    0.753    0.753
    f3                0.595    0.091    6.517    0.000    0.774    0.774
    f4                0.565    0.078    7.250    0.000    0.797    0.797
    f5                0.394    0.079    4.996    0.000    0.378    0.378
  f2 ~~                                                                 
    f3                0.555    0.089    6.256    0.000    0.945    0.945
    f4                0.614    0.081    7.541    0.000    1.132    1.132
    f5                0.465    0.078    5.976    0.000    0.584    0.584
  f3 ~~                                                                 
    f4                0.419    0.065    6.430    0.000    1.008    1.008
    f5                0.360    0.065    5.515    0.000    0.589    0.589
  f4 ~~                                                                 
    f5                0.356    0.059    6.047    0.000    0.631    0.631

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .i1                0.000                               0.000    0.000
   .i10               0.000                               0.000    0.000
   .i2                0.703    0.083    8.422    0.000    0.703    0.478
   .i3                0.688    0.081    8.498    0.000    0.688    0.488
   .i4                0.854    0.095    9.020    0.000    0.854    0.573
   .i5                0.673    0.079    8.547    0.000    0.673    0.599
   .i6                0.485    0.054    8.985    0.000    0.485    0.665
   .i7                0.536    0.063    8.467    0.000    0.536    0.589
   .i8                0.678    0.080    8.527    0.000    0.678    0.639
   .i9                0.535    0.068    7.899    0.000    0.535    0.583
    f1                1.312    0.131   10.000    0.000    1.000    1.000
    f2                0.766    0.137    5.608    0.000    1.000    1.000
    f3                0.451    0.099    4.561    0.000    1.000    1.000
    f4                0.383    0.075    5.133    0.000    1.000    1.000
    f5                0.828    0.083   10.000    0.000    1.000    1.000

Christopher David Desjardins

Mar 14, 2019, 2:54:53 PM
to lavaan

Hi Mark,
Increasing your N fixes the problem.

# from 

rawvars <- mvrnorm(n=200, mu=mu, Sigma=Sigma)

# to
rawvars <- mvrnorm(n=20000, mu=mu, Sigma=Sigma)

You can also try with a different seed and it may or may not give that message.

replicate(10, {
rawvars <- mvrnorm(n=200, mu=mu, Sigma=Sigma)

i1 = findInterval(rawvars[,1], vec=c(-Inf,2,2.75,3.5,4.25,Inf))
i2 = findInterval(rawvars[,2], vec=c(-Inf,2,2.75,3.5,4.25,Inf))
i3 = findInterval(rawvars[,3], vec=c(-Inf,2,2.75,3.5,4.25,Inf)) 
i4 = findInterval(rawvars[,4], vec=c(-Inf,2,2.75,3.5,4.25,Inf)) 
i5 = findInterval(rawvars[,5], vec=c(-Inf,2,2.75,3.5,4.25,Inf)) 
i6 = findInterval(rawvars[,6], vec=c(-Inf,2,2.75,3.5,4.25,Inf)) 
i7 = findInterval(rawvars[,7], vec=c(-Inf,2,2.75,3.5,4.25,Inf)) 
i8 = findInterval(rawvars[,8], vec=c(-Inf,2,2.75,3.5,4.25,Inf)) 
i9 = findInterval(rawvars[,9], vec=c(-Inf,2,2.75,3.5,4.25,Inf)) 
i10 = findInterval(rawvars[,10], vec=c(-Inf,2,2.75,3.5,4.25,Inf))
df <- data.frame(cbind(i1,i2,i3,i4,i5,i6,i7,i8,i9,i10))

fa.fit<- sem(fa.mod,data=df)
})

I ran that a few times and got between 3 and 10 warnings. So sometimes it converges without a warning.
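
To get an actual count rather than reading the console, the replication can be wrapped so that each fit reports whether it raised a warning. This is only a sketch under a few assumptions: `one_rep` and `cuts` are hypothetical helper names, the f4 loadings are written as `1*i8 + 1*i9` (both fixed to 1, which is what the posted output shows for the equality-constrained pair), and the handler counts any warning from `sem()`, not only the "not positive definite" one.

```r
library(MASS)
library(lavaan)

mu    <- c(2.5, 3, 3.1, 3.5, 2.1, 1.5, 2, 4, 4.2, 4.5)
Sigma <- matrix(.5, nrow = 10, ncol = 10) + diag(10) * .5
cuts  <- c(-Inf, 2, 2.75, 3.5, 4.25, Inf)   # same cut points as in the thread

fa.mod <- '
  f1 =~ i1
  f2 =~ i2 + i3 + i4
  f3 =~ i5 + i6 + i7
  f4 =~ 1*i8 + 1*i9
  f5 =~ i10
  i1  ~~ 0*i1
  i10 ~~ 0*i10
'

# Simulate one data set of size n, fit the model, and return TRUE if
# sem() raised at least one warning (of any kind).
one_rep <- function(n) {
  rawvars <- mvrnorm(n = n, mu = mu, Sigma = Sigma)
  df <- as.data.frame(apply(rawvars, 2, findInterval, vec = cuts))
  names(df) <- paste0("i", 1:10)
  warned <- FALSE
  withCallingHandlers(
    sem(fa.mod, data = df),
    warning = function(w) {
      warned <<- TRUE
      invokeRestart("muffleWarning")
    }
  )
  warned
}

set.seed(1)
n_warn <- sum(replicate(10, one_rep(200)))
print(n_warn)   # number of the 10 replications that produced a warning
```

Calling `one_rep()` with a larger `n` gives a rough picture of how quickly the warnings disappear as the sample grows.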



Mark Relyea

Mar 14, 2019, 3:40:47 PM
to lavaan
Thank you for your help identifying one potential source of this problem! Do you know why increasing the sample size helps? 

Also, although I no longer get the warning message after increasing the sample size, I still see that the Std.lv for i1 is greater than 1. Is this an issue?
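
For reference, the Std.lv column standardizes only the latent variables, so for a loading it equals the unstandardized loading times the latent standard deviation and is not bounded by 1. A small arithmetic sketch using the rounded N=200 estimates posted earlier in this thread (variable names are illustrative only):

```r
# Rounded estimates from the N = 200 summary output earlier in the thread.
psi_f1    <- 1.312   # estimated variance of f1
lambda_i1 <- 1.000   # loading of i1 on f1 (fixed to 1)
theta_i1  <- 0.000   # residual variance of i1 (fixed to 0)

# Std.lv: loading rescaled so that the latent variable has variance 1.
std_lv <- lambda_i1 * sqrt(psi_f1)

# Std.all: additionally divide by the model-implied SD of the indicator.
implied_var_i1 <- lambda_i1^2 * psi_f1 + theta_i1
std_all <- std_lv / sqrt(implied_var_i1)

print(round(c(std_lv = std_lv, std_all = std_all), 3))
```

Up to rounding, this reproduces the 1.146 and 1.000 reported for i1. The same calculation also shows that the model-implied variance of i1 is exactly the latent variance of f1 when the residual variance is fixed to 0.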

I can't post my actual data (hence the reproducible example) but I chose N=200 as I have about that many in my actual dataset. Any advice on how to run a model with single indicators without issues given the size of my dataset? Data collection is ongoing so I may get up to 250 people but certainly not 20000! :)

Thank you again. 

Mark Relyea

Mar 14, 2019, 4:19:08 PM
to lavaan
Thank you again for your response. I think I am all set now. I believe the answer was simply to improve the model. After experimenting based on your suggestion, I realized the problem was neither whether I fixed the residual variances to 0 nor whether I used single indicators for factors.

That left me realizing there must be a problem in my model (one that was occasionally masked by using different data). The problem was not obvious until I examined the correlations between factors and looked at the modification indices. After specifying a correlation between two items on different factors, the problem went away no matter how I ran it (or on how many people).
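
The post does not say which two items were allowed to correlate, so the following is only a generic sketch of that workflow, run on lavaan's built-in `HolzingerSwineford1939` data instead of the data from this thread. It ranks residual covariances by modification index and refits the model with the top-ranked one freed. The names `base.mod`, `new.mod`, `fit1`, and `fit2` are illustrative.

```r
library(lavaan)

# Standard three-factor example model shipped with lavaan's documentation.
base.mod <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit1 <- cfa(base.mod, data = HolzingerSwineford1939)

# Modification indices for (residual) covariances only, largest first.
mi  <- modificationIndices(fit1, sort. = TRUE, op = "~~")
top <- mi[1, ]
print(top[, c("lhs", "op", "rhs", "mi")])

# Add the top-ranked residual covariance to the model syntax and refit.
new.mod <- paste(base.mod, paste(top$lhs, "~~", top$rhs), sep = "\n")
fit2 <- cfa(new.mod, data = HolzingerSwineford1939)

chisq1 <- unname(fitMeasures(fit1, "chisq"))
chisq2 <- unname(fitMeasures(fit2, "chisq"))
print(c(before = chisq1, after = chisq2))

# After any such change, re-inspect the latent correlations.
print(round(lavInspect(fit2, "cor.lv"), 3))
```

A freed residual covariance should have a substantive justification (for example, similar item wording); picking whatever has the largest index is shown here only to keep the sketch short.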

Thank you again! 