Modeling covariances between dummy variables

202 views
Skip to first unread message

Ángel Rodríguez Laso

unread,
Nov 12, 2022, 6:45:46 AM11/12/22
to lav...@googlegroups.com
Dear all,

What is the correct way to model a covariance between dummy variables?

'primary' and 'morethanprimary' are dummy variables for education.

If I model their covariance like that of any other variables (primary
~~ morethanprimary), I get:

Warning message:

In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats, :

lavaan WARNING:

The variance-covariance matrix of the estimated parameters (vcov)
does not appear to be positive definite! The smallest eigenvalue (=
-2.147731e-12) is smaller than zero. This may be a symptom that the
model is not identified.


If I drop the term primary ~~ morethanprimary or change it to primary
~~ 0*morethanprimary, then lavaan provides a solution with one extra
degree of freedom and a residual between the two dummy variables of
-0.602, but no MI is above 10.

Any way out?

Thank you,

Ángel

Edward Rigdon

unread,
Nov 12, 2022, 8:00:08 AM11/12/22
to lav...@googlegroups.com
Angel--
     That is very interesting. Don't assume that this particular covariance is the immediate cause of the problem. It could be that the model with the covariance restricted is inducing a bias elsewhere that is preventing the message, while the more correct model, which includes the covariance, fails the empirical identification check.
     What does the rest of the model look like? Depending on model structure, it is possible that statistical identification hinges on this covariance being 0.
     It would probably be good to compare the entire vector of parameter estimates, to see what else changes when this covariance is allowed or restricted.
--Ed Rigdon

--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/CACpA%3DVQ6HYoR3dn9i7OK_7H3MD7%2Bbb4O6vOqWZ-SxDC3gs_sqg%40mail.gmail.com.

Shu Fai Cheung (張樹輝)

unread,
Nov 12, 2022, 8:18:46 AM11/12/22
to lavaan
Dear Ángel,

How many groups does education have? With just 'primary' and 'morethanprimary', the number of groups should three but I am not sure whether there are other dummy variables for education in the model and the covariance issue occurred only for these two dummy variables.

What are the number of cases for the groups in education? Though unlikely to be the source of the problem, the frequencies does affect the sample covariances between the dummy variables.

-- Shu Fai

Yves Rosseel

unread,
Nov 12, 2022, 9:36:32 AM11/12/22
to lav...@googlegroups.com
If the dummy variables are exogenous (independent), then the best
approach is to NOT mention them in the model syntax, and use fixed.x =
TRUE (which is the default) in the sem() call.

This is one of the advantages of fixed.x = TRUE: the
means/variances/covariances of exogenous covariates are no longer model
parameters. Instead, we just use the sample-based statistics.

If you must have them as free parameters, recall that you cannot
estimate both the mean and the variance of a binary (dummy) variable.
The variance is a function of the mean (ie: var = p * (1 - p) where p is
the mean or proportion of a binary variable). So you can estimate the
mean (p), but then you must define (using the := operator) the variance
to be p * (1 - p). If you estimate them both, you get this warning:

> lavaan WARNING:
>
> The variance-covariance matrix of the estimated parameters (vcov)
> does not appear to be positive definite! The smallest eigenvalue (=
> -2.147731e-12) is smaller than zero. This may be a symptom that the
> model is not identified.

Yves.

Ángel Rodríguez Laso

unread,
Nov 14, 2022, 5:46:04 AM11/14/22
to lav...@googlegroups.com
Thank you for your interest, Ed.

This is the complete model (primarios=primary; masqueprim=more than primary):

modelfus3 <- '
# latent variable definitions
# reflective
CHARL =~ 1.0*charlsonw1
AGE =~ 1.0*HI8_w2

# Latent variable variances
CHARL ~~ CHARL
AGE ~~ AGE

#fix error variance: (1-reliability)*variance
charlsonw1 ~~ 0.09*0.8285176*charlsonw1
HI8_w2 ~~ 0.01*0.262604*HI8_w2

# regressions
depabvdw2 ~ AGE + CHARL + woman + primarios + masqueprim
depaivdw2 ~ AGE + CHARL + woman + primarios + masqueprim
hosp3y ~ AGE + CHARL + woman + primarios + masqueprim

# covariances
AGE ~~ CHARL + woman + primarios + masqueprim
CHARL ~~ woman + primarios + masqueprim
woman ~~ primarios + masqueprim
primarios ~~ masqueprim

# intercepts
charlsonw1 ~ 1
HI8_w2 ~ 1

# thresholds
depabvdw2 | t1
depaivdw2 | t1
hosp3y | t1
woman | t1
primarios| t1
masqueprim | t1
'
fitfus3 <- sem(modelfus3, data=hoy, ordered=c("depabvdw2","depaivdw2","hosp3y"))



I copy the parameters that are different between this model and the
one without primarios ~~ masqueprim:

Regressions:

Estimate Std.Err z-value P(>|z|) Std.lv Std.all

WITH primarios~~masqueprim

depabvdw2 ~

AGE 3.722 9.971 0.373 0.709 1.893 0.583

CHARL 0.130 0.493 0.265 0.791 0.113 0.035

woman 0.580 1.383 0.419 0.675 0.580 0.178

primarios 2.048 6.209 0.330 0.741 2.048 0.631

masqueprim 2.240 7.062 0.317 0.751 2.240 0.690

WITHOUT

depabvdw2 ~

AGE 0.922 0.297 3.109 0.002 0.469 0.430

CHARL 0.219 0.131 1.673 0.094 0.190 0.174

woman 0.208 0.170 1.218 0.223 0.208 0.190

primarios 0.312 0.182 1.713 0.087 0.312 0.286

masqueprim 0.218 0.234 0.932 0.351 0.218 0.200



WITH

depaivdw2 ~

AGE 2.992 6.240 0.480 0.632 1.522 0.706

CHARL -0.005 0.319 -0.017 0.987 -0.005 -0.002

woman 0.297 0.860 0.345 0.730 0.297 0.138

primarios 1.236 3.900 0.317 0.751 1.236 0.574

masqueprim 1.423 4.414 0.322 0.747 1.423 0.661

WITHOUT

depaivdw2 ~

AGE 1.270 0.214 5.938 0.000 0.646 0.627

CHARL 0.053 0.117 0.455 0.649 0.046 0.045

woman 0.071 0.127 0.559 0.576 0.071 0.069

primarios 0.141 0.165 0.853 0.394 0.141 0.136

masqueprim 0.194 0.170 1.142 0.253 0.194 0.189

WITH

hosp3y ~

AGE -2.141 11.641 -0.184 0.854 -1.089 -0.343

CHARL 0.137 0.537 0.256 0.798 0.119 0.037

woman -0.500 1.581 -0.316 0.752 -0.500 -0.158

primarios -2.065 7.326 -0.282 0.778 -2.065 -0.651

masqueprim -2.131 8.219 -0.259 0.795 -2.131 -0.672

WITHOUT

hosp3y ~

AGE 0.622 0.354 1.759 0.079 0.317 0.291

CHARL 0.057 0.146 0.387 0.699 0.049 0.045

woman -0.126 0.146 -0.863 0.388 -0.126 -0.116

primarios -0.400 0.287 -1.395 0.163 -0.400 -0.367

masqueprim -0.106 0.220 -0.484 0.628 -0.106 -0.098



Covariances:

Estimate Std.Err z-value P(>|z|) Std.lv Std.all

WITH

primarios ~~

masqueprim -0.602 0.248 -2.434 0.015 -0.602 -0.602

.depabvdw2 ~~

.depaivdw2 0.201 1.585 0.127 0.899 0.201 0.032

.hosp3y 0.773 2.867 0.270 0.787 0.773 0.083

.depaivdw2 ~~

.hosp3y 0.780 1.786 0.437 0.662 0.780 0.129

WITHOUT

depabvdw2 ~~

.depaivdw2 0.630 0.103 6.135 0.000 0.630 0.761

.hosp3y 0.101 0.185 0.548 0.583 0.101 0.117

.depaivdw2 ~~

.hosp3y 0.358 0.119 3.002 0.003 0.358 0.459



Variances:

WITH

Estimate Std.Err z-value P(>|z|) Std.lv Std.all

.depabvdw2 9.589 9.589 0.909

.depaivdw2 4.063 4.063 0.875

.hosp3y 9.035 9.035 0.899

WITHOUT

.depabvdw2 0.916 0.916 0.771

.depaivdw2 0.749 0.749 0.705

.hosp3y 0.814 0.814 0.685



R-Square:

WITH

Estimate

depabvdw2 0.091

depaivdw2 0.125

hosp3y 0.101

WITHOUT

depabvdw2 0.229

depaivdw2 0.295

hosp3y 0.315


Massive differences. The results WITHOUT are what one would expect:
age is associated with the outcomes and some of the outcomes are
associated.

Ángel





El sáb, 12 nov 2022 a las 14:00, Edward Rigdon
(<edward...@gmail.com>) escribió:
> To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/CAHxMgefb5iZsZogrAZN%2B9vq%3DiCWeOq1wY2FjVfS5efHELJTdNg%40mail.gmail.com.

Ángel Rodríguez Laso

unread,
Nov 14, 2022, 5:49:12 AM11/14/22
to lav...@googlegroups.com
Thank you for your interest, Shu.

The variable education has three levels with less than primary= 129,
primary=28, more than primary=43.

Ángel

El sáb, 12 nov 2022 a las 14:18, Shu Fai Cheung (張樹輝)
(<shufai...@gmail.com>) escribió:
> --
> You received this message because you are subscribed to the Google Groups "lavaan" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/ac49a746-837b-4d41-b4f1-cf1ce72d3d40n%40googlegroups.com.

Ángel Rodríguez Laso

unread,
Nov 14, 2022, 5:58:18 AM11/14/22
to lav...@googlegroups.com
Thank you for your interest, Yves.

I prefer to model the covariances explicitly.

In any case, with the model I wrote in a message above but taking away
the covariances, I get:

modelfus3 <- '
# latent variable definitions
# reflective
CHARL =~ 1.0*charlsonw1
AGE =~ 1.0*HI8_w2

# Latent variable variances
CHARL ~~ CHARL
AGE ~~ AGE

#fix error variance: (1-reliability)*variance
charlsonw1 ~~ 0.09*0.8285176*charlsonw1
HI8_w2 ~~ 0.01*0.262604*HI8_w2

# regressions
depabvdw2 ~ AGE + CHARL + woman + primarios + masqueprim
depaivdw2 ~ AGE + CHARL + woman + primarios + masqueprim
hosp3y ~ AGE + CHARL + woman + primarios + masqueprim

# covariances
#AGE ~~ CHARL + woman + primarios + masqueprim
#CHARL ~~ woman + primarios + masqueprim
#woman ~~ primarios + masqueprim
#primarios ~~ masqueprim

# intercepts
charlsonw1 ~ 1
HI8_w2 ~ 1

# thresholds
depabvdw2 | t1
depaivdw2 | t1
hosp3y | t1
woman | t1
primarios| t1
masqueprim | t1
'
fitfus3 <- sem(modelfus3, data=hoy, ordered=c("depabvdw2","depaivdw2","hosp3y"))


starting values:

[1] 1.0000000000 1.0000000000 0.7461618333 0.2182448308
0.0745665840 0.0026260400 0.0000000000 0.0000000000 0.0000000000
0.0000000000 0.0000000000
[12] 0.0000000000 0.0000000000 0.0000000000 0.0000000000
0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
0.0000000000 0.5699864890
[23] 7.4805114291 1.5942780578 0.8713392826 0.8258640294
NA NA NA 1.0000000000 1.0000000000
1.0000000000 0.0000000000
[34] 0.0000000000 0.0000000000 0.0000000000 1.0000000000
-0.0007035176 -0.0287185930 1.0000000000 -0.0302512563 1.0000000000
1.0000000000 1.0000000000
[45] 1.0000000000 1.0000000000 1.0000000000 1.0000000000
0.0000000000 0.0000000000 0.0000000000 0.5050000000 0.1400000000
0.2150000000 0.0000000000
[56] 0.0000000000

Error in tmp[cbind(REP$row[idx], REP$col[idx])] <- lavpartable$free[idx] :

NAs are not allowed in subscripted assignments

In addition: Warning messages:

1: In lav_data_full(data = data, group = group, cluster = cluster, :

lavaan WARNING: exogenous variable(s) declared as ordered in data:
woman primarios masqueprim

2: In lav_partable_flat(FLAT, blocks = "group", meanstructure =
meanstructure, :

lavaan WARNING: thresholds are defined for exogenous variables:
woman primarios masqueprim

3: In out[idx] <- pta$vidx$th[[b]] :

number of items to replace is not a multiple of replacement length

4: In muthen1984(Data = X[[g]], wt = WT[[g]], ov.names = ov.names[[g]], :

lavaan WARNING: trouble constructing W matrix; used generalized
inverse for A11 submatrix

5: In lav_start(start.method = lavoptions$start, lavpartable = lavpartable, :

lavaan WARNING: some starting values are non-finite; replacing them
with 0.5; please provide better starting values.


Ángel
> --
> You received this message because you are subscribed to the Google Groups "lavaan" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/a42324bd-e9bc-2c90-cab0-76b637f8cfda%40gmail.com.

Edward Rigdon

unread,
Nov 14, 2022, 8:08:58 AM11/14/22
to lav...@googlegroups.com
Angel--
     The structure of your model is not the kind to give rise to identification problems, with or without the covariance. The model looks basically saturated, especially with, free residual covariances among the three outcome variables.
     Yet the impact of that one covariance is profound. The variable AGE seems to be strongly affected. That makes me wonder if the two variables in question plus AGE form a triplet with almost perfect determination (the covariance matrix has a determinant near 0). "Near" is a matter of sample size--if yours is small, then near includes a larger range of values.
     Otherwise, it is hard to see where the identification issue comes from. A model could have an identification problem in the middle of estimation, as parameter estimates migrate from starting values to final estimates...hmmm, maybe the combination of strong collinearity and weak starting values is the problem. (Just fishing here) What if you (a) allow the covariance but (b) set starting values near those for the model without the covariance? Maybe you could sidestep a bad patch of parameter space that way.
     But the ordinal variable part complicates things. I am surprised that you declare WOMAN ordinal rather than just let it be a dummy variable.
--Ed Rigdon

Ángel Rodríguez Laso

unread,
Nov 14, 2022, 9:37:10 AM11/14/22
to lav...@googlegroups.com
Ed,

I suppose that when you say that I declared woman an ordinal variable
it is because I modeled its threshold: if I don´t do that, the
estimation does not converge, even after taking away
primarios~~masqueprim.
The determinant of the covariance matrix of AGE, primarios and
masqueprim is 0.004356.
I've set the start values in the model with primarios~~masqueprim to
those of the column 'Std.all' of the regressions and covariances of
the model without primarios~~masqueprim, but the model did not
converge.

Ángel


El lun, 14 nov 2022 a las 14:08, Edward Rigdon
> To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/CAHxMgef5hGy08crXwE47PkE6%2BbdpHSLVeL4QHKPVtpTY0fCuXA%40mail.gmail.com.

Ángel Rodríguez Laso

unread,
Nov 16, 2022, 3:22:01 AM11/16/22
to lav...@googlegroups.com
Dear Yves and members of the list,

From an Yves' message: 'So you can estimate the mean (p), but then you
must define (using the := operator) the variance to be p * (1 - p).'

What would be the correct syntax to do that? Can you provide an
example with two dummy variables?

Thank you,

Ángel

El sáb, 12 nov 2022 a las 15:36, Yves Rosseel (<yros...@gmail.com>) escribió:
>
> --
> You received this message because you are subscribed to the Google Groups "lavaan" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/a42324bd-e9bc-2c90-cab0-76b637f8cfda%40gmail.com.

Terrence Jorgensen

unread,
Dec 1, 2022, 8:15:34 AM12/1/22
to lavaan
you can estimate the mean (p), but then you
must define (using the := operator) the variance to be p * (1 - p).'

What would be the correct syntax to do that? Can you provide an
example with two dummy variables?

I think you can just use labels and constraints in the ?model.syntax
 
x ~ mean_x*1
x ~~ var_x*x

var_x == mean_x * (1 - mean_x)

No need for the := operator here.

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Reply all
Reply to author
Forward
0 new messages