Cluster Variable contains missing values

Elisabeth Graf

Jun 19, 2019, 6:04:31 AM
to lavaan
Dear lavaan team,

I am conducting a path model analysis for my master's thesis using lavaan. As my data are partially nested (teachers come from different schools, but there is no school variable for parents), I tried to control for schools by including the school variable as a cluster:

Fit <- sem(model, data=sgmd, ordered="SD24", cluster="schools", missing="listwise")

Because one endogenous variable is dichotomous, I am using the DWLS estimator and listwise deletion. I am now wondering how lavaan treats my partially nested structure. R returns a warning ("lavaan warning: cluster variable "schools" contains missing values"), but the summary output still includes all cases, and the results barely differ whether or not I include the cluster. Is the partially two-level data structure controlled for even though I get the warning about missing values? Or is there another way to control for a partially nested structure in lavaan?

Thank you in advance,
Elisabeth

Terrence Jorgensen

Jun 19, 2019, 7:54:41 AM
to lavaan
> my data is partially nested (teachers from different schools, but no school variable for parents) I tried to control for the schools by including them as cluster:

> Due to one endogenous dichotomous variable, I am using DWLS estimate and listwise deletion.

Clustered ordered data cannot yet be handled by lavaan.

> "lavaan warning: cluster variable "schools" contains missing values", but in the summary statistics, all cases are still included

But you just said there is no "school" ID for parents.  Does that mean that variable is NA for rows with parent data?

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Elisabeth Graf

Jun 19, 2019, 12:27:51 PM
to lavaan
Hi,

Thanks for the link to the paper. I had already looked at it some time ago, but thought it would not help me because I am limited to using R for my analysis. I will check again whether it can help me.

Indeed, the school variable is NA for parents. I was just not sure whether lavaan still controls for the cluster variable in the analysis despite the warning about missing data.

Greetings,
Elisabeth Graf

Terrence Jorgensen

Jun 20, 2019, 3:07:41 PM
to lavaan
> Indeed, the variable school is NA for parents. I was just not sure whether it did or did not control for the cluster variable in the analysis even if giving the warning on the missing data.

Run the model again without using the cluster argument, then compare sample sizes.  I would guess that lavaan is forced to listwise-delete any observations for whom the cluster is NA, unless it does something like treat all NAs as a single cluster.

Elisabeth Graf

Jun 25, 2019, 3:32:31 AM
to lavaan
The number of observations stays the same. I have now checked the number of clusters, and it seems that lavaan treats all missing values as one cluster.

Terrence Jorgensen

Jun 25, 2019, 10:33:36 AM
to lavaan
> it seems as it takes all missing values as one cluster.

In that case, you can assign them unique IDs, so that they are all single-subject clusters.  Then you can use the MSEM-PN method in the Sterba et al. article I linked to above.  Or you can reformat your data as "wide" to use the SSEM method (see Figure 1 of that article for the difference).
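A minimal base-R sketch of the unique-ID step, assuming `schools` holds the school ID and is `NA` for parent rows. The toy data frame and the `single_` prefix are hypothetical, chosen only for illustration and so the new IDs cannot clash with real school IDs. It also shows why the missing values look like one cluster: `unique()` keeps all `NA`s as a single distinct value.

```r
# Hypothetical toy data: teachers carry a school ID, parents have NA
sgmd <- data.frame(
  role    = c("teacher", "teacher", "teacher", "parent", "parent"),
  schools = c("A", "A", "B", NA, NA),
  stringsAsFactors = FALSE
)

# unique() keeps NA as one distinct value, so all parents pool into one "cluster"
n_before <- length(unique(sgmd$schools))
print(n_before)  # 3: "A", "B", NA

# Give each row with a missing school its own ID, so every parent becomes a
# single-subject cluster. The "single_" prefix is an arbitrary (hypothetical)
# choice; if schools is numeric in your data, this coerces it to character.
na_rows <- is.na(sgmd$schools)
sgmd$schools[na_rows] <- paste0("single_", seq_len(sum(na_rows)))

n_after <- length(unique(sgmd$schools))
print(n_after)   # 4: "A", "B", "single_1", "single_2"
```

The recoded column can then serve as the cluster variable for the single-subject-cluster (MSEM-PN) approach, rather than leaving the parents pooled together under `NA`.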