clustered standard error

David

Feb 15, 2022, 6:19:21 AM

to lavaan

Dear everyone,

I'm currently majoring in psychology and working as a research assistant, so I'm far from an expert when it comes to SEM and lavaan in general.

My question relates to nested data. A multilevel model is out of the question because there are too few data points at the class level. Now I'm trying to at least correct the standard errors for this data structure. Does it make sense to include the school ID in the "cluster" argument?

Unfortunately, my internet research didn't help, which is why I would be grateful for any help.

Greetings

David

Ranaivo Rasolofoson

Feb 15, 2022, 9:57:18 AM

to lav...@googlegroups.com

Hi David,

Here are some of the discussions on clustered SE I gathered that may help:

https://groups.google.com/forum/#!msg/lavaan/ZdAPsB1yRTQ/YLInlDDaBgAJ

https://groups.google.com/g/lavaan/c/l6rSP4odLkQ/m/d_bjYk6pCAAJ

https://psu-psychology.github.io/r-bootcamp-2018/talks/lavaan_tutorial.html

https://psu-psychology.github.io/psy-597-sem-sp2019/10_best_practices/important_odds_and_ends.html

Ranaivo

--

Ranaivo A. Rasolofoson, PhD, MS

Nicholas School of the Environment

Duke University

https://scholars.duke.edu/person/ranaivo.rasolofoson

Gund Affiliate, Gund Institute for Environment

University of Vermont (https://www.uvm.edu/gund)

e-mail: rras...@gmail.com

Jeremy Miles

Feb 15, 2022, 3:51:40 PM

to lav...@googlegroups.com

If you have too few data points at the class level for multilevel models, you also have too few to estimate clustered (sandwich / robust / Huber-White) standard errors.

The common solutions are either to include class as a fixed effect in the model or to do something Bayesian, where a prior can help with the uncertainty at the higher level.

Perhaps tell us more about your data?

Jeremy

David

Feb 16, 2022, 7:00:05 AM

to lavaan

@Ranaivo

Thank you for the threads, they're super useful.

@Jeremy

Generally speaking, I'm trying to predict adaptivity after errors from a contextual factor, namely error climate. The data were collected in different school classes, hence the nested data structure. Due to Covid we had to stop data collection early, which leaves us with 14 data points at the class level. To my knowledge, that is far too few to consider a multilevel design. My supervisor told me we should consider correcting the standard errors, which makes sense given the nested data and the violation of normality at the individual level. Nonetheless, it also makes sense that the estimation of clustered standard errors won't be reliable with too few data points at the class level...

At least now I know how to correct the standard errors, even if it is not appropriate.

Anyways, thank you!

David

Jeremy Miles

Feb 16, 2022, 1:14:02 PM

to lav...@googlegroups.com

Yeah, you have the same issue.

https://academic.oup.com/ije/article/47/1/321/4091562 says: "The minimum number of clusters required to maintain the type I error rate at 5% has been suggested to be around 30–40 clusters for mixed models and 40–50 for GEEs [1,9], although depending on specific trial characteristics, a larger number of clusters may be required."
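
To illustrate why the number of clusters matters, here is a minimal sketch in plain Python (not lavaan) of a cluster-robust standard error for the simplest possible case, a sample mean. The data, the function names, and the G/(G-1) finite-sample adjustment are assumptions made for illustration only. The point is that the clustered variance is built from one summed residual per cluster, so with 14 classes it rests on just 14 terms.

```python
import math
from collections import defaultdict


def naive_se_mean(y):
    """Standard error of the mean that ignores clustering."""
    n = len(y)
    mean = sum(y) / n
    s2 = sum((v - mean) ** 2 for v in y) / (n - 1)
    return math.sqrt(s2 / n)


def cluster_robust_se_mean(y, cluster, small_sample=True):
    """Sandwich-type standard error of the mean with residuals summed per cluster."""
    n = len(y)
    mean = sum(y) / n
    totals = defaultdict(float)
    for value, group in zip(y, cluster):
        totals[group] += value - mean  # summed residual for this cluster
    n_clusters = len(totals)
    if n_clusters < 2:
        raise ValueError("need at least two clusters")
    var = sum(t ** 2 for t in totals.values()) / n ** 2
    if small_sample:
        # Simple G / (G - 1) adjustment; an assumption, other corrections exist.
        var *= n_clusters / (n_clusters - 1)
    return math.sqrt(var)


# Hypothetical data: 14 classes of 10 pupils, all pupils in a class sharing one score.
cluster = [g for g in range(14) for _ in range(10)]
y = [float(g % 5) for g in cluster]

print("naive SE:    ", naive_se_mean(y))
print("clustered SE:", cluster_robust_se_mean(y, cluster))
```

In this made-up example the scores are identical within each class, so the clustered standard error comes out larger than the naive one. How trustworthy it is still depends on having enough clusters, which is the concern raised above.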

Stas Kolenikov

Feb 16, 2022, 1:21:00 PM

to lav...@googlegroups.com

Economists have been simulating these number-of-clusters issues as if their lives depended on it. With symmetric, generally balanced data, about 42 is a good answer. If you have small clusters with larger variances within those clusters, your d.f. is basically the number of these odd clusters, not the total number of clusters.

-- Stas Kolenikov, PhD, PStat (ASA, SSC) @StatStas

-- Principal Statistician, NORC @NORCnews
-- Opinions stated in this email are mine only, and do not reflect the position of my employer
-- http://stas.kolenikov.name

Christian Arnold

Feb 16, 2022, 1:44:34 PM

to lav...@googlegroups.com

Hi Stas,

Yes, 42 is the ultimate answer to everything: https://en.m.wikipedia.org/wiki/Phrases_from_The_Hitchhiker%27s_Guide_to_the_Galaxy

Just kidding ...

David

Feb 17, 2022, 11:24:22 AM

to lavaan

haha thank you guys!

Stas Kolenikov

Feb 17, 2022, 3:53:20 PM

to lav...@googlegroups.com

There's an econometrics textbook where that dictum is used to justify the "reliable" number of clusters. It is called, as you can easily guess, "Mostly Harmless Econometrics". (Actually it is a terrific book, and one of the authors shared the 2021 Nobel prize in economics for work on the causal-inference methods it covers. They introduce deep, serious concepts of causal inference without any matrix algebra or conditional probabilities. This is good reading for any learner of SEM.)

miao

Apr 18, 2023, 1:18:03 PM

to lavaan

Hello Jeremy

I am testing a model with nested data (128 students from 10 schools). My number of clusters is too small for cluster-corrected standard errors, so I referred to this discussion. I would like to confirm whether I can include school as a fixed effect in my path model. Does that mean including school as a predictor that is not of substantive interest, just like students' age (a covariate)? I look forward to your reply. Thanks a lot.

Best

Miao

Terrence Jorgensen

May 12, 2023, 8:11:43 AM

to lavaan

I would like to confirm whether I can include school as a fixed effect in my path model. Does that mean including school as a predictor that is not of substantive interest, just like students' age (a covariate)?

Yes, you can include 9 dummy codes to partial out differences in 10 school intercepts.
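
As a rough sketch of what "9 dummy codes for 10 schools" looks like, here is a small plain-Python example (not lavaan code). The school IDs, the `school_` column prefix, and the choice of the first school as the reference category are all hypothetical. Each student gets nine 0/1 indicators, and students from the reference school are coded as all zeros.

```python
def dummy_code(labels, reference=None):
    """Return (column_names, rows) with k - 1 indicator columns for k categories."""
    levels = sorted(set(labels))
    if reference is None:
        reference = levels[0]  # assumption: the first level serves as the reference
    if reference not in levels:
        raise ValueError("reference must be one of the observed labels")
    kept = [level for level in levels if level != reference]
    names = [f"school_{level}" for level in kept]  # hypothetical column names
    rows = [[1 if label == level else 0 for level in kept] for label in labels]
    return names, rows


# Hypothetical data: 128 students spread over schools numbered 1 to 10.
school_ids = [i % 10 + 1 for i in range(128)]
names, rows = dummy_code(school_ids)

print(names)    # the nine indicator columns
print(rows[0])  # a student from the reference school: all zeros
print(rows[1])  # a student from school 2: a single 1 in the first column
```

The nine columns would then enter the model as ordinary observed predictors alongside covariates such as age, so that the remaining effects are estimated net of school-level intercept differences.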

Terrence D. Jorgensen (he, him, his)
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
http://www.uva.nl/profile/t.d.jorgensen