Path analysis with a count variable as outcome measure

Jérôme Gijselaers

Jan 12, 2015, 10:36:39 AM
to lav...@googlegroups.com
Hi all,

Let me start by noting that I am new to using the lavaan package in R. I have some experience with R, although I still find it quite hard to work with. But hey, this is nothing new ;-)

For my PhD project I want to run a path analysis (i.e., with manifest variables only) using SEM methodology. All my predictor variables are continuous except one, which is ordinal (educational level measured on an 8-level scale). My outcome measure is a count variable measuring study progress (the number of modules completed in one year), which ranges from 0 to 20 and can take every integer value in between. My sample contains around 1000 cases, and I'm using a cross-validation approach: I first explore and develop a model and then confirm it. I therefore end up with two data sets of around 500 cases each (a random split of the total sample): a testing sample for model development and a validation sample for model testing.
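A random split like the one described above can be sketched as follows. This is Python rather than R, purely as an illustration; the function name `split_sample`, the fixed seed, and the integer case IDs 0 to 999 are assumptions, not part of the original setup.

```python
import random

def split_sample(case_ids, seed=2015):
    """Hypothetical helper: randomly split case IDs into a testing half
    (model development) and a validation half (model confirmation)."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

testing, validation = split_sample(range(1000))
print(len(testing), len(validation))   # 500 500
```

Fixing the seed means the same cases always land in the same half, so the exploratory and confirmatory analyses can be rerun on identical samples.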

Now, I previously ran a full SEM analysis on other data in AMOS while I was abroad. There, a professor told me that I should not analyze data on a count variable in AMOS. She told me I should do this in R because it is more robust for count data.

Is this true?

And, if so, what is the best estimation method for my data? For completeness: the count variable follows a negative binomial distribution and is thus not normally distributed.

Best, Jérôme

yrosseel

Jan 13, 2015, 4:50:02 AM
to lav...@googlegroups.com
On 01/12/2015 04:36 PM, Jérôme Gijselaers wrote:
> She told me I should do this in R because it
> is more robust for count data.

Hm. lavaan (0.5-17) currently does *not* handle count data. At best, you
can treat it as non-normal continuous data, and use estimator = "MLR".
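As a rough illustration of what "MLR" adds, the sketch below computes a Huber-White (HC0) sandwich standard error for a single regression slope alongside the classical one. It is a hypothetical single-equation analogue written in Python, not lavaan's implementation: lavaan's MLR applies the sandwich idea to the whole model and also rescales the test statistic. The function name and data are made up for illustration.

```python
import math

def ols_with_sandwich_se(x, y):
    """Simple regression y = a + b*x. Returns the slope b, its classical
    (constant-variance) SE, and its HC0 Huber-White sandwich SE."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Classical SE: one pooled residual variance for all cases
    se_classic = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
    # Sandwich SE: each case contributes its own squared residual,
    # so variance that grows with the mean (as with counts) is allowed
    meat = sum(((xi - mx) * r) ** 2 for xi, r in zip(x, resid))
    se_robust = math.sqrt(meat) / sxx
    return b, se_classic, se_robust

# Hypothetical count outcome whose spread grows with x
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 1, 0, 3, 2, 6, 3, 9]
print(ols_with_sandwich_se(x, y))
```

The point estimate is the same either way; only the standard error changes, which is why MLR protects inference against non-normality but does not turn the model into a count model.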

I am not sure if (in R) we have a SEM package that can handle count data
explicitly at the moment. Outside R, Mplus can certainly do it.

Yves.


Jérôme Gijselaers

Jan 13, 2015, 10:45:33 AM
to lav...@googlegroups.com
Thank you, Yves, for your quick and clear response.

How acceptable or bad is it, if I treat it as non-normal continuous data? Because if I do this, I could also analyze it in AMOS with ADF (WLS) estimation, is that correct?

Best, Jérôme

yrosseel

Jan 13, 2015, 11:03:15 AM
to lav...@googlegroups.com
On 01/13/2015 04:45 PM, Jérôme Gijselaers wrote:
> How acceptable or bad is it, if I treat it as non-normal continuous
> data?

I am not sure. The higher the mean value of the count variable, the
better the 'poisson' distribution approximates the normal.
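Yves's point can be made concrete with the shape of the Poisson distribution: for a Poisson variable with mean λ, the skewness is 1/√λ and the excess kurtosis is 1/λ, and both are 0 for a normal distribution. A minimal Python sketch (the function name is hypothetical):

```python
import math

def poisson_shape(lam):
    """Skewness and excess kurtosis of a Poisson(lam) variable.
    Both are 0 for a normal distribution."""
    return 1 / math.sqrt(lam), 1 / lam

for lam in (1, 4, 25, 100):
    skew, kurt = poisson_shape(lam)
    print(f"mean={lam:>3}  skewness={skew:.2f}  excess kurtosis={kurt:.2f}")
```

Skewness drops from 1 at a mean of 1 to 0.1 at a mean of 100. This holds for the Poisson case; an overdispersed (negative binomial) count does not necessarily approach normality this quickly.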

To be sure, you may want to run a small simulation to see how much this
'misspecification' influences the results.
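One possible form of such a simulation, sketched in Python under assumed values: counts are drawn from a negative binomial distribution (generated as a gamma-Poisson mixture) whose mean is linear in a single predictor, a linear regression is fitted to them as if they were continuous, and the average slope plus the 95% confidence-interval coverage of classical versus sandwich standard errors are recorded. The sample size, coefficients, dispersion (`size`), number of replications, and all function names are hypothetical; a real check would simulate from the full path model.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw via Knuth's multiplication method (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def rnegbin(mu, size, rng):
    """Negative binomial draw as a gamma-Poisson mixture:
    mean mu, variance mu + mu**2 / size."""
    return rpois(rng.gammavariate(size, mu / size), rng)

def slope_and_ses(x, y):
    """OLS slope with classical and HC0 sandwich standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    e = [yi - a - b * xi for xi, yi in zip(x, y)]
    se_classic = math.sqrt(sum(r * r for r in e) / (n - 2) / sxx)
    se_robust = math.sqrt(sum(((xi - mx) * r) ** 2 for xi, r in zip(x, e))) / sxx
    return b, se_classic, se_robust

def simulate(n=500, reps=200, b0=2.0, b1=1.5, size=2.0, seed=1):
    """Treat negative binomial counts (mean linear in x) as continuous;
    return mean slope and 95% CI coverage for classical vs sandwich SEs."""
    rng = random.Random(seed)
    slopes, cover_classic, cover_robust = [], 0, 0
    for _ in range(reps):
        x = [rng.uniform(0, 4) for _ in range(n)]
        y = [rnegbin(b0 + b1 * xi, size, rng) for xi in x]
        b, se_c, se_r = slope_and_ses(x, y)
        slopes.append(b)
        cover_classic += abs(b - b1) <= 1.96 * se_c
        cover_robust += abs(b - b1) <= 1.96 * se_r
    return sum(slopes) / reps, cover_classic / reps, cover_robust / reps

mean_b, cov_c, cov_r = simulate()
print(f"mean slope {mean_b:.3f} (true 1.5); "
      f"coverage classical {cov_c:.2f}, robust {cov_r:.2f}")
```

If the mean slope is close to the true value and the sandwich coverage is close to 0.95, treating the count as continuous with robust standard errors is probably tolerable for that data-generating setup.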

> Because if I do this, I could also analyze it in AMOS with ADF
> (WLS) estimation, is that correct?

If you have a large sample, yes. For smaller/medium samples, you are
better off with MLR.
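One way to see why ADF needs a large sample: its weight matrix is estimated from fourth-order moments and has one row and one column for every distinct sample variance and covariance. For p observed variables there are k = p(p+1)/2 such moments, so the weight matrix has k(k+1)/2 distinct elements to estimate. A small Python sketch (the variable counts are arbitrary examples, means are ignored, and the function name is hypothetical):

```python
def adf_weight_size(p):
    """Return (k, w): k distinct sample variances/covariances for p observed
    variables, and w distinct elements of the k x k ADF weight matrix."""
    k = p * (p + 1) // 2
    return k, k * (k + 1) // 2

for p in (5, 10, 15):
    k, w = adf_weight_size(p)
    print(f"p = {p:2d}: {k:3d} moments, {w:5d} weight-matrix elements")
```

With 10 observed variables the weight matrix already has 1,540 distinct elements, which is a lot to estimate from a sample of about 500 cases.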

Yves.

Jérôme Gijselaers

Jan 13, 2015, 11:11:05 AM
to lav...@googlegroups.com
Alright, thanks once more for thinking with me!

Jérôme

Jarrett Byrnes

Jan 13, 2015, 11:25:09 AM
to lav...@googlegroups.com
While neither lavaan nor sem handles count data, if you’re interested in going with the graph-theoretic approach to SEM, you can use the piecewiseSEM package at https://github.com/jslefche/piecewiseSEM

In brief, it takes in a list of linear, generalized linear, mixed, or other types of models, fits them, and estimates Fisher’s C (see Bill Shipley’s excellent papers on this) to assess whole-model fit.
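Fisher's C combines the p-values of the k independence claims implied by the model (Shipley's tests of d-separation) as C = -2 Σ ln(p_i), and compares C with a chi-square distribution on 2k degrees of freedom; a non-significant C suggests that no important paths are missing. The Python sketch below is a hypothetical illustration, not piecewiseSEM's code, and the p-values in the usage line are invented.

```python
import math

def fishers_c(p_values):
    """Fisher's C for k independence claims: C = -2 * sum(ln p_i),
    referred to a chi-square distribution with 2k degrees of freedom."""
    k = len(p_values)
    c = -2 * sum(math.log(p) for p in p_values)
    # Upper-tail chi-square probability for even df = 2k (closed form):
    # P(X > c) = exp(-c/2) * sum_{i=0}^{k-1} (c/2)**i / i!
    half = c / 2
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        series += term
    return c, 2 * k, math.exp(-half) * series

# Hypothetical p-values from three independence claims
c, df, p = fishers_c([0.42, 0.08, 0.67])
print(f"Fisher's C = {c:.2f} on {df} df, p = {p:.3f}")
```

Because the degrees of freedom are always even, the chi-square tail probability has a closed form, so no statistics library is needed for this step.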

You can freely incorporate glms with Poisson error for count data or otherwise.
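For a count outcome, such a component is typically a Poisson GLM with a log link (in R, `glm(..., family = poisson)`). The Python sketch below shows what that component estimates for one predictor, using Newton-Raphson, which for the canonical log link is the same as iteratively reweighted least squares. The function name and data are hypothetical.

```python
import math

def poisson_glm(x, y, iters=25):
    """Poisson regression log E[y] = b0 + b1*x, fitted by Newton-Raphson
    (equivalent to IRLS for the canonical log link)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start at the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix [[h00, h01], [h01, h11]]
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: information^{-1} @ score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# With a binary predictor the fitted means reproduce the group means
b0, b1 = poisson_glm([0, 0, 0, 1, 1, 1], [1, 2, 3, 4, 6, 8])
print(math.exp(b0), math.exp(b0 + b1))   # approximately 2.0 and 6.0
```

The slope b1 is on the log scale, so exp(b1) is the multiplicative change in the expected count per unit of x.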

It’s under heavy development by its author, and he is very eager for feedback, but so far it’s working well. I’ve included some examples in my course notes and code at http://byrneslab.net/teaching/sem/ from Day 2 and Day 4.

-Jarrett

--
You received this message because you are subscribed to the Google Groups "lavaan" group.
Visit this group at http://groups.google.com/group/lavaan.

Solomiia Myroniuk

May 16, 2023, 5:41:16 AM
to lavaan
Hi Yves, 

Have there been any new developments for count data in lavaan since 2015?
Thanks. 

Solomiia

Terrence Jorgensen

May 16, 2023, 5:46:23 AM
to lavaan
No, and there most likely won't be. There was a brief window when a very talented programmer was working on a C++ port to speed up the MML (marginal maximum likelihood) algorithm, which is forbiddingly slow in R, but they found employment in the private sector. Unless another very talented programmer decides to take up that project, MML is not likely to be anywhere near the top of the priority list for lavaan development.
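To give a sense of why MML is expensive: with count indicators, the likelihood of each case has to be integrated over the latent variable numerically. The Python sketch below does this for a hypothetical one-factor Poisson model, using an evenly spaced grid where a real implementation would more likely use (adaptive) Gauss-Hermite quadrature. Every likelihood evaluation costs cases × grid points × indicators terms, the optimizer needs many such evaluations, and the grid grows exponentially with the number of factors. Names, parameters, and data are illustrative assumptions, not lavaan's implementation.

```python
import math

def normal_grid(q=61, bound=6.0):
    """Evenly spaced nodes on [-bound, bound] with weights proportional to
    the standard normal density, rescaled to sum to 1."""
    step = 2 * bound / (q - 1)
    nodes = [-bound + i * step for i in range(q)]
    dens = [math.exp(-0.5 * z * z) for z in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]

def poisson_logpmf(y, mu):
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def marginal_loglik(data, nu, lam, q=61):
    """Log marginal likelihood of a one-factor Poisson model
    y_ij ~ Poisson(exp(nu_j + lam_j * eta_i)), eta_i ~ N(0, 1),
    with eta integrated out numerically for every case."""
    nodes, weights = normal_grid(q)
    total = 0.0
    for row in data:                        # one integral per case ...
        lik = 0.0
        for z, w in zip(nodes, weights):    # ... evaluated at every node
            log_f = sum(poisson_logpmf(y_j, math.exp(nu_j + lam_j * z))
                        for y_j, nu_j, lam_j in zip(row, nu, lam))
            lik += w * math.exp(log_f)
        total += math.log(lik)
    return total

# Hypothetical data: 3 cases, 2 count indicators
data = [[0, 2], [3, 1], [1, 4]]
print(marginal_loglik(data, nu=[0.5, 1.0], lam=[0.3, 0.6]))
```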

Terrence D. Jorgensen    (he, him, his)
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
http://www.uva.nl/profile/t.d.jorgensen

