Re: Mplus LPA/LCA vs. MClust


Joshua Rosenberg

Apr 21, 2018, 11:14:55 AM
to J.D. Haltigan, tid...@googlegroups.com
J.D., more than happy to talk through this. If you don't mind, I tried to start a mailing list for the package just a few days ago, and I'm cc'ing the list so that others can see this and use it. If that's not okay for any reason, then I'll delete it.

So... my understanding (and experience) is that mclust can specify models that Mplus cannot *and* that Mplus can specify models that mclust cannot.

My current understanding about which can be specified in both Mplus & mclust, as well as which can be specified in Mplus but not mclust, is here: https://jrosen48.github.io/tidyLPA/articles/Introduction_to_tidyLPA.html

What this leaves out is your exact question about which can be specified in mclust but not Mplus. Those are some of those listed here: https://www.rdocumentation.org/packages/mclust/versions/5.4/topics/mclustModelNames. I'm not sure exactly which cannot be specified in Mplus; I think we could find out, perhaps even by elimination, but, in short, I think it will be some slightly more esoteric models (i.e., with very uncommon covariance matrix structures). Part of the challenge of using mclust is that the models are described *geometrically*, rather than in terms of whether and how means, variances, and covariances between the variables are estimated. Sorry this answer isn't super satisfying; I'd be happy to dig into this more soon.
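
For reference, a sketch of what "geometrically" means here, based on the mclust documentation as I understand it. The symbols below are the usual ones from that literature, not something defined in this thread, so treat the notation as an assumption:

```latex
% Covariance matrix of class k, written through its eigen-decomposition:
\Sigma_k = \lambda_k \, D_k A_k D_k^{\top}
% \lambda_k : volume      (a positive scalar)
% A_k       : shape       (diagonal matrix with \det A_k = 1)
% D_k       : orientation (orthogonal matrix of eigenvectors)
%
% The three letters of a model name say how volume, shape, and orientation
% are treated across classes: E = equal, V = varying, I = identity.
% Example, "EEI": \lambda_k = \lambda,\; A_k = A,\; D_k = I, so
\Sigma_k = \lambda A \quad \text{for every class } k
% i.e., one diagonal matrix shared by all classes.
```

Read this way, "EEI" is a single diagonal covariance matrix shared by all classes, which is the same thing as equal variances with covariances fixed to 0.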

Regarding the two questions:

for TidyLPA am I correct in understanding that it just provides an interface to Mclust? That is, Mclust is what is running to execute the models etc.? If so, and perhaps a silly question, but did you develop TidyLPA to simply leverage the latent profile part of Mclust?

Yes, it just provides an interface to mclust (and to Mplus, which required more work because it has to dynamically generate the Mplus syntax). In short, I think there are two decent reasons why tidyLPA does / should exist: 1) it's not easy to work with mclust, especially the output, and 2) related to the reason I mentioned above, the mclust models are described geometrically, and it took a bit of work (including asking Stack Overflow questions [here!] and going back and forth with the developers of mclust, the Muthens, and other colleagues) to make sure that the models corresponded correctly (i.e., to make sure that a model with "Varying means, equal variances, and covariances fixed to 0" corresponded to the mclust model "EEI", which is described in the mclust documentation as "diagonal, equal volume and shape").

I noticed you had models 1-4 listed that you wanted to compare between the two, but need to look more closely at what model couldn't be parameterized in the other (I want to say it was model 3)...but I am wondering if you might have a quick answer to my question about what specifically made it not able to be parameterized in the other program.

Sorry, I think I (didn't) answer this a bit above. Without diving into understanding the models well and checking the output, I could guess that, out of those available in mclust, the ones that cannot be specified in Mplus would be all of those except “EEI”, “EEE”, “VVI”, and “VVV”. But I'm not sure.
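
To make the four names above a bit more concrete, here is a minimal sketch in plain Python (not R, and not how mclust or tidyLPA work internally) of how they line up with the "equal vs. varying" and "covariances fixed to 0 vs. estimated" descriptions. The helper names are hypothetical. The mapping for "EEE", "VVI", and "VVV" is my reading of the tidyLPA and mclust documentation; only the "EEI" correspondence is spelled out in this thread. mclust has other diagonal and ellipsoidal models that this sketch deliberately ignores.

```python
# Illustrative only: these helpers are hypothetical, not mclust or tidyLPA functions.
# Each class covariance matrix is a p x p list of lists.

def is_diagonal(cov, tol=1e-9):
    """True if every off-diagonal entry (a covariance) is zero within tolerance."""
    p = len(cov)
    return all(abs(cov[i][j]) <= tol for i in range(p) for j in range(p) if i != j)


def equal_across_classes(covs, tol=1e-9):
    """True if every class has the same covariance matrix as the first class."""
    first = covs[0]
    p = len(first)
    return all(
        abs(cov[i][j] - first[i][j]) <= tol
        for cov in covs
        for i in range(p)
        for j in range(p)
    )


# Keys: (same matrix in every class?, covariances fixed to 0?)
# Restricted to the four mclust names mentioned in this thread.
MCLUST_NAME = {
    (True, True): "EEI",    # equal variances, covariances fixed to 0
    (True, False): "EEE",   # equal variances, equal covariances
    (False, True): "VVI",   # varying variances, covariances fixed to 0
    (False, False): "VVV",  # varying variances, varying covariances
}


def mclust_name(covs):
    """Return which of the four names matches the structure of the class covariances."""
    all_diagonal = all(is_diagonal(cov) for cov in covs)
    return MCLUST_NAME[(equal_across_classes(covs), all_diagonal)]


# Two classes sharing one diagonal covariance matrix.
shared_diagonal = [[[1.0, 0.0], [0.0, 2.0]], [[1.0, 0.0], [0.0, 2.0]]]
print(mclust_name(shared_diagonal))  # EEI
```

The point of the sketch is only that these four names differ on two yes/no questions; the remaining mclust models add constraints on volume, shape, or orientation that do not reduce to those two questions.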

Look forward to continuing to talk and better understand the affordances of mclust - either in addition to or in replacement of some of the functionality of Mplus. I think both are powerful tools. I hope the package makes it easier to work with them and I hope my response helps a bit.
Josh



On Fri, Apr 20, 2018 at 5:24 PM, J.D. Haltigan <jhal...@gmail.com> wrote:
Hey Josh:

Thanks so much for getting back to me. I will look these over in depth asap. To provide some context, I am a long-time user of Mplus and use it extensively for mixture modeling. Recently, a colleague I am working with on a project proposed Mclust given the range of different models one can explore using Gaussian mixture approaches that don't assume local independence etc. That said, I am not sure what type of model he had in mind using Mclust that couldn't necessarily be parameterized in Mplus, and since I didn't have any background in Mclust, I checked it out and have been playing with it a bit (R is not my native language per se, but I have used it rather extensively to do some IRT stuff, comparing runs to Mplus output as you have done with LPA).

To be sure, I am not nearly as conversant in programming as yourself (very impressive!), and am just now considering using MplusAutomation for some things (even though I am still not 100% clear on its functional upside in my case).

Two quick questions:

for TidyLPA am I correct in understanding that it just provides an interface to Mclust? That is, Mclust is what is running to execute the models etc.? If so, and perhaps a silly question, but did you develop TidyLPA to simply leverage the latent profile part of Mclust?

I noticed you had models 1-4 listed that you wanted to compare between the two, but need to look more closely at what model couldn't be parameterized in the other (I want to say it was model 3)...but I am wondering if you might have a quick answer to my question about what specifically made it not able to be parameterized in the other program.

Looking forward to discussing this more since I'll be using both heavily in the next few months!

Best regards,
J.D.

On Fri, Apr 20, 2018 at 9:36 AM, Joshua Rosenberg <jro...@msu.edu> wrote:
J.D., thank you so much for your message. So... I'm glad you asked about this. Here is what should be on the site (but I'm having trouble finding this, too, so need to update it) - the R Markdown doc "comparing-mplus-mclust.Rmd". I think the results are highly similar but I'd like to better understand when and in what cases they diverge; I expect for more complex models and data, that they will. Can you look over these and let me know what you think? At the least, I'd like to consider turning these into a vignette for the tidyLPA package (that would be a better place to store them and know where they are, at least :)). I'd welcome you being added as a contributor to tidyLPA if you would like to work on this! Let me know what you think. 
Josh


On Tue, Apr 17, 2018 at 8:11 PM J.D. Haltigan <jhal...@gmail.com> wrote:
Hi Joshua:

I came across your website and work comparing Mplus mixture models with Mclust. As I am using both myself (or will be on the latter, Mclust), I wanted to see if you had done any further work comparing these two and ask a few questions about similarities and differences in how each program environment parameterizes and estimates analogous mixture models.

I also noticed you had the rmarkdown for the Iris dataset comparison on one section of your github page but then I seem to have lost my way in navigating between your blog and the markdown. Any chance you might be able to redirect me to that set of markdown?

Many thanks for any insights!

Best regards,
J.D. Haltigan

*****************************************
J.D. Haltigan, Ph.D.
Assistant Professor
Department of Psychiatry, University of Toronto
Cundill Scholar, The Centre for Addiction and Mental Health (CAMH)
Child and Youth Mental Health Collaborative
The Hospital for Sick Children
The Centre for Addiction and Mental Health

John.H...@camh.ca
416.535.8501 (ext. 39386)

http://www.psychiatry.utoronto.ca/people/dr-john-d-haltigan/
https://www.researchgate.net/profile/John_Haltigan/
--
Joshua Rosenberg, Ph.D. Candidate
Educational Psychology ​&​ Educational Technology
Michigan State University





J.D. Haltigan

Apr 21, 2018, 6:10:56 PM
to Joshua Rosenberg, tid...@googlegroups.com
Very helpful, Josh, and happy to have contributed an initial back-n-forth with the list!

FYI: I am at stack overflow here

Yes, it is difficult to transition back and forth between describing models geometrically and more formally. One thing that was quite useful for me to see (as primarily a user of Mplus) in the translation of code, and that other users may find of value, is that by specifying the within-class covariances (the indicators are supposed to be locally independent in the strict form of LPA), one can see (in Mplus) whether this assumption has been violated. It often is, which then raises the question of whether one might want to consider modeling the data in a less taxonic way (i.e., a variant of what are known as factor mixture models or mixture factor analysis). For most constructs that I investigate (psychopathology), there is pretty good evidence that they are distributed continuously in nature, and trying to carve that nature at its joints has considerable implications for the inferences we make about the data when we force cases into artificial groups and then attempt to validate those groups or classes on antecedents and outcomes. Most notable may be issues of misclassification, which may or may not have consequences depending on its severity.
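
As a rough illustration of the local independence idea (not the Mplus procedure, which estimates these covariances inside the model), one can look at the sample covariance between two indicators within each class. This is a minimal sketch in plain Python; the function name and the toy data are made up for illustration, and it assumes hard class assignments, whereas a mixture model works with posterior class probabilities.

```python
from collections import defaultdict


def within_class_covariance(x, y, labels):
    """Sample covariance (n - 1 denominator) between x and y, computed per class.

    Hypothetical helper for illustration. Classes with fewer than two cases
    are skipped because the covariance is undefined for them.
    """
    groups = defaultdict(list)
    for xi, yi, label in zip(x, y, labels):
        groups[label].append((xi, yi))

    result = {}
    for label, pairs in groups.items():
        n = len(pairs)
        if n < 2:
            continue
        mean_x = sum(px for px, _ in pairs) / n
        mean_y = sum(py for _, py in pairs) / n
        cross = sum((px - mean_x) * (py - mean_y) for px, py in pairs)
        result[label] = cross / (n - 1)
    return result


# Toy data: the indicators covary inside class "a" but not inside class "b".
x = [1, 2, 3, 5, 6, 7]
y = [2, 4, 6, 1, 1, 1]
labels = ["a", "a", "a", "b", "b", "b"]
print(within_class_covariance(x, y, labels))  # {'a': 2.0, 'b': 0.0}
```

Under strict local independence, every within-class covariance would be close to 0, so a clearly nonzero value like the one for class "a" is the kind of signal described above.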

This brings me to a crucial meta point, which is that in the case of LCGA and LCA/LPA (also known as group-based models in the Nagin tradition), where groups are considered homogeneous (no within-class variation), the groups are seen as useful heuristics rather than as true 'subpopulations' that exist.

Best regards,
J.D.

