Is there a sample size limit for tidyLPA?


4da...@gmail.com

Mar 12, 2020, 7:36:31 AM
to tidyLPA
Hello,

When running tidyLPA on a bigger sample (ca. 3,500 records), the results are slightly different each time (e.g., BIC, AIC, etc.).
When using a sample below 2,000 records, the results stay the same (the exact same numerical values).

What could possibly cause this in the bigger sample?

Thank you,
Artur

Rosenberg, Joshua

Mar 14, 2020, 3:13:29 PM
to 4da...@gmail.com, tidyLPA

Hi Artur, are you using mclust or Mplus? This question addresses a related issue:

https://groups.google.com/forum/#!searchin/tidylpa/random$20starts%7Csort:date/tidylpa/TFHUub5HZGw/oMr-XXlEBAAJ

--

Joshua M. Rosenberg, Ph.D.
Assistant Professor, STEM Education

The University of Tennessee, Knoxville
Theory and Practice in Teacher Education
420 Claxton Complex, 1122 Volunteer Blvd.

Homepage: https://joshuamrosenberg.com
Research Group: https://makingdatasciencecount.com

He/Him/His


4da...@gmail.com

Mar 15, 2020, 12:28:47 PM
to tidyLPA
Hi Joshua,
I am not using the package argument, so I guess it is mclust (I also do not have Mplus installed).
There are four variables (test subscale scores), and the exact command is:
lpa <- estimate_profiles(db, n_profiles = 1:6, models = c(2,6))  



4da...@gmail.com

Mar 19, 2020, 5:46:36 AM
to tidyLPA
I checked whether calculating only one model on the bigger sample (n = 3429) would prove stable, but it did not.
I have no clue what the reason may be; maybe some data format issue?
(I use RStudio to import SPSS files.)

Different results after the same command (e.g. AIC, BIC):

> lpa <- estimate_profiles(db, n_profiles = 4, models = 6)
The 'variances'/'covariances' arguments were ignored in favor of the 'models' argument.
> lpa
tidyLPA analysis using mclust: 

 Model Classes AIC      BIC      Entropy prob_min prob_max n_min n_max BLRT_p
 6     4       76905.15 77414.77 0.54    0.68     0.84     0.16  0.32  0.01  
> lpa <- estimate_profiles(db, n_profiles = 4, models = 6)
The 'variances'/'covariances' arguments were ignored in favor of the 'models' argument.
> lpa
tidyLPA analysis using mclust: 

 Model Classes AIC      BIC      Entropy prob_min prob_max n_min n_max BLRT_p
 6     4       76857.43 77367.05 0.51    0.62     0.84     0.17  0.34  0.01  


Regards,
Artur

Caspar van Lissa

Mar 19, 2020, 5:51:44 AM
to tidyLPA
Dear Artur,

I believe this is the same issue as in your other question.

Model 6 is a very complex model, and you're estimating it with a lot of classes. My guess is that the model is not converging. The fit is hardly better than for the simpler models you're estimating, so I'd recommend settling for one of those.
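A detail that fits this diagnosis: you reported that runs were stable below 2,000 cases. If I remember correctly, mclust draws a random subset for its hierarchical-clustering initialization once n exceeds mclust.options("subset") (2000 by default), so above that threshold the start values, and therefore the converged solution, depend on the RNG state. A minimal sketch, assuming `db` is the data frame from this thread:

```r
# mclust (the default back-end when no 'package' argument is given)
# initializes EM from hierarchical clustering on a random subsample
# once n exceeds mclust.options("subset") -- 2000 by default, which
# matches the threshold observed here. Fixing the seed before each
# call should make the results reproducible.
library(tidyLPA)

set.seed(123)
lpa_a <- estimate_profiles(db, n_profiles = 4, models = 6)

set.seed(123)
lpa_b <- estimate_profiles(db, n_profiles = 4, models = 6)

# lpa_a and lpa_b should now report identical AIC/BIC values
```

Reproducible is not the same as trustworthy, though: if changing the seed changes the solution noticeably, that is itself a sign the model is too complex for the data.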

4da...@gmail.com

Mar 19, 2020, 6:55:50 AM
to tidyLPA
OK, so it usually converges on smaller random samples half the size of the whole set (usually, as in some cases there is no BLRT), but the complete set is so mixed that no clear latent profiles emerge, and therefore I cannot get reproducible results?
I hope I got this right; thanks for the answer.

Caspar van Lissa

Mar 19, 2020, 6:57:36 AM
to tidyLPA
Almost. With smaller samples, you get a stable solution more often (there is less chance of running into local optima). But the main problem is not the size of the data but rather the complexity of the model.
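To see what a local optimum looks like without any tidyLPA machinery, here is a self-contained base-R sketch using kmeans: a different algorithm, but the same phenomenon, where random start values can lead to different converged solutions on the same data.

```r
# 200 points with no clear cluster structure (deliberately "mixed" data)
set.seed(42)
x <- matrix(rnorm(400), ncol = 2)

# single random start: the solution depends on the initial centers,
# so repeated runs can land on different local optima
fit1 <- kmeans(x, centers = 5, nstart = 1)
fit2 <- kmeans(x, centers = 5, nstart = 1)
# fit1$tot.withinss and fit2$tot.withinss will often differ

# fixing the seed before each run reproduces the same solution
set.seed(7)
fit3 <- kmeans(x, centers = 5, nstart = 1)
set.seed(7)
fit4 <- kmeans(x, centers = 5, nstart = 1)
identical(fit3$tot.withinss, fit4$tot.withinss)  # TRUE
```

More random starts (nstart in kmeans, the starts option in Mplus) reduce the problem by keeping the best of many runs; mclust instead relies on its hierarchical-clustering initialization.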

4da...@gmail.com

Mar 19, 2020, 9:20:12 AM
to tidyLPA
Thanks for that explanation, I appreciate it.
To provide some further details: going with the idea of complexity, I tried running only models 2 and 3 twice on the whole sample (n = 3429), as they are less complex, and the results were not the same. Running only model 2 with 1:6 profiles, and then only model 2 with 1:4 profiles, still did not give the same results. Of course, the differences in fit indices are always relatively small.
(The exception is the one-profile solution for any model, where the indices are the same each time.)

I hope I did not misunderstand the problem :)

Artur

Caspar van Lissa

Mar 19, 2020, 1:45:02 PM
to tidyLPA
I still think it's a local-optimum issue. It might be beneficial to run the model in Mplus and increase the number of random starts, to see how many iterations arrive at the same log-likelihood. mclust does not have this option.
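If Mplus does become available, the switch in tidyLPA is a one-argument change. Whether extra Mplus syntax (such as an ANALYSIS section with more random starts) can be passed through the dots depends on the tidyLPA version, so treat the ANALYSIS line below as an assumption to check against the documentation:

```r
# Assumes Mplus and the MplusAutomation package are installed, and `db`
# is the data frame from this thread. The 'package' argument is part of
# estimate_profiles(); passing an ANALYSIS section with more random
# starts this way is an assumption -- check your tidyLPA version.
library(tidyLPA)

lpa_mplus <- estimate_profiles(db, n_profiles = 4, models = 6,
                               package = "MplusAutomation",
                               ANALYSIS = "starts = 500 100;")

# In the Mplus output, check how many of the 100 final-stage starts
# replicate the best log-likelihood; many replications suggest a
# global (rather than local) optimum.
```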

4da...@gmail.com

Mar 19, 2020, 4:35:37 PM
to tidyLPA
Thanks, I really appreciate your replies.
Unfortunately, I do not have access to Mplus. I tried the trial version once, but it did not work with R; or maybe my beginner's skills were not up to it.