Hi Emmanuel, thank you for these important questions. One caveat up front: I developed this software and have a decent understanding of LPA, but I'm not an expert.
1. You are using the estimate_profiles() function, not estimate_profiles_mplus(), correct? Regarding your question (and Timur's): if you're using estimate_profiles(), then you're really using the mclust package under the hood. My understanding is that it works in two steps. First, the data are clustered with a hierarchical cluster analysis. Then, that hierarchical solution provides the starting values for a maximum likelihood (expectation-maximization, or EM) algorithm. The first, hierarchical step is deterministic: it will be the same every time you run it. The second, EM step isn't, though in practice, because the hierarchical clustering provides the starting points, you will often get the same results from estimate_profiles(). But not always. So what you're seeing isn't out of the ordinary, and it's probably more likely with larger datasets and more variables: there are simply more solutions available for the algorithm to find, and some of them may be local solutions.
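If it helps to see the two steps explicitly, here's a minimal sketch using mclust directly. This is my understanding of roughly what estimate_profiles() does under the hood; the exact model specification and defaults tidyLPA passes to mclust may differ, and the iris data here are just a stand-in for your own.

```r
library(mclust)

X <- iris[, 1:4]  # any numeric data frame

# Step 1 (deterministic): model-based agglomerative hierarchical
# clustering, which supplies the starting partition for EM.
hc_init <- hc(X)

# Step 2: EM for a Gaussian mixture, initialized from step 1.
fit <- Mclust(X, G = 3, initialization = list(hcPairs = hc_init))

fit$loglik  # the maximized log-likelihood of this solution
```

Because step 1 is deterministic, re-running this exact code gives the same starting values every time; variation across runs comes in when the EM step is started from different places.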
I don't have an easy answer for how to address this. One approach is to choose the solution with the highest log-likelihood; that may be the simplest and best way to go. Run the function a number of times (I don't have a firm rule for how many; maybe 10) and record the log-likelihood each time. Especially if the highest log-likelihood is replicated across runs, there's a good chance it's a good solution. Another idea: do you have access to Mplus? It uses a different procedure (no hierarchical clustering) that I'd be happy to describe and talk/think through with you. Another option is the Rmixmod package: https://cran.r-project.org/web/packages/Rmixmod/index.html. I've wanted to build an interface to that package into tidyLPA but haven't yet.
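If you want to try the run-it-several-times idea, here's a hedged sketch, again going through mclust directly so the initialization can be varied. It uses Mclust's `subset` initialization option, which draws the hierarchical starting values from a random subsample, giving genuinely different starting points on each run; the number of runs, the subsample fraction, and the iris data are all arbitrary choices for illustration.

```r
library(mclust)

set.seed(123)
X <- iris[, 1:4]

# Refit several times, each from a different random initialization,
# and record the log-likelihood of each solution.
logliks <- sapply(1:10, function(i) {
  s <- sample(nrow(X), size = round(0.8 * nrow(X)))
  fit <- Mclust(X, G = 3, initialization = list(subset = s))
  fit$loglik
})

round(logliks, 2)
# Prefer the solution with the highest log-likelihood, especially
# if that value is replicated across several of the runs.
```

If one log-likelihood value keeps reappearing and is also the highest, that replication is reassuring evidence you've found the global rather than a local solution.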
2. My understanding is that this means none of the models of the other model types (3 and 6) converged. That's not necessarily good or bad; it happens a lot in my experience. Choose from among the (simpler) models that did converge.
3. Yes, I think you can consider those irrelevant. The key distinction is that the model did converge: the profile/class with no assignments is still identified/estimated, it just isn't the highest-probability class for any of the responses. So, as you said, that solution is probably not a good one and can be set aside.
Josh