Average min and max probabilities from get_fit do not agree with those calculated from get_data. Why?


4da...@gmail.com

Mar 16, 2020, 1:16:27 PM
to tidyLPA

Hello,

I noticed that the prob_min and prob_max values reported by get_fit() do not agree with the average probabilities for most likely class membership that I calculated from the get_data() output.
Any idea why that would be so?
(I also checked the n_min and n_max values from both sources, and those agree.)

PS
To obtain prob_min and prob_max from the get_data() result, I calculated the mean of CPROB1 for Class 1, and so on for the other classes.

To obtain n_min and n_max, I counted the Class 1 members and divided by the total sample N, and so on for the other classes.
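
For concreteness, the calculation described above could be sketched in R roughly as follows. The data frame is made up for illustration; it only assumes the Class and CPROB1, CPROB2, ... columns mentioned above, and the names dat, mean_prob, and prop are hypothetical.

```r
# Toy stand-in for the get_data() output of a 2-class model (values are invented).
dat <- data.frame(
  CPROB1 = c(0.9, 0.8, 0.6, 0.3, 0.1),
  CPROB2 = c(0.1, 0.2, 0.4, 0.7, 0.9),
  Class  = c(1, 1, 1, 2, 2)
)

classes <- sort(unique(dat$Class))

# Mean of CPROBk among the cases assigned to class k.
mean_prob <- sapply(classes, function(k) {
  mean(dat[[paste0("CPROB", k)]][dat$Class == k])
})

# Share of the sample assigned to each class.
prop <- as.vector(table(dat$Class)) / nrow(dat)

range(mean_prob)  # smallest and largest average probability
range(prop)       # smallest and largest class proportion
```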

Caspar van Lissa

Mar 17, 2020, 3:13:36 PM
to tidyLPA
These are not the same thing. To understand what they are, see the vignette (https://cran.r-project.org/web/packages/tidyLPA/vignettes/Introduction_to_tidyLPA.html) or Mplus Technical Appendix 8.

4da...@gmail.com

Mar 17, 2020, 3:23:36 PM
to tidyLPA
I checked this out, but maybe I missed something. The definitions in the link you gave are:
  • Prob. Min.: Minimum of the diagonal of the average latent class probabilities for most likely class membership, by assigned class. The minimum should be as high as possible, reflecting greater classification certainty (cases are assigned to classes they have a high probability of belonging to; see Jung & Wickrama, 2008).
  • Prob. Max.: Maximum of the diagonal of the average latent class probabilities for most likely class membership, by assigned class. The maximum should also be as high as possible, reflecting greater classification certainty (cases are assigned to classes they have a high probability of belonging to).
My understanding was this: select all records whose highest probability is for Class 1 (those with 1 in the Class column from get_data) and calculate the mean of that probability (CPROB1) for these records only. That gives the average latent class probability for Class 1. Doing the same for Classes 2 and 3 (say there are 3 classes altogether) and then taking the highest and lowest of these averages should give the same result as Prob. Max. and Prob. Min., respectively.
It is quite possible that I misunderstood something here, but I do not know what.

Caspar van Lissa

Mar 17, 2020, 3:33:33 PM
to tidyLPA
I really appreciate you taking the time to test the implementation, but I would recommend first checking out the formulas and the source code. The function is:

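classification_probs_mostlikely <- function(post_prob, class){
  # post_prob is not a matrix (e.g. a one-class solution): return 1
  if(is.null(dim(post_prob))) return(1)
  # Average posterior probabilities, one row per assigned class
  avg_probs <- avgprobs_mostlikely(post_prob, class)
  avg_probs[is.na(avg_probs)] <- 0
  C <- dim(post_prob)[2]                          # number of classes
  N <- sapply(1:C, function(x) sum(class == x))   # cases assigned to each class
  # Weight each average by its class size, then normalize within each column
  tab <- mapply(function(this_row, this_col){
    (avg_probs[this_row, this_col]*N[this_row])/(sum(avg_probs[ , this_col] * N, na.rm = TRUE))
  }, this_row = rep(1:C, C), this_col = rep(1:C, each = C))

  matrix(tab, C, C, byrow = TRUE)
}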

The source code is directly accessible in R, and online here https://github.com/cjvanlissa/tidyLPA/blob/master/R/calc_functions.R
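
To make the difference concrete, here is a minimal sketch on invented data. It assumes that prob_min and prob_max come from the diagonal of the matrix returned by the function above, as this reply suggests, and it uses a vectorized restatement of that formula rather than the package code. The helper matrix avg is a hypothetical stand-in for avgprobs_mostlikely(), whose definition is not shown in this thread.

```r
# Toy posterior probabilities: 5 cases (rows), 2 latent classes (columns).
post_prob <- matrix(c(0.9, 0.1,
                      0.8, 0.2,
                      0.6, 0.4,
                      0.3, 0.7,
                      0.1, 0.9), ncol = 2, byrow = TRUE)

C <- ncol(post_prob)
assigned <- apply(post_prob, 1, which.max)   # most likely class per case
n_assigned <- tabulate(assigned, nbins = C)  # cases assigned to each class

# Average posterior probabilities by assigned class (row k = assigned class k).
# Hypothetical stand-in for avgprobs_mostlikely().
avg <- t(sapply(seq_len(C), function(k) {
  colMeans(post_prob[assigned == k, , drop = FALSE])
}))

# Same arithmetic as the mapply() call above: weight each row by its class
# size, normalize within each column, then transpose.
joint <- avg * n_assigned
class_probs <- t(sweep(joint, 2, colSums(joint), "/"))

range(diag(avg))          # approx. 0.767 and 0.800: mean CPROBk within class k
range(diag(class_probs))  # approx. 0.696 and 0.852: reweighted diagonal
```

In this toy example the class counts are identical in both calculations, yet the two diagonals differ, so their minima and maxima differ as well.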


4da...@gmail.com

Mar 17, 2020, 3:46:58 PM
to tidyLPA
Thanks, I will look into that. I am still a beginner in R, so I would not even have known how to find this.
I wanted to report average probabilities in a paper, which is why I asked.
I did not want to come across as a fault finder; sorry if I have.