average min and max probabilities from get_fit do not agree with those calculated from get_data, why?

4da...@gmail.com

Mar 16, 2020, 1:16:27 PM

to tidyLPA

Hello,

I noticed that when I compare the prob_min and prob_max values from get_fit() to the average probabilities for most likely class membership that I calculate from the get_data() output, they do not agree.

Any idea why that would be so?

(I also checked the n_min and n_max values from both sources, and those agree.)

PS:

To obtain prob_min and prob_max from the get_data() result, I calculated the mean of CPROB1 for Class 1 members, and so on for the other classes.

To obtain n_min and n_max, I counted the Class 1 members and divided by the total sample N, and so on for the other classes.
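
The calculation described in this PS could be sketched in base R roughly as follows. The data frame is a made-up stand-in for get_data() output; only the column names Class and CPROB1 come from the post, and CPROB2 and CPROB3 are assumed to follow the same pattern.

```r
# Toy stand-in for get_data() output (made-up numbers, 6 cases, 3 classes).
dat <- data.frame(
  Class  = c(1, 1, 2, 2, 3, 3),
  CPROB1 = c(0.90, 0.70, 0.10, 0.30, 0.05, 0.40),
  CPROB2 = c(0.05, 0.20, 0.80, 0.60, 0.15, 0.10),
  CPROB3 = c(0.05, 0.10, 0.10, 0.10, 0.80, 0.50)
)

classes <- sort(unique(dat$Class))

# Mean of CPROBk among the cases whose most likely class is k.
avg_prob <- sapply(classes, function(k) {
  mean(dat[[paste0("CPROB", k)]][dat$Class == k])
})

# Share of the sample assigned to each class.
prop_n <- as.numeric(prop.table(table(dat$Class)))

range(avg_prob)  # hand-computed "prob_min" and "prob_max"
range(prop_n)    # hand-computed "n_min" and "n_max"
```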

Caspar van Lissa

Mar 17, 2020, 3:13:36 PM

to tidyLPA

These are not the same thing. To understand what they are, see the vignette (https://cran.r-project.org/web/packages/tidyLPA/vignettes/Introduction_to_tidyLPA.html) or Mplus Technical Appendix 8.

4da...@gmail.com

Mar 17, 2020, 3:23:36 PM

to tidyLPA

I checked this out, but maybe I missed something. The definitions in the link you gave are:

Prob. Min.: Minimum of the diagonal of the average latent class probabilities for most likely class membership, by assigned class. The minimum should be as high as possible, reflecting greater classification certainty (cases are assigned to classes they have a high probability of belonging to; see Jung & Wickrama, 2008).
Prob. Max.: Maximum of the diagonal of the average latent class probabilities for most likely class membership, by assigned class. The maximum should also be as high as possible, reflecting greater classification certainty (cases are assigned to classes they have a high probability of belonging to).

My understanding was this: select all records whose most likely class is Class 1 (those with 1 in the Class column from get_data()) and calculate the mean of CPROB1 for those records only. That gives the average latent class probability for Class 1. Doing the same for Class 2 and Class 3 (say there are 3 classes altogether) and then taking the highest and lowest of the three means should give the same results as Prob. Max. and Prob. Min., respectively.

It is quite possible that I misunderstood something here, but I do not know what.

Caspar van Lissa

Mar 17, 2020, 3:33:33 PM

to tidyLPA

I really appreciate you taking the time to test the implementation, but I would recommend first checking out the formulas and the source code. The function is:


classification_probs_mostlikely <- function(post_prob, class){
    if(is.null(dim(post_prob))) return(1)
    avg_probs <- avgprobs_mostlikely(post_prob, class)
    avg_probs[is.na(avg_probs)] <- 0
    C <- dim(post_prob)[2]
    N <- sapply(1:C, function(x) sum(class == x))
    tab <- mapply(function(this_row, this_col){
        (avg_probs[this_row, this_col]*N[this_row])/(sum(avg_probs[ , this_col] * N, na.rm = TRUE))
    }, this_row = rep(1:C, C), this_col = rep(1:C, each = C))

    matrix(tab, C, C, byrow = TRUE)
}

The source code is directly accessible in R, and online here: https://github.com/cjvanlissa/tidyLPA/blob/master/R/calc_functions.R
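
One way to read the function above: the diagonal entry for class k is the summed posterior probability of class k among the cases assigned to k, divided by the summed posterior probability of class k over all cases. A per-class mean of CPROBk instead divides by the number of cases assigned to k, so the two generally differ. The sketch below compares them on made-up numbers. Note that `avgprobs_mostlikely()` is not shown in this thread, so the version here is an assumed stand-in and not necessarily the package's code.

```r
# Toy posterior probabilities (made-up numbers): 6 cases, 3 classes.
post_prob <- matrix(c(
  0.90, 0.05, 0.05,
  0.70, 0.20, 0.10,
  0.10, 0.80, 0.10,
  0.30, 0.60, 0.10,
  0.05, 0.15, 0.80,
  0.40, 0.10, 0.50
), ncol = 3, byrow = TRUE)
class <- apply(post_prob, 1, which.max)  # most likely class per case

# ASSUMPTION: stand-in helper. Row r holds the mean posterior
# probabilities of the cases assigned to class r.
avgprobs_mostlikely <- function(post_prob, class) {
  C <- ncol(post_prob)
  t(sapply(1:C, function(k) colMeans(post_prob[class == k, , drop = FALSE])))
}

# Copied from the post above.
classification_probs_mostlikely <- function(post_prob, class){
    if(is.null(dim(post_prob))) return(1)
    avg_probs <- avgprobs_mostlikely(post_prob, class)
    avg_probs[is.na(avg_probs)] <- 0
    C <- dim(post_prob)[2]
    N <- sapply(1:C, function(x) sum(class == x))
    tab <- mapply(function(this_row, this_col){
        (avg_probs[this_row, this_col]*N[this_row])/(sum(avg_probs[ , this_col] * N, na.rm = TRUE))
    }, this_row = rep(1:C, C), this_col = rep(1:C, each = C))

    matrix(tab, C, C, byrow = TRUE)
}

avg <- avgprobs_mostlikely(post_prob, class)
cls <- classification_probs_mostlikely(post_prob, class)

range(diag(avg))  # per-class means, as computed by hand from CPROB columns
range(diag(cls))  # diagonal of the matrix returned by the quoted function
```

On this toy data the two ranges are not equal, which matches the kind of mismatch described at the top of the thread.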

4da...@gmail.com

Mar 17, 2020, 3:46:58 PM

to tidyLPA

Thanks, I will look into that. I am still a beginner in R, so I would not even have known how to find this.
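
For readers in the same position, a minimal sketch of how function source can be inspected from the R console. It assumes tidyLPA may or may not be installed, so the package-specific call is guarded. The function name is taken from the post above; whether it is exported is not stated in this thread, which is why `getAnywhere()` is used.

```r
# Printing a function object (no parentheses) shows its definition.
print(stats::sd)

# getAnywhere() searches loaded namespaces, including functions
# that a package does not export.
if (requireNamespace("tidyLPA", quietly = TRUE)) {
  print(utils::getAnywhere("classification_probs_mostlikely"))
}
```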
