Dear MSstats team,
I'm using MSstats (MSstats_4.14.2) to analyze differentially abundant proteins between multiple conditions.
While trying to understand how MissingPercentage in the MSstats::groupComparison output is calculated, I noticed that the values in that column do not match the values I get when I run each function (and sub-function) manually and in a pairwise manner.
I ran groupComparison with a 3x3 comparison matrix for the groups h_comp, s_comp and w_comp.
The levels of processedData$ProteinLevelData$GROUP are in alphabetical order (h_comp, s_comp, w_comp), and I filled in the comparison matrix in that order.
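A matrix of this shape could look like the sketch below. The row names and the choice of the three pairwise contrasts are assumptions for illustration only; the columns follow the alphabetical level order described above.

```r
# Groups in the alphabetical order of the GROUP factor levels.
groups <- c("h_comp", "s_comp", "w_comp")

# One row per comparison, one column per group.
# The three contrasts and their row names are illustrative assumptions.
comparison <- matrix(
  c(1, -1,  0,   # h_comp vs s_comp
    1,  0, -1,   # h_comp vs w_comp
    0,  1, -1),  # s_comp vs w_comp
  nrow = 3, byrow = TRUE,
  dimnames = list(c("h_vs_s", "h_vs_w", "s_vs_w"), groups)
)

print(comparison)
```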
However, in the function .countMissingPercentage (within MSstats::groupComparison), a table named "counts" is defined:
counts = summarized[, list(totalN = unique(TotalGroupMeasurements),
NumMeasuredFeature = sum(NumMeasuredFeature, na.rm = TRUE),
NumImputedFeature = sum(NumImputedFeature, na.rm = TRUE)),
by = "GROUP"]
The rows in "counts" are then ordered "w_comp, s_comp, h_comp", so the wrong rows are selected in the for-loop below when "conditions = contrast_matrix[i, ] != 0" is executed.
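The suspected mismatch can be sketched in base R without MSstats. This is an assumption-laden illustration, not the package's actual code: it assumes the rows of the summary table come out in the order w_comp, s_comp, h_comp as reported above, that the logical vector is then used to pick rows by position, and the totalN values are invented so that each row is recognisable.

```r
# Summary table with rows in the reported order (w, s, h).
# totalN values are made up for illustration.
counts <- data.frame(
  GROUP  = c("w_comp", "s_comp", "h_comp"),
  totalN = c(30, 20, 10),
  stringsAsFactors = FALSE
)

# One contrast row (h_comp vs s_comp), columns in factor-level order.
contrast_row <- c(h_comp = 1, s_comp = -1, w_comp = 0)

# Logical vector over the contrast columns, as in the quoted line.
conditions <- contrast_row != 0

# Positional selection: picks rows 1 and 2 of counts, whatever they hold.
by_position <- counts$GROUP[conditions]

# Name-based selection: picks the rows whose GROUP is in the contrast.
by_name <- counts$GROUP[counts$GROUP %in% names(contrast_row)[conditions]]

print(by_position)  # "w_comp" "s_comp"
print(by_name)      # "s_comp" "h_comp"
```

Under these assumptions, positional selection uses w_comp instead of h_comp, whereas matching on the group names gives the intended rows regardless of row order.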
A short-term workaround is to run only comparisons with 2x2 comparison matrices, so that the mix-up of rows in the for-loop should not result in wrong MissingPercentage values.
I tried several different comparisons and think this might be a bug in MSstats. Please let me know if you cannot reproduce these wrong MissingPercentage values with data of your choice.
Cheers,
Selim