MissingPercentage is not correct and MSstats did not remove 50% missing?

Thao Nguyen

Dec 6, 2023, 11:11:49 AM

to MSstats

Dear MSstats team,

I've encountered an issue with MSstats. Although the MSstats run was successful, I noticed discrepancies when I exported ProteinQuant (wide format) and compared it with the MSstats results. When I calculate fold changes manually, the values do not consistently match those generated by MSstats. In addition, MSstats reports fold changes and p-values for some proteins that are missing more than 50% of data in each group. To illustrate this issue, here's an example. Please note that I see the second problem (more than 50% of data missing in each group) when comparing multiple groups, but not in pairwise comparisons with only two conditions. The differences between MSstats ratios and manually calculated ratios, however, appear in both scenarios.

I would appreciate your assistance in resolving this matter.

Best regards,

Thao

QuantData <- dataProcess(
  raw = converted_to_MSStat_human_kidney,
  logTrans = 2,
  normalization = "equalizeMedians",
  nameStandards = NULL,
  featureSubset = "top3",
  n_top_feature = 3,
  summaryMethod = "TMP",
  censoredInt = "NA",
  MBimpute = FALSE,
  # I set remove50missing = TRUE here, but it did not remove the data
  # with >50% missing in each group?
  remove50missing = TRUE,
  fix_missing = NULL,
  maxQuantileforCensored = 0.999,
  use_log_file = TRUE,
  append = FALSE,
  verbose = TRUE,
  log_file_path = NULL
)
names(QuantData)

# Convert protein quant to wide format:

library(tidyverse)
human_kidney_MSStat_Protein_Quant_Report <- QuantData$ProteinLevelData %>%
  select(Protein, originalRUN, LogIntensities) %>%
  pivot_wider(names_from = originalRUN, values_from = LogIntensities)

# Save protein quant in wide format:
write.csv(human_kidney_MSStat_Protein_Quant_Report, "Human_kidney_MSStat_Protein_Quant_Report_No_impute.csv")
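
For a manual cross-check of the kind described above, per-group missingness and a log2 fold change can be computed from the long-format protein table before pivoting. The sketch below uses base R on a toy data frame. The GROUP column, the group labels A and B, and all values are assumptions for illustration, not output from the dataset in this thread. Because LogIntensities is already on the log2 scale (logTrans = 2), the manual fold change is a difference of group means, not a ratio.

```r
# Toy stand-in for QuantData$ProteinLevelData. The GROUP column and every
# value here are assumptions made for illustration only.
protein_level <- data.frame(
  Protein        = rep(c("P1", "P2"), each = 6),
  GROUP          = rep(rep(c("A", "B"), each = 3), times = 2),
  originalRUN    = rep(paste0("run", 1:6), times = 2),
  LogIntensities = c(20, 21, 22, 18, 19, 20,   # P1: no missing values
                     25, NA, NA, 23, NA, 24),  # P2: 2 of 3 missing in A, 1 of 3 in B
  stringsAsFactors = FALSE
)

# Fraction of runs without a protein-level value, per protein and group.
missing_fraction <- with(
  protein_level,
  tapply(LogIntensities, list(Protein, GROUP), function(x) mean(is.na(x)))
)

# Mean log2 intensity per protein and group, ignoring missing runs.
group_mean <- with(
  protein_level,
  tapply(LogIntensities, list(Protein, GROUP), mean, na.rm = TRUE)
)

# Manual log2 fold change A vs B: a difference on the log2 scale.
manual_log2fc <- group_mean[, "A"] - group_mean[, "B"]

print(missing_fraction)
print(manual_log2fc)
```

A model-based estimate from groupComparison may differ from this simple difference of means, depending on the experimental design, so an exact match is not guaranteed.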

Mateusz Staniak

Dec 8, 2023, 5:02:56 AM

to MSstats

Hi,

Thanks for reporting the issue. I'll see if I can reproduce it with any of the datasets that I can access.

Kind regards,

Mateusz

Mateusz Staniak

Jan 26, 2024, 12:41:26 PM

to MSstats

Hi,

Apologies that it took so long. I think I found the issue, and I will update the package very soon. Please note that this option will remove runs with more than 50% missing observations, so its influence on groups will vary based on the experimental design (how runs are related to conditions).

Kind regards,
Mateusz

Scott Lyons

Jan 27, 2024, 11:23:04 AM

to MSstats

Hello Mateusz,

I was just going to post about experiencing the same issue. I'm glad you think you resolved it. Could you let us know when it's updated and which GitHub branch to install it from?

Best,

Scott

Priya Rajarapu

May 25, 2024, 11:28:28 PM

to MSstats

Hello!

I have the same problem in v.12. Could you please update us on the solution for this problem? Also, would you suggest using an older version of MSstats that calculates the correct fold changes?

Thanks!

Priya

Mateusz Staniak

May 31, 2024, 2:57:30 PM

to MSstats

Hi,

Fold changes are calculated correctly, but this filter was not properly documented. The latest GitHub version has updated documentation.

When set to TRUE, remove50missing removes the proteins where every run has at least 50% missing values for each peptide. This is a separate calculation from the MissingPercentage column in the output.

In the future we will probably remove it. Please let us know if you have other questions or concerns related to this parameter or dataProcess/groupComparison output interpretation.
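
To make the distinction above concrete, here is a minimal base R sketch of one possible reading of that rule, applied to toy feature-level data. It is not the MSstats implementation. The column names (Protein, Feature, Run, Intensity), the helper objects, and all values are hypothetical.

```r
# Toy feature-level data: one row per protein, peptide feature, and run.
# All names and values are hypothetical.
features <- expand.grid(
  Feature = c("pep1", "pep2", "pep3", "pep4"),
  Run     = paste0("run", 1:4),
  Protein = c("P1", "P2"),
  stringsAsFactors = FALSE
)
features$Intensity <- 1000

# P1: half of the features are missing in every run.
features$Intensity[features$Protein == "P1" &
                     features$Feature %in% c("pep3", "pep4")] <- NA

# P2: run1 is mostly missing, the other runs are complete.
features$Intensity[features$Protein == "P2" &
                     features$Run == "run1" &
                     features$Feature != "pep1"] <- NA

# Fraction of missing features per protein (rows) and run (columns).
run_missing <- with(
  features,
  tapply(Intensity, list(Protein, Run), function(x) mean(is.na(x)))
)

# Assumed reading of the rule: flag a protein only when every run
# has at least 50% of its features missing.
flagged <- apply(run_missing >= 0.5, 1, all)

print(run_missing)
print(flagged)
```

Under this reading, P1 is flagged for removal while P2 is kept, even though P2 is 75% missing in run1. A filter of this shape works per run at the feature level, so it need not agree with a per-group count of missing protein-level values.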