strange behaviour of median normalization

Christian Schori

Sep 28, 2021, 8:40:32 AM

to MSstats

Hi MSstats Team,

I am currently building our in-house LFQ analysis pipeline around MSstats. During some tests, we observed strange behavior of median normalization in some samples (directDIA; exported from Spectronaut). The median of these samples is somehow pulled down to zero, as shown for "KO2" in the dataProcessPlot below.

This stands in contrast to the normalization plot shown in Spectronaut:

Can you tell me what in my dataset causes this strange behavior during median normalization?

The data causing this strange behavior can be found here.

library(MSstats)

raw <- read.csv("../Desktop/normalization_issue/SN_MSstats_report.csv", sep = ",")
annot <- read.csv("../Desktop/normalization_issue/annotation.csv", sep = ",")

input <- SpectronauttoMSstatsFormat(input = raw,
                                     annotation = annot,
                                     intensity = 'PeakArea',
                                     filter_with_Qvalue = TRUE,
                                     qvalue_cutoff = 0.01,
                                     useUniquePeptide = TRUE,
                                     removeFewMeasurements = TRUE,
                                     removeProtein_with1Feature = FALSE,
                                     summaryforMultipleRows = max)

QuantData <- dataProcess(input,
                         normalization = "equalizeMedians",
                         summaryMethod = "TMP",
                         censoredInt = "0",
                         nameStandards = NULL,
                         MBimpute = TRUE,
                         maxQuantileforCensored = 0.999)

dataProcessPlots(QuantData, type = "QCPlot", which.Protein = "allonly")

Thank you very much for looking into this issue...

Best,

Christian

Mateusz Staniak

Sep 30, 2021, 4:44:12 AM

to MSstats

Hi,

I will check what's going on there.
Are you using the latest version of MSstats from Bioconductor?

Kind regards,

Mateusz

Christian Schori

Sep 30, 2021, 7:08:15 AM

to MSstats

Hi Mateusz

Yes, I am using the latest version from Bioconductor (MSstats 4.0.1 and R 4.1.1).

Thank you for looking into this issue.

Best,

Christian

Christian Schori

Oct 5, 2021, 10:45:49 AM

to MSstats

Hi Mateusz

While playing around with the Spectronaut report settings to reduce the file size of the export, I discovered that this seems to fix the strange normalization behavior reported above.

After removing all columns marked in red from the export, the median normalization seems to work fine again...

[1] "R.Condition"                  "R.FileName"                   "R.Replicate"                  "PG.ProteinAccessions"         "PG.ProteinGroups"             "PG.Cscore"
[7] "PG.Qvalue"                    "PG.RunEvidenceCount"          "PG.Quantity"                  "PEP.GroupingKey"              "PEP.StrippedSequence"         "PEP.Quantity"
[13] "EG.iRTPredicted"              "EG.Library"                   "EG.ModifiedSequence"          "EG.PrecursorId"               "EG.Qvalue"                    "EG.Cscore"
[19] "FG.Charge"                    "FG.Id"                        "FG.PrecMz"                    "FG.Quantity"                  "F.Charge"                     "F.FrgIon"
[25] "F.FrgLossType"                "F.FrgMz"                      "F.FrgNum"                     "F.FrgType"                    "F.ExcludedFromQuantification" "F.NormalizedPeakArea"
[31] "F.NormalizedPeakHeight"       "F.PeakArea"                   "F.PeakHeight"

I didn't change anything in the script, so I'm quite puzzled by this behavior. Do you have any idea what's going on here?

Thank you for looking into this issue...

Best,

Christian

Mateusz Staniak

Oct 8, 2021, 5:23:07 AM

to MSstats

Hi,

Sorry for taking so long.

The plotting function does not use the censoredInt parameter, so zeros are treated as legitimate values. You can try creating a copy of the dataProcess output, removing the zeros from its FeatureLevelData slot, and checking that the plot looks OK.
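
The suggestion above can be sketched roughly as follows. This is only an illustration on made-up numbers: `drop_zero_features` and `median_by_run` are hypothetical helpers, not MSstats functions, and the toy list only mimics the `FeatureLevelData` element and its `RUN` and `ABUNDANCE` columns. It shows how zeros that are counted as real values pull the per-run median down, and how the median recovers once they are dropped.

```r
# Hypothetical helper (not part of MSstats): return a copy of a
# dataProcess-like result with zero abundances removed.
drop_zero_features <- function(processed) {
  out <- processed  # work on a copy, leave the original untouched
  fld <- out$FeatureLevelData
  # Keep missing values as they are; drop only exact zeros.
  keep <- is.na(fld$ABUNDANCE) | fld$ABUNDANCE != 0
  out$FeatureLevelData <- fld[keep, ]
  out
}

# Hypothetical helper: median abundance per run, as a QC boxplot would show it.
median_by_run <- function(processed) {
  fld <- processed$FeatureLevelData
  tapply(fld$ABUNDANCE, fld$RUN, median, na.rm = TRUE)
}

# Toy stand-in for the dataProcess output (made-up numbers).
toy <- list(FeatureLevelData = data.frame(
  RUN       = c("KO1", "KO1", "KO1", "KO2", "KO2", "KO2", "KO2", "KO2"),
  ABUNDANCE = c(14.2, 15.1, 16.0, 0, 0, 0, 15.3, 15.9)
))

medians_before <- median_by_run(toy)                      # zeros included
medians_after  <- median_by_run(drop_zero_features(toy))  # zeros removed
```

With the `QuantData` object from the script in this thread, a call such as `dataProcessPlots(drop_zero_features(QuantData), type = "QCPlot", which.Protein = "allonly")` would be the corresponding step; that call is untested here.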

We will most likely address that in the upcoming MSstats release.

Kind regards,

Mateusz

Christian Schori

Oct 18, 2021, 10:46:27 AM

to MSstats

Hi Mateusz

Thank you for looking into this issue. Unfortunately, I'm not able to reproduce your suggested solution. If I remove all zeros from the feature-level output of dataProcess (ABUNDANCE != 0 or censored == FALSE), the median normalization still does not work...

...but if I apply this filter to the input data of the dataProcess function (Intensity != 0), the problem seems to be fixed.
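
For reference, the input-side filter can be sketched like this. It is a minimal illustration on a made-up table: `drop_zero_intensities` is a hypothetical helper, and the toy data frame only mimics the `Intensity` column of the converter output.

```r
# Hypothetical helper (not part of MSstats): drop rows whose intensity is
# exactly zero before the data are passed to dataProcess.
drop_zero_intensities <- function(msstats_input) {
  # Keep missing intensities as they are; drop only exact zeros.
  keep <- is.na(msstats_input$Intensity) | msstats_input$Intensity != 0
  msstats_input[keep, ]
}

# Toy stand-in for the SpectronauttoMSstatsFormat output (made-up numbers).
toy_input <- data.frame(
  ProteinName = c("P1", "P1", "P1", "P2", "P2"),
  Run         = c("KO1", "KO2", "KO2", "KO1", "KO2"),
  Intensity   = c(1200, 0, 950, NA, 0)
)

filtered <- drop_zero_intensities(toy_input)
```

In the script from this thread, this would correspond to calling `dataProcess(drop_zero_intensities(input), ...)` with the same arguments as before. Note that with censoredInt = "0" the zeros mark censored values, so removing them up front may change how missing values are handled downstream; that effect is not checked here.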

Can you explain why this doesn't work with the first approach? Shouldn't these two filtering approaches produce the same result if censoredInt = "0"?

Thank you very much for your explanation and also for the planned fix in the upcoming MSstats release...

Best,

Christian

Mateusz Staniak

Oct 21, 2021, 4:22:20 PM

to MSstats

Hi,

Can you please share the data with me again? I will take another look at this issue.

Kind regards,

Mateusz Staniak

Christian Schori

Oct 22, 2021, 4:56:23 PM

to MSstats

Hi Mateusz

Thank you very much for looking into this issue once more. Please find the uploaded Spectronaut-Export, Annotation-File and R-Script again:

https://filesender.switch.ch/filesender2/?s=download&token=b6324f1e-ba35-4021-bceb-7cc2ddf8226f