Confused about the censoring and imputation options in MSstats


Debojyoti Pal

Jun 17, 2025, 8:29:49 AM
to MSstats
Dear authors,

I am relatively new to proteomics and am struggling to understand the censoring and imputation options in MSstats. I was hoping to get some clarity here, and I hope this will also help others like me.

MBimpute : only for summaryMethod="TMP" and censoredInt='NA' or '0'. TRUE (default) imputes ‘NA’ or ‘0’ (depending on censoredInt option) by accelerated failure model. FALSE uses the values assigned by cutoffCensored.

Q1) Does this mean the MBimpute option is irrelevant if I use "linear" as the summary method?

Q2) How do I turn imputation on and off when using the "linear" summary method?

censoredInt : Missing values are censored or at random. 'NA' (default) assumes that all ‘NA’s in ‘Intensity’ column are censored. '0' uses zero intensities as censored intensity. In this case, NA intensities are missing at random. The output from Skyline and Progenesis should use ‘0’. NULL assumes that all NA intensities are randomly missing.

Q1) Is this option relevant if imputation is turned off? As asked above, can I turn off imputation with the "linear" summary method?

Q2) What should my input be if I assume that missing values are at random? I am using DIANN output (not from Fragpipe).

cutoffCensored : Cutoff value for censoring. Only with censoredInt='NA' or '0'. Default is 'minFeature', which uses minimum value for each feature. 'minFeatureNRun' uses the smallest between minimum value of corresponding feature and minimum value of corresponding run. 'minRun' uses minimum value for each run.

Q1) Why do we need a cutoff value if censoredInt is set? I understand this might be a very basic concept, but I am struggling with it. A simple explanation of what cutoffCensored and censoredInt do would be much appreciated.

Thank you very much


Anthony Wu

Jun 19, 2025, 11:51:33 AM
to MSstats
Hi,

Regarding imputation with linear summarization:

We recommend using the TMP method for summarization rather than the linear method, since TMP is more robust to outliers, so I wouldn't worry about imputation w.r.t. the linear method.

Regarding censoredInt with missing at random:

If you assume values are missing at random, then you can turn off imputation. Values that are missing at random carry no systematic information, so even if you had them, adding them back wouldn't change your overall conclusions w.r.t. the reported fold changes. There is no need to fill them in with imputation, because leaving them out won't bias the results.

Following up on censoredInt, if you turn off imputation (MBimpute = FALSE), censoredInt is not relevant. But if you assume values are missing not at random (i.e. they are missing because of low abundance), then for DIANN you set censoredInt = 'NA' and MBimpute = TRUE.
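To make the distinction concrete, here is a minimal simulation sketch in plain Python (not MSstats code). All numbers are made up for illustration: hypothetical log2 intensities with mean 20 and standard deviation 2, and 30% of values removed either at random or from the low end. It only compares the mean of the remaining values under the two kinds of missingness.

```python
import random
import statistics


def simulate_missingness(n=20000, seed=0):
    """Compare means after random vs. low-abundance missingness (toy data)."""
    rng = random.Random(seed)
    # Hypothetical log2 intensities for one protein (assumed values).
    full = [rng.gauss(20.0, 2.0) for _ in range(n)]

    # Missing at random: each value is dropped with probability 0.3,
    # independent of its intensity.
    mar = [x for x in full if rng.random() > 0.3]

    # Missing not at random: the lowest 30% of values are dropped,
    # mimicking signals that fall below a detection limit.
    cutoff = sorted(full)[int(0.3 * n)]
    mnar = [x for x in full if x >= cutoff]

    return statistics.mean(full), statistics.mean(mar), statistics.mean(mnar)


full_mean, mar_mean, mnar_mean = simulate_missingness()
print(f"complete data:            {full_mean:.2f}")
print(f"missing at random:        {mar_mean:.2f}")
print(f"missing due to low signal: {mnar_mean:.2f}")
```

In this toy setup the mean under random missingness stays close to the complete-data mean, while dropping the low values shifts the mean upward. That upward shift is the kind of bias imputation of censored values is meant to counter.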

Regarding cutoffCensored (now maxQuantileforCensored in newer MSstats versions):

In mass spectrometry data, very low-intensity signals are often unreliable, and sometimes just noise. Different software tools for analyzing spectra handle these low signals differently, which can make comparisons inconsistent. To improve consistency between tools, MSstats v4.0 automatically learns a threshold for what it considers “trustworthy” intensity values. This threshold is chosen separately for each experiment and each tool, based on the distribution of all detected intensities.

Specifically, MSstats defines low-intensity values as those below a certain cutoff and treats them as missing because they are likely too low to be reliable. This threshold is not fixed; it is calculated from percentiles of the intensity values. For example, if you set maxQuantileforCensored = 0.999, the formula is:

threshold = 25th percentile - (99.9th percentile - 75th percentile)

This formula essentially finds a point well below the typical dynamic range, helping MSstats filter out noisy low-end values that different tools might otherwise handle inconsistently.
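As a worked illustration of the formula above, here is a small Python sketch. It is not the MSstats implementation: the function name `censoring_threshold` and the linear-interpolation quantile are assumptions made for this example, and the input is assumed to be a list of log2 intensities.

```python
def censoring_threshold(log_intensities, max_quantile=0.999):
    """Return Q25 - (Q_max - Q75), the cutoff described in the post.

    `max_quantile` plays the role of maxQuantileforCensored. The quantile
    estimator (linear interpolation between order statistics) is an
    assumption for this sketch.
    """
    values = sorted(log_intensities)

    def quantile(p):
        # Position of the p-th quantile on the sorted values.
        pos = p * (len(values) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        return values[lo] + (pos - lo) * (values[hi] - values[lo])

    q25 = quantile(0.25)
    q75 = quantile(0.75)
    q_max = quantile(max_quantile)
    return q25 - (q_max - q75)


# Toy data: 101 evenly spaced values from 0 to 100 (hypothetical).
toy = [float(i) for i in range(101)]
threshold = censoring_threshold(toy, max_quantile=0.999)
censored = [x for x in toy if x < threshold]
print(threshold, len(censored))
```

The threshold mirrors the upper tail below the lower quartile: the distance from the 75th percentile up to the chosen top quantile is subtracted from the 25th percentile, and anything under that point is treated as censored.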


Hope this helps

Tony

Debojyoti Pal

Jun 20, 2025, 6:38:43 AM
to MSstats
Thank you for the clarification. If I may ask a follow-up question, your earlier documentation shows two distinct parameters: cutoffCensored and maxQuantileforCensored. I understand the maxQuantileforCensored parameter. It basically classifies very low values as censored. However, the earlier cutoffCensored definition was "Cutoff value for censoring. Only with censoredInt='NA' or '0'. Default is 'minFeature', which uses minimum value for each feature. 'minFeatureNRun' uses the smallest between minimum value of corresponding feature and minimum value of corresponding run. 'minRun' uses minimum value for each run." What happened to this parameter? Or does it do the same thing as maxQuantileforCensored?

Secondly, how are the imputed values determined? I know it's not reasonable to ask for an explanation here, but please point me towards relevant literature if possible. I was under the mistaken assumption that cutoffCensored somehow determined the upper range of imputed values.

Lastly, do I need to use the fix_missing parameter? I am converting from DIANN to MSstats using the DIANNtoMSstatsFormat function in MSstats (which I think is an earlier function that is likely to get deprecated).

Thank you for the guidance on this matter.

Debojyoti

Anthony Wu

Jun 23, 2025, 1:42:02 PM
to MSstats
Hi,

The cutoffCensored parameter is deprecated. I would ignore documentation discussing cutoffCensored and focus on the maxQuantileforCensored parameter when thinking about missing values and imputation.

Imputed values are determined by the accelerated failure time model, which is described in the `Missing Value Imputation` section of this paper.

In the DIANNtoMSstatsFormat function, the fix_missing parameter is deprecated, so you don't need to worry about it.

Thanks,
Tony

Debojyoti Pal

Jun 24, 2025, 7:55:41 AM
to MSstats
Thanks for the clarifications!