How does dataProcess handle censored values with censoredInt = "NA" and MBimpute = FALSE?


Carlo Zanetti

Apr 17, 2026, 3:55:09 PM
to MSstats

Hi all,

I'm running MSstats v4.18.1 on a label-free DIA dataset (Spectronaut output) from an ultra-low input laser capture microdissection (LCM) experiment profiling the astrocyte niche around amyloid plaques in a mouse model. I'm comparing three conditions: control, plaque_near, and plaque_far. I have biological replicates with multiple technical replicate runs per biological replicate. Given the ultra-low input nature of LCM, missingness is a particular concern in this dataset.

Before dataProcess, I applied a custom feature-level filter requiring ≥50% observation in at least one condition, and used featureSubset = "highQuality" with remove_uninformative_feature_outlier = TRUE.
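For readers who want to reproduce a filter of this kind, the following is a minimal base R sketch, not the code actually used here. The helper name keep_features and the column names Feature, Condition and Intensity are assumptions for illustration; a real MSstats input table identifies features by several columns.

```r
# Hypothetical helper: keep features observed in at least `min_frac`
# of the runs of at least one condition. Column names are assumed.
keep_features <- function(df, min_frac = 0.5) {
  # Fraction of non-missing intensities per feature and condition;
  # na.pass keeps the NA rows so that they count as "not observed".
  obs <- aggregate(Intensity ~ Feature + Condition, data = df,
                   FUN = function(v) mean(!is.na(v)),
                   na.action = na.pass)
  unique(as.character(obs$Feature[obs$Intensity >= min_frac]))
}

# Toy data: f1 is observed in 1 of 2 control runs, f2 is never observed.
toy <- data.frame(
  Feature   = rep(c("f1", "f2"), each = 6),
  Condition = rep(rep(c("control", "plaque_near", "plaque_far"), each = 2), 2),
  Intensity = c(10, NA, NA, NA, NA, NA,
                NA, NA, NA, NA, NA, NA)
)

kept <- keep_features(toy)
toy_filtered <- toy[toy$Feature %in% kept, ]
```

In this toy table only f1 passes, because it reaches 50% observation in the control condition.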

My dataProcess function was:

processed_data <- dataProcess(
  msstats_input_filtered_keratin,
  normalization = "equalizeMedians",
  summaryMethod = "TMP",
  censoredInt = "NA",
  MBimpute = FALSE,
  featureSubset = "highQuality",
  remove_uninformative_feature_outlier = TRUE
)

I have two concerns:

1. How does the floor replacement actually work in practice?

The documentation states that censoredInt = "NA" with MBimpute = FALSE replaces censored NAs with the cutoffCensored value (minimum observed intensity per feature). My concern is that if multiple NAs within a condition are replaced with the same floor value, this injects identical data points into the TMP summarisation, which could artificially reduce within-group variance at the protein level. A smaller variance would shrink the standard error in groupComparison, potentially producing overconfident t-statistics and false positives. Is this a valid concern, or does TMP's median-based summarisation handle this robustly?
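To make the concern concrete, here is a toy sketch in base R. It uses stats::medpolish() as a stand-in for TMP summarisation and made-up log2 intensities; it is not the MSstats implementation, and the floor replacement shown is the hypothetical behaviour described above, not something confirmed to happen in dataProcess.

```r
# Toy illustration only: made-up data, base R median polish as a TMP stand-in.
set.seed(1)
n_feat <- 6
n_run  <- 8

# log2 intensities: features in rows, runs of ONE condition in columns
x <- matrix(rnorm(n_feat * n_run, mean = 20, sd = 0.5), nrow = n_feat,
            dimnames = list(paste0("f", 1:n_feat), paste0("run", 1:n_run)))

# Knock out a fixed set of cells (no row or column becomes entirely NA)
x[cbind(c(1, 2, 3, 4, 5, 6, 1, 3), c(1, 2, 3, 4, 5, 6, 5, 7))] <- NA

# Option A: leave NAs in place and let median polish skip them
fit_na <- medpolish(x, na.rm = TRUE, trace.iter = FALSE)
run_na <- fit_na$overall + fit_na$col

# Option B: hypothetical floor replacement with each feature's minimum
floor_val <- apply(x, 1, min, na.rm = TRUE)
x_floor   <- x
idx       <- which(is.na(x_floor), arr.ind = TRUE)
x_floor[idx] <- floor_val[idx[, "row"]]
fit_floor <- medpolish(x_floor, trace.iter = FALSE)
run_floor <- fit_floor$overall + fit_floor$col

# Spread of the run-level summaries under each option
spread <- c(sd_na_dropped = sd(run_na), sd_floor = sd(run_floor))
print(spread)
```

Whether the floor version shows a smaller or larger spread depends on the data and on where the missing values fall, so a check like this is best repeated on the real feature-level table rather than read off a toy example.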

I initially tried MBimpute = TRUE (table attached), but this produced a far greater number of differentially abundant proteins (DAPs) than seemed biologically reasonable for this experiment.

MBimpute = FALSE gave more conservative and plausible results (table attached) which is why I opted for this approach. However, I want to make sure the floor replacement isn't introducing a subtler version of the same problem.

As you can see from the table, MSstats picks up far more significant proteins for plaque_far vs control than limma does. When plotting the raw values, MSstats is able to identify these DAPs better than limma, and further analysis has demonstrated that these proteins do seem to be consistently

2. Why does censoredInt = NULL produce identical results to censoredInt = "NA"?

To test whether the floor replacement was affecting my results, I reran dataProcess with censoredInt = NULL and MBimpute = FALSE, which should treat all NAs as randomly missing with no replacement. The output was identical to censoredInt = "NA".

Initially I assumed this was because my upstream 50% observation filter removed most sparse features, leaving very few NAs. But if that's the case, do featureSubset = "highQuality" and remove_uninformative_feature_outlier = TRUE already handle missingness aggressively enough that the censoredInt setting becomes redundant? Or is there another reason these two settings would produce the same result?
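A comparison like the one described above can be sketched as follows. The helper compare_summaries is hypothetical, and it assumes each dataProcess() result is a list with a ProteinLevelData table containing Protein, RUN and LogIntensities columns; the toy lists below only mimic that shape so the example runs without MSstats.

```r
# Hypothetical helper: TRUE if two results carry the same run-level summaries.
# Assumes a ProteinLevelData element with the columns named in `cols`.
compare_summaries <- function(a, b,
                              cols = c("Protein", "RUN", "LogIntensities")) {
  pa <- as.data.frame(a$ProteinLevelData)[, cols]
  pb <- as.data.frame(b$ProteinLevelData)[, cols]
  # Sort both tables the same way so row order does not matter
  pa <- pa[order(pa$Protein, pa$RUN), ]
  pb <- pb[order(pb$Protein, pb$RUN), ]
  rownames(pa) <- NULL
  rownames(pb) <- NULL
  isTRUE(all.equal(pa, pb))
}

# Toy stand-ins for two dataProcess() outputs (made-up values)
res_na <- list(ProteinLevelData = data.frame(
  Protein        = c("P1", "P1", "P2"),
  RUN            = c(1, 2, 1),
  LogIntensities = c(20.1, 19.8, 18.5)
))
res_null <- res_na
res_null$ProteinLevelData <- res_null$ProteinLevelData[3:1, ]  # same rows, reordered

same <- compare_summaries(res_na, res_null)
```

Using all.equal() rather than identical() tolerates tiny floating-point differences, which is usually what is wanted when checking whether two settings are equivalent in practice.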

Happy to share more details about the experimental design / my code if useful, and I hope my confusion made sense. 

Thanks!

Carlo

no_imputation.png
imputation.png

Anthony Wu

Apr 24, 2026, 2:30:00 PM
to MSstats
Hi Carlo,

Thank you for bringing this to our attention. We investigated on our side and concluded that the documentation is outdated. I still need to confirm this, but I wanted to respond as soon as possible so that you are aware of the problem.

From my initial investigation of the code, if MBimpute = FALSE, all missing values are treated as missing at random (MAR), and no floor replacement occurs at all. This also explains why censoredInt = NULL produces identical results to censoredInt = "NA": with MBimpute = FALSE, missing values are treated as MAR and dropped in both cases.

Tony
