Missing replicates

Lara Holoidovsky

Jun 30, 2025, 4:19:00 PM

to MSstats

Hello MSstats team,

I have a question about imputation and the minimum number of observed values required for it.

From what I recall, the decision to impute or not depends on the number of features each protein has in each sample. But what happens to proteins that have been detected in only one replicate out of three, or two replicates out of three? Will the missing replicate be imputed? Is there a parameter for setting a threshold on the number of missing samples? If not, what is the built-in rule for that?

Thank you

Anthony Wu

Jul 2, 2025, 10:48:45 AM

to MSstats

Hi,

Great question considering the case when a protein is entirely missing. Referencing the MSstats V4 paper, "features are not imputed if a protein is entirely missing in a run".

So in the example you presented, if a protein is only detected in one replicate out of three, MSstats does NOT impute the missing replicates.

Tony

Lara Holoidovsky

Aug 28, 2025, 4:58:25 PM

to MSstats

Hi Tony,

Thank you very much for the explanation. I read the relevant parts of the paper and thought I understood them, but then I saw different behaviour in my datasets. Let me explain what I understood and show you my data; please correct me if I got something wrong.

On page 1474 of the paper it says "... Therefore, feature imputation is only possible for feature y_ijkl in Run_ijk if there is an observed value for the feature in another run and if there is an observed value from another feature in Run_ijk. In particular, features are not imputed if the protein is entirely missing in a run."

The example I am showing has two conditions with three bioreplicates each: "Treated" rep 1, 2 and 3 and "Control" rep 1, 2 and 3. The following table summarizes the detected ions in each replicate:

This is the data for one specific peptide, and I basically got the exact same data back after summarization. My question is: why, for example, wasn't ion b6_3_2, which exists in Control_rep_3, imputed in rep 1 and rep 2? It exists in rep 3, and rep 1 and rep 2 have evidence (other ions) that this peptide exists in those runs as well, so it should be imputed, shouldn't it? The same question applies to ion b6_3_1, which exists in all three replicates of the Control: despite the evidence that this peptide exists in the Treated condition (rep 1 and rep 3), the ion wasn't imputed there.

Rep 2 of the Treated condition wasn't imputed because no ion was detected there at all; I understand that. The other cases puzzle me, though, and I would appreciate your clarification.

Attaching the script I am using for data summarization in case you want to see the exact parameters I am using:

MSstatsPTM.summaryLHno_norm = dataSummarizationPTM(msstatsptm_input_data_COMBINED_TESTLH$PTM,
                                                   normalization = FALSE,
                                                   normalization.PTM = FALSE,
                                                   verbose = FALSE,
                                                   use_log_file = FALSE,
                                                   append = FALSE)

Best,

Anthony Wu

Aug 29, 2025, 5:56:36 PM

to MSstats

Hi,

Your intuition is correct, and those values should be imputed in the way you described. Would you be able to send me a sample of your dataset so I can try to reproduce your results on my end? I am concerned there may be a bug in the imputation code within MSstatsPTM.

Thanks,

Tony

Anthony Wu

Sep 4, 2025, 1:08:26 PM

to MSstats

Posting this update after private email discussions:

I've determined that MSstats doesn't impute a feature if it's detected in only one run (code pointer). This was likely intentional, given that a feature measured in only one run is more susceptible to noise, and imputing from it could introduce much more bias.
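To make the eligibility conditions discussed in this thread concrete, here is a minimal base R sketch. It is not the MSstats implementation: the helper `imputation_candidates`, its `min_runs_observed` argument, the feature `y4_1`, and the detection pattern are all hypothetical and only loosely modeled on the example above. With `min_runs_observed = 1` it mirrors the wording quoted from the paper; with `min_runs_observed = 2` it mirrors the observation that a feature detected in only one run is not imputed.

```r
# Toy detection matrix (hypothetical values, not the poster's data).
# Rows are features (ions), columns are runs; TRUE means an intensity was observed.
runs <- c("Control_rep_1", "Control_rep_2", "Control_rep_3",
          "Treated_rep_1", "Treated_rep_2", "Treated_rep_3")
observed <- rbind(
  b6_3_1 = c(TRUE,  TRUE,  TRUE,  FALSE, FALSE, FALSE),
  b6_3_2 = c(FALSE, FALSE, TRUE,  FALSE, FALSE, FALSE),
  y4_1   = c(TRUE,  TRUE,  FALSE, TRUE,  FALSE, TRUE)
)
colnames(observed) <- runs

# Hypothetical helper: flags missing cells that meet the eligibility conditions.
# It only marks candidates; it does not compute any imputed value.
imputation_candidates <- function(observed, min_runs_observed = 2) {
  # The feature must be observed in at least `min_runs_observed` runs.
  feature_ok <- rowSums(observed) >= min_runs_observed
  # The run must contain at least one observed feature of the protein/peptide.
  run_ok <- colSums(observed) >= 1
  # A cell is a candidate only if it is missing and both conditions hold.
  !observed & outer(feature_ok, run_ok, "&")
}

strict <- imputation_candidates(observed, min_runs_observed = 2)
loose  <- imputation_candidates(observed, min_runs_observed = 1)
print(strict)
print(loose)
```

In this toy example, `strict` flags no cell for b6_3_2 because it is observed in a single run, while `loose` flags it in every run except Treated_rep_2. Neither variant flags anything in Treated_rep_2, where no feature was observed at all.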

On the other hand, b6_3_1 got imputed. This can be seen with the `newABUNDANCE` and `predicted` columns populated with imputed values within the FeatureLevelData table.
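For anyone who wants to check this on their own output, a small base R sketch follows. The data frame is a made-up stand-in with hypothetical numbers. It assumes that imputed rows are the ones where `predicted` is not missing, and that the real table can be reached as `$PTM$FeatureLevelData` on the summarization result; both are assumptions worth verifying against your own object.

```r
# Toy stand-in for the FeatureLevelData table (hypothetical numbers).
# In a real session the table would presumably be taken from the summarization
# output, e.g. MSstatsPTM.summaryLHno_norm$PTM$FeatureLevelData (assumed layout).
feature_data <- data.frame(
  FEATURE      = c("b6_3_1", "b6_3_1", "b6_3_2", "b6_3_2"),
  RUN          = c("Control_rep_1", "Treated_rep_1", "Control_rep_3", "Control_rep_1"),
  ABUNDANCE    = c(20.1, NA, 18.4, NA),
  newABUNDANCE = c(20.1, 17.9, 18.4, NA),
  predicted    = c(NA, 17.9, NA, NA),
  stringsAsFactors = FALSE
)

# Assumption: a non-missing `predicted` value marks an imputed row.
imputed_rows <- feature_data[!is.na(feature_data$predicted), ]

# Count imputed values per feature, keeping features with zero imputations.
imputed_per_feature <- table(
  factor(imputed_rows$FEATURE, levels = unique(feature_data$FEATURE))
)

print(imputed_rows)
print(imputed_per_feature)
```

A feature that shows zero imputed rows here while still having missing `newABUNDANCE` values is the pattern described for b6_3_2.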

Tony
