Missing replicates

Lara Holoidovsky

Jun 30, 2025, 4:19:00 PM
to MSstats
Hello MSstats team,

I have a question regarding imputation and the minimum number of values required for it.
From what I recall, the decision to impute or not depends on the number of features each protein has in each sample. But what happens to proteins that have been detected in only one replicate out of three, or in two replicates out of three? Will the missing replicate be imputed? Is there a parameter for setting a threshold on the number of missing samples? If not, what is the built-in rule for that?

Thank you

Anthony Wu

Jul 2, 2025, 10:48:45 AM
to MSstats
Hi,

Great question about the case where a protein is entirely missing. According to the MSstats v4 paper, "features are not imputed if a protein is entirely missing in a run".

So in the example you presented, if a protein is only detected in one replicate out of three, MSstats does NOT impute the missing replicates.  

Tony

Lara Holoidovsky

Aug 28, 2025, 4:58:25 PM
to MSstats
Hi Tony,
Thank you very much for the explanation. I read the relevant parts of the paper and thought I understood them, but then I saw different behaviour in my datasets. Let me explain what I understood and show you my data; please correct me if I got something wrong.

On page 1474 of the paper it says "...  Therefore, feature imputation is only possible for feature yijkl in Runijk if there is an observed value for the feature in another run and if there is an observed value from another feature in Runijk. In particular, features are not imputed if the protein is entirely missing in a run."
The example I am showing has two conditions with three bioreplicates each: "Treated" rep 1, 2 and 3, and "Control" rep 1, 2 and 3. The following table summarizes the detected ions in each replicate:
[Attached image: Screenshot 2025-08-28 133827.png]
This is the data for one specific peptide, and I basically got the exact same data back after summarization. My question is: why, for example, was ion b6_3_2, which exists in Control_rep_3, not imputed in rep 1 and rep 2? It exists in rep 3, and rep 1 and rep 2 contain evidence (other ions) that this peptide is present in those runs as well, so it should be imputed, shouldn't it? The same question applies to ion b6_3_1, which exists in all three replicates of the Control condition: despite the evidence that this peptide is present in the Treated condition (rep 1 and rep 3), the ion wasn't imputed there.
Rep 2 of the Treated condition wasn't imputed because no ion was detected there at all; I understand that. But the other cases puzzle me, and I would appreciate your clarification.
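
A minimal sketch of the eligibility rule quoted above, applied to a made-up feature-by-run matrix. The ion names, run names, and values are all hypothetical, and the code illustrates the paper's wording only; it is not the MSstats implementation.

```r
# Toy log-abundance matrix: rows are features (ions), columns are runs.
# Every name and value here is invented for illustration.
abund <- matrix(
  c(20.1, 19.8, 20.3, NA,   # ion_a: observed in run1-run3
    18.2,   NA, 18.0, NA,   # ion_b: missing in run2
      NA,   NA, 17.5, NA),  # ion_c: observed only in run3
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ion_a", "ion_b", "ion_c"),
                  c("run1", "run2", "run3", "run4"))
)

observed   <- !is.na(abund)
runs_seen  <- rowSums(observed)  # runs in which each feature was observed
feats_seen <- colSums(observed)  # features observed in each run

# Rule as worded in the quote: a missing cell is eligible for imputation
# if its feature is observed in some other run AND some other feature is
# observed in the same run.
eligible <- !observed & outer(runs_seen >= 1, feats_seen >= 1, "&")

print(eligible)
```

Under this reading, every cell in run4 is ineligible because nothing was observed in that run, while the missing cells of ion_b and ion_c in run1 and run2 would be candidates for imputation.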

Attaching the script I am using for data summarization in case you want to see the exact parameters I am using:
library(MSstatsPTM)

# Summarize the PTM-level data without normalization
MSstatsPTM.summaryLHno_norm <- dataSummarizationPTM(msstatsptm_input_data_COMBINED_TESTLH$PTM,
                                            normalization = FALSE,
                                            normalization.PTM = FALSE,
                                            verbose = FALSE,
                                            use_log_file = FALSE,
                                            append = FALSE)

Best,

Anthony Wu

Aug 29, 2025, 5:56:36 PM
to MSstats
Hi,

Your intuition is correct, and those values should be imputed in the way you described. Would you be able to send me a sample of your dataset so I can try to reproduce your results on my end? I am concerned there may be a bug in the imputation code within MSstatsPTM.

Thanks,
Tony

Anthony Wu

Sep 4, 2025, 1:08:26 PM
to MSstats
Posting this update after private email discussions:

I've determined that MSstats doesn't impute a feature if it's detected in only one run (code pointer). This was likely intentional, given that a feature measured in only one run is more susceptible to noise, which can introduce much more bias after imputation.
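
As a rough way to spot such features before summarization, the sketch below counts the runs in which each feature has an observed value and lists those seen in only one run. The data frame and its column names (`FEATURE`, `RUN`, `ABUNDANCE`) are assumptions chosen for illustration; this is not the MSstats code referenced above.

```r
# Hypothetical long-format feature table; names and values are made up.
feature_data <- data.frame(
  FEATURE   = rep(c("ion_a", "ion_b"), each = 3),
  RUN       = rep(c("run1", "run2", "run3"), times = 2),
  ABUNDANCE = c(20.1, 19.8, 20.3,   # ion_a: observed in all three runs
                  NA,   NA, 17.5)   # ion_b: observed only in run3
)

# Number of runs with an observed (non-missing) value, per feature
runs_observed <- tapply(!is.na(feature_data$ABUNDANCE),
                        feature_data$FEATURE, sum)

# Features observed in exactly one run: per the finding above, these
# would not be imputed in the remaining runs.
single_run_features <- names(runs_observed)[runs_observed == 1]
print(single_run_features)
```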

On the other hand, b6_3_1 got imputed.  This can be seen with the `newABUNDANCE` and `predicted` columns populated with imputed values within the FeatureLevelData table.
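
One way to check this is to filter the feature-level table for rows whose original abundance is missing but whose `newABUNDANCE` is filled in. The sketch below uses a small made-up table that only mimics the `newABUNDANCE` and `predicted` columns mentioned above. The other column names and all values are assumptions, as is the idea that the real table would be taken from the summarization output (for example `MSstatsPTM.summaryLHno_norm$PTM$FeatureLevelData`).

```r
# Toy stand-in for a FeatureLevelData table; all values are invented.
fld <- data.frame(
  FEATURE      = c("ion_a", "ion_a", "ion_a", "ion_b"),
  RUN          = c("run1", "run2", "run3", "run1"),
  ABUNDANCE    = c(20.1,   NA, 20.3, NA),
  newABUNDANCE = c(20.1, 19.6, 20.3, NA),
  predicted    = c(  NA, 19.6,   NA, NA)
)

# Rows that were missing originally but carry a value after imputation
imputed <- fld[is.na(fld$ABUNDANCE) & !is.na(fld$newABUNDANCE), ]
print(imputed[, c("FEATURE", "RUN", "newABUNDANCE", "predicted")])
```

Rows that remain `NA` in both columns, like ion_b in this toy table, were not imputed.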

Tony

