MSstatsTMT and Batch Effects


Emily King

Jul 21, 2021, 12:43:27 PM
to MSstats

Hi Support Team,

I'm currently analyzing a dataset with batch effects in MSstatsTMT. It’s a TMTpro dataset with four treatment groups and four replicates, where the replicates were prepared in batches (i.e., the 4 samples for replicate 1 on day 1, the 4 samples for replicate 2 on day 2, etc.). We see batch effects between samples prepared on different days. Is there a way to specify this sort of batch effect in the annotation file? I have tried running the data through MSstats with the replicates annotated as either technical or biological replicates, but it's hard to tell whether the effects have been corrected for. So far I have been evaluating this by creating a PCA plot (attached) after the proteinSummarization step or by generating ProfilePlots, but I suspect batch effects might be taken into consideration in the groupComparison step.

Does MSstats compensate for batch effects? If it does so only in the groupComparison step, how would you recommend evaluating the correction?
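
For reference, the PCA check described above could look roughly like the sketch below. It uses base R only and a simulated table standing in for global.quant$ProteinLevelData. The toy data, the object names (design, toy, scores), the Day column, and the assumption that Channel alone identifies a sample (a single TMTpro mixture) are illustrative and not taken from the actual dataset.

```r
# Simulated stand-in for the protein-level table (long format).
# 4 treatments x 4 preparation days = 16 channels (hypothetical layout).
set.seed(1)
design <- expand.grid(Condition = paste0("Trt", 1:4),
                      Day = paste0("Day", 1:4),
                      stringsAsFactors = FALSE)
design$Channel <- paste0("ch", seq_len(nrow(design)))

proteins <- paste0("P", 1:50)
# No shared columns, so merge() returns every protein/channel combination
toy <- merge(data.frame(Protein = proteins, stringsAsFactors = FALSE), design)

# Simulated protein-specific shift that affects Day1 samples only
day1_shift <- setNames(rnorm(length(proteins), sd = 1), proteins)
toy$Abundance <- 20 +
  ifelse(toy$Day == "Day1", day1_shift[toy$Protein], 0) +
  rnorm(nrow(toy), sd = 0.2)

# Long -> wide: proteins in rows, channels in columns
wide <- tapply(toy$Abundance, list(toy$Protein, toy$Channel), mean)
wide <- wide[complete.cases(wide), , drop = FALSE]  # PCA needs complete rows

# PCA with samples (channels) as observations
pca <- prcomp(t(wide), center = TRUE, scale. = FALSE)
scores <- data.frame(Channel = rownames(pca$x),
                     PC1 = pca$x[, 1],
                     PC2 = pca$x[, 2],
                     stringsAsFactors = FALSE)
scores <- merge(scores, design, by = "Channel")

# Color by preparation day; clustering by day suggests a batch effect
plot(scores$PC1, scores$PC2,
     col = as.integer(factor(scores$Day)), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(factor(scores$Day)),
       col = seq_along(levels(factor(scores$Day))), pch = 19)
```

If samples cluster by day rather than by treatment in a plot like this, the protein-level abundances still carry the batch effect, whatever the downstream model does with it.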

Thanks,

Emily

annotation.txt
whole_cell-PCA.png

thuan...@gmail.com

Jul 26, 2021, 12:12:36 PM
to MSstats
Hi Emily,

Did you set global_norm = TRUE when running the proteinSummarization() function? This normalization is supposed to remove the technical bias between channels.

Best,
Ting

Emily King

Jul 27, 2021, 11:03:50 AM
to MSstats
Hi Ting,

Yes, I did set global_norm = TRUE. Here are my function calls:

  global.raw <- MaxQtoMSstatsTMTFormat(evidence = evidence,
                                       proteinGroups = proteinGroups,
                                       annotation = annotation,
                                       useUniquePeptide = FALSE)

  global.quant <- proteinSummarization(global.raw,
                                       global_norm = TRUE,
                                       reference_norm = FALSE)

  global.comp <- groupComparisonTMT(global.quant,
                                    moderated = TRUE,
                                    save_fitted_models = TRUE)


Setting global_norm = TRUE does seem to normalize all channels to the same mean intensity (QCPlot attached), but the first replicate still shows batch effects like those in the attached ProfilePlot. Can MSstats also normalize between replicates so that the batch effects are minimized?

Thanks,
Emily
ProfilePlot.pdf
QCPlot.pdf

thuan...@gmail.com

Jul 27, 2021, 1:03:23 PM
to MSstats
Hi Emily,

Since global normalization is already applied, an additional local normalization, done for each protein separately, seems to be required to remove the batch effects between days. Unfortunately, MSstats only provides an option to remove batch effects between mixtures using reference channels, not within a mixture.

My recommendation is:
(1) Use Day information as BioReplicate in the annotation file.
(2) Treat Day as Mixture. Then groupComparisonTMT() will take the Mixture (Day) effect into account in the statistical modeling.

global.raw <- MaxQtoMSstatsTMTFormat(evidence = evidence,
                                     proteinGroups = proteinGroups,
                                     annotation = annotation, # BioReplicate column stores the day information
                                     useUniquePeptide = FALSE)

global.quant <- proteinSummarization(global.raw,
                                     global_norm = TRUE,
                                     reference_norm = FALSE)

# Use day as Mixture
temp <- global.quant$ProteinLevelData
temp$Mixture <- temp$BioReplicate # BioReplicate stores the day information
global.quant[["ProteinLevelData"]] <- temp

global.comp <- groupComparisonTMT(global.quant,
                                  moderated = TRUE,
                                  save_fitted_models = TRUE)
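
Before running groupComparisonTMT(), it may be worth confirming that the reassignment above gives a layout in which every day (now Mixture) contains every condition, so that the day effect is not confounded with treatment. A minimal sketch in base R follows; pld is a hypothetical stand-in for global.quant$ProteinLevelData, built here from made-up values so that the snippet runs on its own.

```r
# Hypothetical stand-in for global.quant$ProteinLevelData after the
# Mixture column has been overwritten with the day information.
pld <- expand.grid(Protein = c("P1", "P2"),
                   Condition = paste0("Trt", 1:4),
                   BioReplicate = paste0("Day", 1:4),
                   stringsAsFactors = FALSE)
pld$Mixture <- pld$BioReplicate

# One row per Condition/Mixture combination, ignoring proteins
samples <- unique(pld[, c("Condition", "Mixture")])

# Rows = conditions, columns = days; in this toy layout every cell is 1
layout <- table(samples$Condition, samples$Mixture)
print(layout)

# A zero anywhere would mean a condition is missing from that day
stopifnot(all(layout > 0))
```

With the real object, the same check would presumably start from pld <- global.quant$ProteinLevelData.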

Best,
Ting

Emily King

Jul 28, 2021, 10:22:22 AM
to MSstats
Hi Ting,

Thank you so much for the recommendation. This has definitely changed the results.

I currently double-check the batch effect correction by doing PCA or making ProfilePlots after the proteinSummarization step. These plots require protein abundances, but groupComparisonTMT outputs log fold changes, so how would you recommend analyzing the impact of the batch effect at this level? Is there a correction factor per channel or something similar in the "FittedModel" output of groupComparisonTMT that might work?

Thanks,
Emily

thuan...@gmail.com

Aug 7, 2021, 12:09:19 AM
to MSstats
Hi Emily,

As I mentioned in the previous email, you can specify a linear model to take into account the batch effect:

(1) Use Day information as BioReplicate in the annotation file.
(2) Treat Day as Mixture. Then groupComparisonTMT() will take the Mixture (Day) effect into account in the statistical modeling.

MSstats then generates the testing results with the Day batch effect taken into account. Do you also want to normalize the protein abundances to remove the batch effects?
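
If the goal is only to visualize abundances with the day effect removed (for PCA or profile plots), one simple option is to center each protein within each day. The sketch below is an assumption-laden illustration in base R, not an MSstatsTMT function: the table pld, its simulated values, and the new AbundanceCentered column are hypothetical, and the approach relies on every day containing the same set of conditions, as in the design described in this thread.

```r
# Simulated stand-in for the protein-level table, with Day stored in Mixture
set.seed(2)
pld <- expand.grid(Protein = paste0("P", 1:3),
                   Condition = paste0("Trt", 1:4),
                   Mixture = paste0("Day", 1:4),
                   stringsAsFactors = FALSE)
trt_effect <- c(Trt1 = 0, Trt2 = 0.5, Trt3 = 1, Trt4 = 1.5)
day_effect <- c(Day1 = 2, Day2 = 0, Day3 = -0.5, Day4 = 0.3)
pld$Abundance <- 20 +
  unname(trt_effect[pld$Condition]) +
  unname(day_effect[pld$Mixture]) +
  rnorm(nrow(pld), sd = 0.1)

# Mean of each protein within each day, and across all samples
day_mean <- ave(pld$Abundance, pld$Protein, pld$Mixture)
protein_mean <- ave(pld$Abundance, pld$Protein)

# Remove the per-day offset but keep the protein's overall level
pld$AbundanceCentered <- pld$Abundance - day_mean + protein_mean

# Per-protein day means should now be identical across days
print(tapply(pld$AbundanceCentered, list(pld$Protein, pld$Mixture), mean))
```

Because the same offset is subtracted from every sample of a protein within a day, within-day differences between conditions are unchanged. The centered values are meant for plots only; the group comparison itself would still be run on the uncentered data with Day as Mixture, as described above.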

Best,
Ting