Persistent Batch Effect

Montoni Bass

Apr 12, 2025, 9:27:07 PM
to MSstats
Hello!
I ran the MSstatsPTM workflow on my TMT phospho data, but a PCA of the output shows that the batch effect persists (attached PDF). I wonder whether I did something wrong when applying it. Could you help me with this issue?

This is my code:

library(MSstatsPTM)
library(readxl)


maxq_tmt_evidence <- read.table("data/evidence.txt", sep = "\t", header = TRUE)

maxq_tmt_annotation <- read_excel("data/annotation_file.xlsx")

head(maxq_tmt_evidence)
head(maxq_tmt_annotation)

msstats_format_tmt = MaxQtoMSstatsPTMFormat(evidence=maxq_tmt_evidence,
                                            annotation=maxq_tmt_annotation,
                                            fasta=('data/Musmusculus_uniprotkb_AND_reviewed_true_AND_model_o_2024_06_25.fasta'),
                                            fasta_protein_name="uniprot_ac",
                                            mod_id="\\(Phospho \\(STY\\)\\)",
                                            use_unmod_peptides=TRUE,
                                            labeling_type = "TMT",
                                            which_proteinid_ptm = "Proteins")


head(msstats_format_tmt$PROTEIN)

write.csv(msstats_format_tmt$PROTEIN,
          file = "outputs/msstats_format.csv",
          row.names = FALSE)

write.csv(msstats_format_tmt$PROTEIN,
          file = "outputs/ProteinLevelData_BeforeMSSTats.csv",
          row.names = FALSE)

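# Summarize the converted data; the result is stored so it can be used below
summary_data_tmt <- dataSummarizationPTM(
  msstats_format_tmt,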
  logTrans = 2,
  normalization = "equalizeMedians",
  normalization.PTM = "equalizeMedians",
  nameStandards = NULL,
  nameStandards.PTM = NULL,
  featureSubset = "all",
  featureSubset.PTM = "all",
  remove_uninformative_feature_outlier = FALSE,
  remove_uninformative_feature_outlier.PTM = FALSE,
  min_feature_count = 2,
  min_feature_count.PTM = 1,
  n_top_feature = 3,
  n_top_feature.PTM = 3,
  summaryMethod = "TMP",
  equalFeatureVar = TRUE,
  censoredInt = "NA",
  MBimpute = TRUE,
  MBimpute.PTM = TRUE,
  remove50missing = FALSE,
  fix_missing = NULL,
  maxQuantileforCensored = 0.999,
  use_log_file = TRUE,
  append = TRUE,
  verbose = TRUE,
  log_file_path = NULL,
  base = "MSstatsPTM_log_"
)


# View the first few rows of the summarized data
head(summary_data_tmt, n = 20)

write.csv(summary_data_tmt[["PTM"]][["ProteinLevelData"]],
          file = "outputs/MsStatsAdjustedProteinLevelData_Global_LastTest.csv",
          row.names = FALSE)


# Extract protein-level data from summarized results
protein_data <- summary_data_tmt[["PTM"]][["ProteinLevelData"]]

I have also attached the annotation file and a subset of the evidence file.

Thank you for the attention!

Montoni.

annotation_file.xlsx
Batch Effect Removal PCA_Montoni-1.pdf
evidence_subset.txt

Devon Kohler

Apr 15, 2025, 9:14:02 AM
to MSstats
Hi Montoni,

Since this is a TMT experiment, I would recommend using dataSummarizationPTM_TMT instead of dataSummarizationPTM. It controls for the TMT mixtures by using your pooled channel, which should fix the batch effects.
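
A minimal sketch of that swap, assuming `msstats_format_tmt` is the list returned by MaxQtoMSstatsPTMFormat in the original post and that the pooled channel of each mixture is labeled "Norm" in the Condition column of the annotation file (the label MSstatsTMT uses for reference channels). The wrapper name `summarize_ptm_tmt` is hypothetical, and the argument names follow the MSstatsPTM documentation as best recalled here, so check `?dataSummarizationPTM_TMT` for the installed version.

```r
# Hypothetical helper wrapping the TMT-specific summarization.
# Assumes the MSstatsPTM package is installed; it is only needed when the
# function is actually called.
summarize_ptm_tmt <- function(msstats_input) {
  MSstatsPTM::dataSummarizationPTM_TMT(
    msstats_input,
    method = "msstats",
    global_norm = TRUE,          # peptide-level normalization between channels (global profiling run)
    global_norm.PTM = TRUE,      # same, for the PTM data
    reference_norm = TRUE,       # normalize between mixtures using the "Norm" channels
    reference_norm.PTM = TRUE,   # same, for the PTM data
    remove_norm_channel = TRUE,  # drop the "Norm" channels after normalization
    MBimpute = TRUE,
    MBimpute.PTM = TRUE
  )
}

# Usage with the object from the original post (not run here):
# summary_data_tmt <- summarize_ptm_tmt(msstats_format_tmt)
# protein_data <- summary_data_tmt[["PTM"]][["ProteinLevelData"]]
```

If the annotation file labels the pooled channel differently, the reference normalization would presumably have nothing to anchor on, so the Condition column is worth checking before re-running the PCA.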

Devon

Sam Siljee

Apr 30, 2025, 10:18:11 PM
to MSstats
Hi Montoni,

This might not be relevant to you, but I've noticed that including more detail in the condition column can improve results (mostly for LFQ data though, so I don't know whether it applies to your situation).
For example, change the conditions "treated" and "untreated" to "treated_male", "treated_female", "untreated_male", and "untreated_female", then adapt the contrast matrix accordingly.
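
As a rough illustration of that idea in base R, the sketch below builds the combined condition labels and a matching one-row contrast matrix. The annotation table and its `Treatment` and `Sex` columns are hypothetical, not taken from the attached annotation file. The layout (one row per comparison, one named column per condition) is the one MSstats-style group comparison functions generally expect.

```r
# Hypothetical annotation table; Treatment and Sex are assumed column names.
annotation <- data.frame(
  Treatment = c("treated", "treated", "untreated", "untreated"),
  Sex = c("male", "female", "male", "female"),
  stringsAsFactors = FALSE
)

# Combine both factors into a single, more detailed condition label.
annotation$Condition <- paste(annotation$Treatment, annotation$Sex, sep = "_")

# Build the contrast by name so it does not depend on column order:
# treated vs. untreated, averaged over sex.
conditions <- sort(unique(annotation$Condition))
weights <- ifelse(startsWith(conditions, "treated"), 0.5, -0.5)
contrast_matrix <- matrix(
  weights,
  nrow = 1,
  dimnames = list("treated vs untreated", conditions)
)

print(contrast_matrix)
```

The weights sum to zero, and each side of the comparison is averaged over its two sex-specific conditions. Further rows could be added for sex-specific comparisons such as treated_male vs. untreated_male.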

Sam