Persistent Batch Effect

Montoni Bass

Apr 12, 2025, 9:27:07 PM
to MSstats
  Hello!
I have run the MSstatsPTM algorithm on my TMT phospho data, but unfortunately, after running a PCA on the output, the batch effect persists (see the attached PDF). I wonder if I did something wrong when applying it. Could you help me with this issue?

This is my code:

library(MSstatsPTM)
library(readxl)


# Read the MaxQuant evidence table and the TMT annotation
maxq_tmt_evidence <- read.table("data/evidence.txt", sep = "\t", header = TRUE)

maxq_tmt_annotation <- read_excel("data/annotation_file.xlsx")

head(maxq_tmt_evidence)
head(maxq_tmt_annotation)

# Convert MaxQuant output to MSstatsPTM format (a list with PTM and PROTEIN data)
msstats_format_tmt <- MaxQtoMSstatsPTMFormat(evidence = maxq_tmt_evidence,
                                             annotation = maxq_tmt_annotation,
                                             fasta = "data/Musmusculus_uniprotkb_AND_reviewed_true_AND_model_o_2024_06_25.fasta",
                                             fasta_protein_name = "uniprot_ac",
                                             mod_id = "\\(Phospho \\(STY\\)\\)",
                                             use_unmod_peptides = TRUE,
                                             labeling_type = "TMT",
                                             which_proteinid_ptm = "Proteins")


head(msstats_format_tmt$PROTEIN)

write.csv(msstats_format_tmt$PROTEIN,
          file = "outputs/msstats_format.csv",
          row.names = FALSE)

write.csv(msstats_format_tmt$PROTEIN,
          file = "outputs/ProteinLevelData_BeforeMSSTats.csv",
          row.names = FALSE)

# Summarize the converted data (not the base R object `data`) and keep the result
summary_data_tmt <- dataSummarizationPTM(
  msstats_format_tmt,
  logTrans = 2,
  normalization = "equalizeMedians",
  normalization.PTM = "equalizeMedians",
  nameStandards = NULL,
  nameStandards.PTM = NULL,
  featureSubset = "all",
  featureSubset.PTM = "all",
  remove_uninformative_feature_outlier = FALSE,
  remove_uninformative_feature_outlier.PTM = FALSE,
  min_feature_count = 2,
  min_feature_count.PTM = 1,
  n_top_feature = 3,
  n_top_feature.PTM = 3,
  summaryMethod = "TMP",
  equalFeatureVar = TRUE,
  censoredInt = "NA",
  MBimpute = TRUE,
  MBimpute.PTM = TRUE,
  remove50missing = FALSE,
  fix_missing = NULL,
  maxQuantileforCensored = 0.999,
  use_log_file = TRUE,
  append = TRUE,
  verbose = TRUE,
  log_file_path = NULL,
  base = "MSstatsPTM_log_"
)


# View the first rows of the summarized PTM-level data
head(summary_data_tmt[["PTM"]][["ProteinLevelData"]], n = 20)

write.csv(summary_data_tmt[["PTM"]][["ProteinLevelData"]],
          file = "outputs/MsStatsAdjustedProteinLevelData_Global_LastTest.csv",
          row.names = FALSE)


# Extract protein-level data from summarized results
protein_data <- summary_data_tmt[["PTM"]][["ProteinLevelData"]]
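The PCA check mentioned at the start of this post can be reproduced on the summarized output with a few lines of base R. This is a minimal sketch, not the analysis behind the attached PDF: the helper `pca_by_batch()` and the toy data are hypothetical, and the default column names assume the `ProteinLevelData` layout of the TMT summarization (`Protein`, `Abundance`, `Mixture`, `Channel`); for other outputs, pass the matching column names.

```r
# Hypothetical helper: PCA of summarized abundances, one point per sample,
# labelled with its batch so batch-driven separation can be inspected.
pca_by_batch <- function(protein_data,
                         protein_col = "Protein",
                         value_col = "Abundance",
                         sample_cols = c("Mixture", "Channel"),
                         batch_col = "Mixture") {
  df <- as.data.frame(protein_data)
  # One sample ID per combination of the sample-defining columns
  df$sample_id <- do.call(paste, c(df[sample_cols], sep = "_"))
  # Wide matrix: rows = samples, columns = proteins (duplicates averaged)
  mat <- tapply(df[[value_col]], list(df$sample_id, df[[protein_col]]), mean)
  # prcomp() needs complete data: keep proteins measured in every sample
  mat <- mat[, colSums(is.na(mat)) == 0, drop = FALSE]
  # Constant proteins cannot be scaled to unit variance, so drop them too
  mat <- mat[, apply(mat, 2, stats::var) > 0, drop = FALSE]
  pca <- stats::prcomp(mat, center = TRUE, scale. = TRUE)
  data.frame(sample_id = rownames(mat),
             batch = df[[batch_col]][match(rownames(mat), df$sample_id)],
             PC1 = pca$x[, 1],
             PC2 = pca$x[, 2],
             row.names = NULL)
}

# Toy example (simulated, not real data): 2 mixtures x 4 channels x 30
# proteins, with mixture M2 shifted up by 3 log2 units to mimic a batch effect
set.seed(1)
toy <- expand.grid(Protein = paste0("P", 1:30),
                   Channel = paste0("ch", 1:4),
                   Mixture = c("M1", "M2"),
                   stringsAsFactors = FALSE)
protein_means <- setNames(rnorm(30, mean = 20, sd = 2), paste0("P", 1:30))
toy$Abundance <- protein_means[toy$Protein] +
  rnorm(nrow(toy), sd = 0.2) +
  ifelse(toy$Mixture == "M2", 3, 0)

scores <- pca_by_batch(toy)
```

Plotting `scores$PC1` against `scores$PC2` colored by `factor(scores$batch)` shows the two mixtures as separate clusters for this toy data; when the batch effect has been handled, samples from different mixtures would be expected to overlap instead.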

I am also attaching the annotation file and a subset of the evidence file.

Thank you for the attention!

Montoni.

annotation_file.xlsx
Batch Effect Removal PCA_Montoni-1.pdf
evidence_subset.txt

Devon Kohler

Apr 15, 2025, 9:14:02 AM
to MSstats
Hi Montoni,

Since this is a TMT experiment, I would recommend using dataSummarizationPTM_TMT, which controls for the TMT mixtures by using your pooled reference channels. This should fix the batch effects.
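A minimal sketch of that switch, with two assumptions worth checking against the MSstatsPTM/MSstatsTMT documentation for your version: reference normalization (`reference_norm = TRUE`) looks for pooled channels whose `Condition` is `Norm` in the annotation, and it needs at least one such channel in every `Run`. The helper `has_norm_channel()` and the toy annotation are hypothetical; the summarization call only runs if `msstats_format_tmt` from `MaxQtoMSstatsPTMFormat()` is already in the session.

```r
# Hypothetical helper: TRUE/FALSE per Run, depending on whether that run has
# at least one channel annotated with Condition == "Norm" (pooled reference).
has_norm_channel <- function(annotation) {
  ann <- as.data.frame(annotation)
  vapply(split(ann$Condition == "Norm", ann$Run), any, logical(1))
}

# Toy annotation (made up): run2 is missing its pooled reference channel
toy_annotation <- data.frame(
  Run = rep(c("run1", "run2"), each = 3),
  Fraction = 1,
  TechRepMixture = 1,
  Mixture = rep(c("M1", "M2"), each = 3),
  Channel = rep(c("126", "127N", "127C"), times = 2),
  Condition = c("Norm", "treated", "untreated",
                "treated", "untreated", "untreated"),
  BioReplicate = c("Norm", "T1", "U1", "T2", "U2", "U3")
)
has_norm_channel(toy_annotation)  # run1 TRUE, run2 FALSE

# TMT-specific summarization; runs only when the converted data exists
if (exists("msstats_format_tmt") &&
    requireNamespace("MSstatsPTM", quietly = TRUE)) {
  summary_data_tmt <- MSstatsPTM::dataSummarizationPTM_TMT(
    msstats_format_tmt,
    method = "msstats",
    global_norm = TRUE,          # median normalization across channels and runs
    global_norm.PTM = TRUE,
    reference_norm = TRUE,       # align runs using the "Norm" channels
    reference_norm.PTM = TRUE,
    remove_norm_channel = TRUE,  # drop reference channels after normalizing
    remove_empty_channel = TRUE
  )
  protein_data <- summary_data_tmt[["PTM"]][["ProteinLevelData"]]
}
```

Because the annotation is passed to `MaxQtoMSstatsPTMFormat()`, the `Norm` labels would need to be in place before the conversion step, not only before summarization.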

Devon

Sam Siljee

Apr 30, 2025, 10:18:11 PM
to MSstats
Hi Montoni,

This might not be relevant to you, but I've noticed that including more detail in the Condition column can improve results (mostly with LFQ data, though, so I don't know whether it applies to your situation).
For example, change the conditions "treated" and "untreated" to "treated_male", "treated_female", "untreated_male", and "untreated_female", and then adapt the contrast matrix accordingly.
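As an illustration of adapting the contrast matrix, here is a sketch for those four example conditions. It assumes the MSstats convention that each row is one comparison and the columns follow the alphabetically sorted condition names; the ±0.5 row for the pooled treated-vs-untreated effect is an assumption to check against your MSstatsPTM version (the ±1 rows follow the standard form).

```r
# Columns must match the condition levels, which MSstats sorts alphabetically
conditions <- sort(c("treated_male", "treated_female",
                     "untreated_male", "untreated_female"))
comparisons <- c("treated_vs_untreated_male",
                 "treated_vs_untreated_female",
                 "treated_vs_untreated")
contrast_matrix <- matrix(0, nrow = length(comparisons),
                          ncol = length(conditions),
                          dimnames = list(comparisons, conditions))

# Sex-specific comparisons: +1 for treated, -1 for untreated
contrast_matrix["treated_vs_untreated_male",
                c("treated_male", "untreated_male")] <- c(1, -1)
contrast_matrix["treated_vs_untreated_female",
                c("treated_female", "untreated_female")] <- c(1, -1)

# Overall treatment effect, averaged over sexes (weights sum to zero)
contrast_matrix["treated_vs_untreated",
                c("treated_female", "treated_male")] <- 0.5
contrast_matrix["treated_vs_untreated",
                c("untreated_female", "untreated_male")] <- -0.5

contrast_matrix
```

The matrix would then be passed as `contrast.matrix` to `groupComparisonPTM()` (with `data.type = "TMT"` for TMT data).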

Sam