Fractions, TechRepMixture MSstatsTMT

froehlic...@gmail.com

Feb 19, 2021, 9:42:12 AM
to MSstats
Dear MSstats team,
I am currently working on a Galaxy tutorial for MSstats to make it accessible to more people.

After running into the same error multiple times on Galaxy, I downloaded the MaxQuant output and wanted to perform the MSstatsTMT analysis locally.

I have created an annotation file (the same way I always prepare them), and when I start the protein summarization, MSstats gives me an error:

> dataprocess_small_df <- MSstatsTMT::proteinSummarization(data = small_df, global_norm = F)

Summarizing for Run : Mixture1_1 ( 1  of  1 )

Error in dataProcess(sub_data, normalization = FALSE, summaryMethod = "TMP",  :

  ** MSstats suspects that there are fractionations and potentially technical replicates too. Please add Fraction column in the input.


The annotation and a small data frame from the MaxQtoMSstatsTMTFormat() function are attached; together they reproduce the error.

It has to be an error in my annotation, but I don't get why.

Experimental design:
1 experiment (= 1 TMT mixture) with 11 channels (some empty)
12 measurements (= 12 fractions; no fraction was measured twice!)

So I labeled the raw files with Fraction 1-12 and gave every fraction TechRepMixture = 1.
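
For readers following along, the design described above could be laid out roughly as follows. This is only a sketch: the run names, the channel labels, and the split into conditions "A" and "B" are assumptions made for illustration and are not taken from the attached annotation file. Only the column names follow the MSstatsTMT annotation layout for MaxQuant input.

```r
# Hypothetical annotation: 1 TMT mixture, 12 fractions, 11 channels.
# Run names, channel labels and conditions are made up for illustration.
runs <- sprintf("fraction_%02d", 1:12)
channels <- paste0("channel.", 0:10)
conditions <- c(rep("A", 4), rep("B", 4), rep("Empty", 3))  # one entry per channel

# One row per (run, channel) pair: 12 * 11 = 132 rows.
annotation <- expand.grid(Channel = channels, Run = runs,
                          stringsAsFactors = FALSE)
annotation$Fraction <- match(annotation$Run, runs)  # fractions 1..12
annotation$TechRepMixture <- 1                      # no fraction measured twice
annotation$Mixture <- "Mixture1"
annotation$Condition <- conditions[match(annotation$Channel, channels)]

# All Empty channels share one BioReplicate ID, as in the original report.
annotation$BioReplicate <- ifelse(
  annotation$Condition == "Empty",
  "Empty",
  paste0("Sample_", match(annotation$Channel, channels))
)
```

Each raw file gets its own Fraction value, while TechRepMixture stays at 1 because every fraction was measured once.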

Could you please take a look and tell me what I am doing wrong? 

Best Klemens
workspace_ReproducibleError.RData

thuan...@gmail.com

Feb 19, 2021, 10:44:04 AM
to MSstats
Hi Klemens,

The annotation file seems fine and is consistent with the design you described.

I am having trouble opening the small data frame from the MaxQtoMSstatsTMTFormat() function. Could you please share it again so that I can reproduce the error on my side?

Best,
Ting

froehlic...@gmail.com

Feb 19, 2021, 11:24:03 AM
to MSstats
Sorry, I think I described small_df poorly:
for convenience, small_df is the output of the MaxQtoMSstatsTMTFormat() function, so you don't need the original MaxQuant input files.

Here is the code I used, to make it clear:

# load the package and the data
library(MSstatsTMT)

proteinGroups <- read.delim("Galaxy137-[MaxQuant_Protein_Groups_for_data_26,_data_18,_and_others].tabular")
evidence <- read.delim("Galaxy140-[MaxQuant_Evidence_for_data_26,_data_18,_and_others].tabular")
annotation <- read.delim("MSstatsTMTannotationMFA380_raw_thermo.txt")

# convert to the 11-column MSstatsTMT format
format_msstatsTMT <- MaxQtoMSstatsTMTFormat(evidence = evidence,
                                            proteinGroups = proteinGroups,
                                            annotation = annotation)

# protein summarization
dataprocess_msstatsTMT <- MSstatsTMT::proteinSummarization(data = format_msstatsTMT)
# this gives the error described above

# reproducible subset of the 11-column format with 3 random proteins
small_df <- format_msstatsTMT[grepl("GSK3A_HUMAN|PCDH9_HUMAN|DUS2L_HUMAN", format_msstatsTMT$ProteinName), ]

dataprocess_small_df <- MSstatsTMT::proteinSummarization(data = small_df, global_norm = FALSE)
# this gives the error described above

Would you like the complete format_msstatsTMT object? I will export another workspace now to have it ready.
Best Klemens

Mateusz Staniak

Feb 19, 2021, 12:04:39 PM
to MSstats
Hi,


@Klemens: adding column "Fraction" with a constant value (example: small_df$Fraction = 1) fixes the problem, at least for me.
@Ting: let's check what exactly is the problem. Perhaps we will need to update MSstats::dataProcess with an option to switch off the check for fractionation.
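
The workaround mentioned above could look roughly like this. It is a minimal sketch on a toy stand-in for small_df (only a few of the 11 columns, with made-up intensities); the real object comes from MaxQtoMSstatsTMTFormat(). The call to proteinSummarization() is left commented out because it needs MSstatsTMT installed and the full set of columns.

```r
# Toy stand-in for the converter output; values are invented for illustration.
small_df <- data.frame(
  ProteinName = c("GSK3A_HUMAN", "GSK3A_HUMAN", "PCDH9_HUMAN"),
  Run = "Mixture1_1",
  Channel = c("channel.0", "channel.1", "channel.0"),
  Intensity = c(1200, NA, 950),
  stringsAsFactors = FALSE
)

# Add a constant Fraction column only if the input does not have one yet.
if (!"Fraction" %in% colnames(small_df)) {
  small_df$Fraction <- 1
}

# With the real small_df one would then re-run the summarization:
# dataprocess_small_df <- MSstatsTMT::proteinSummarization(data = small_df,
#                                                          global_norm = FALSE)
```

This only sidesteps the fractionation check; it does not change the annotation itself.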


Best,
Mateusz

froehlic...@gmail.com

Feb 19, 2021, 5:58:48 PM
to MSstats
Hi,
But would that mean that I have to modify the 11-column format of MSstats?
Is this something caused by my annotation or by the raw data, which I could fix without changing the 11-column format?

Best, Klemens

thuan...@gmail.com

Feb 22, 2021, 10:59:31 AM
to MSstats
Hi Klemens,

It turns out the error does come from the annotation file. All three Empty channels have the same BioReplicate ID, which confuses MSstats: is it fractionation or technical replicates? That's why it asks for a Fraction column.

So if you give the three Empty channels different BioReplicate IDs, there should be no error.
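
One way to apply this in the annotation is sketched below. The toy table and the "Empty_<channel>" naming scheme are assumptions for illustration; any IDs work as long as each Empty channel gets its own.

```r
# Toy annotation for a single run; channel labels and conditions are made up.
annotation <- data.frame(
  Channel = paste0("channel.", 0:10),
  Condition = c(rep("A", 4), rep("B", 4), rep("Empty", 3)),
  BioReplicate = c(paste0("S", 1:8), rep("Empty", 3)),
  stringsAsFactors = FALSE
)

# Give every Empty channel its own BioReplicate ID, derived from the channel
# name so the same channel keeps the same ID across all fractions.
is_empty <- annotation$Condition == "Empty"
annotation$BioReplicate[is_empty] <- paste0("Empty_", annotation$Channel[is_empty])
```

Deriving the ID from the channel name means the relabeling stays consistent when the annotation has one block of rows per raw file.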

Thank you for reporting this problem. We will fix it in a future release.

Best,
Ting

froehlic...@gmail.com

Feb 22, 2021, 11:02:34 AM
to MSstats
Hi Ting,
That was definitely not a problem in the past, and we have performed many analyses in MSstatsTMT with multiple Empty channels.
I will have to change the instructions on how to set up an annotation file...
Best, Klemens

thuan...@gmail.com

Feb 22, 2021, 11:31:03 AM
to MSstats
This is because this specific dataset has a larger number of missing values within the TMT mixture (or experiment). See the example in the attached screenshot.
[Attachment: Screen Shot 2021-02-22 at 11.24.19 AM.png]
Empty channels 2 and 3 are missing for AVTLSILNDNDNFVLDPYSGVIK_3. Such cases are rare in other TMT experiments, so most of the datasets with the same Empty-channel labeling that you analyzed before are fine.
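
To see how common such cases are in a dataset, one could count the observed Empty channels per PSM, roughly as sketched here. The toy table, its intensities, and the second PSM name are invented for illustration, and the sketch assumes missing measurements appear as NA in the Intensity column of the 11-column format.

```r
# Toy excerpt of the 11-column format, restricted to the Empty channels.
# Intensities and the PSM "PEPTIDEK_2" are hypothetical.
toy <- data.frame(
  PSM = rep(c("AVTLSILNDNDNFVLDPYSGVIK_3", "PEPTIDEK_2"), each = 3),
  Run = "Mixture1_1",
  Channel = rep(c("channel.8", "channel.9", "channel.10"), times = 2),
  Condition = "Empty",
  Intensity = c(530, NA, NA, 410, 380, 395),
  stringsAsFactors = FALSE
)

# Count non-missing intensities per PSM and run among the Empty channels.
empty_rows <- toy[toy$Condition == "Empty", ]
observed <- aggregate(Intensity ~ PSM + Run, data = empty_rows,
                      FUN = function(x) sum(!is.na(x)), na.action = na.pass)
names(observed)[names(observed) == "Intensity"] <- "n_observed"

# PSMs for which at least one of the 3 Empty channels is missing.
incomplete <- observed[observed$n_observed < 3, ]
```

On the real data, `toy` would be replaced by format_msstatsTMT, and the threshold by the number of Empty channels in the design.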

My suggestion here is (1) to change the annotation for this small dataset and (2) to keep the instructions the same. We will also fix this problem on our side.

Best,
Ting

froehlic...@gmail.com

Feb 22, 2021, 11:39:43 AM
to MSstats
Thanks a lot for looking at this. This helps me a lot with the Galaxy cluster!

So in the future it will also be okay to label all "empty" channels as "empty" in the Condition column and give them the same BioReplicate number?
Sorry, I just want to be clear; if you prefer something else, I would include that in the Galaxy workshop.
Best, Klemens

thuan...@gmail.com

Feb 22, 2021, 12:00:24 PM
to MSstats
Yes, it will be okay to label all empty channels as Empty in the Condition column with the same BioReplicate IDs.

Best,
Ting