TMT 10-plex MaxQuant analysis


niki...@gmail.com

Sep 23, 2017, 4:29:19 PM
to ProteomicsQA
Hi there,

I ran a TMT 10-plex experiment on whole-cell lysates and analyzed the data with MaxQuant. I have the reporter ion intensities, but I am not sure how to analyze them further. I used 9 of the 10 labels to measure protein abundances in 3 conditions, with 3 replicates per condition. My question is how to analyze the data to find differential protein abundances between conditions 1 and 2 and between conditions 1 and 3, which are the main questions of this project.

1. Do I first log-transform the data, then average the replicates, and then calculate fold changes? In that case, how do you calculate fold changes on log-transformed data? -> Log(avgA) - Log(avgB)?
2. Do I median-centre the columns? And if so, what does this mean in practice? Take the median of each column/reporter ion channel and then subtract it (not divide, right?) from each intensity in that column?
3. Do I really need to run multiple t-tests, or can I stick with fold changes? If I do need them, can I do this in Excel? And if I run multiple t-tests, how do I correct them for multiple testing?

Thank you very much for your help.

Niki

Vladimir Gorshkov

Sep 24, 2017, 10:29:01 AM
to ProteomicsQA
Hi Niki,

your question is fairly general (i.e. it applies not only to MaxQuant but to any other workflow), and it should be split into several smaller ones.
First, you probably want to normalize your results so that you can correct for systematic bias in your experiment. Imagine that, for some reason, you put more protein into one of the channels: all proteins in that channel will then appear upregulated.
On a side note, it might be a good idea to check the labeling before mixing the labeled samples and to correct for big differences "at the wet-lab level". The reason is that the dynamic range of the mass spectrometer is limited, so measuring a very high signal together with a very low one is problematic. Moreover, one can "lose" small reporter signals in the noise.
To check whether you need normalization, you can plot a histogram of the log-transformed intensities; typically it should look similar to a normal distribution (the intensities follow what is called a log-normal distribution). If you overlay the histograms for all your channels, they should be more or less the same, since we assume that the majority of proteins do not change. If that is not the case, normalization is necessary.
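As a quick numeric complement to overlaying the histograms, one could compare the per-channel medians of the log2 intensities. This is only a minimal sketch using the Python standard library; the channel names and intensities are made-up toy values, not data from this thread, and skipping zero intensities is an assumption about how missing reporters are handled.

```python
import math
from statistics import median

def log2_channel_medians(table):
    """table: dict mapping channel name -> list of raw reporter intensities."""
    medians = {}
    for channel, intensities in table.items():
        # Zero or missing intensities cannot be log-transformed, so skip them.
        logs = [math.log2(x) for x in intensities if x and x > 0]
        medians[channel] = median(logs)
    return medians

# Toy data: channel "c2" was loaded with twice as much protein as "c1".
toy = {
    "c1": [64.0, 128.0, 0.0, 256.0, 512.0],
    "c2": [128.0, 256.0, 512.0, 1024.0],
}
meds = log2_channel_medians(toy)
# Difference between the highest and lowest channel median, in log2 units.
spread = max(meds.values()) - min(meds.values())
print(meds, spread)
```

A spread of about one log2 unit, as in this toy case, corresponds to a two-fold loading difference between channels, which is the kind of systematic shift normalization is meant to remove.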
There are many different ways to normalize, from subtracting the mean or median (the simplest) to more sophisticated methods such as quantile normalization. In practical terms, for median normalization you indeed subtract the column median from all values in that column (because these are logarithms of intensities; on the linear scale it would be a division). Afterwards, the histograms should overlap better.
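Median normalization on the log scale can be sketched as follows. This is a minimal illustration with the Python standard library; the channel names and values are toy data, and representing missing values as None is an assumption made for the example.

```python
import math
from statistics import median

def median_center(log_columns):
    """Subtract each column's median from every value in that column."""
    centered = {}
    for channel, values in log_columns.items():
        present = [v for v in values if v is not None]
        col_median = median(present)
        # Missing values stay missing; everything else is shifted by the median.
        centered[channel] = [None if v is None else v - col_median for v in values]
    return centered

# Toy raw intensities: "c2" is systematically two-fold higher than "c1".
raw = {"c1": [64.0, 128.0, 256.0], "c2": [128.0, 256.0, 512.0]}
logged = {ch: [math.log2(x) for x in vals] for ch, vals in raw.items()}
normalized = median_center(logged)
print(normalized)
```

After centering, both toy channels have a median of 0 on the log2 scale, so the two-fold loading difference between them is gone.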
Detecting significantly regulated proteins is a huge topic with many different approaches. The simplest (and often not very sensitive) is to perform a t-test or ANOVA for each row; there are also more sophisticated ways, such as linear models (e.g. the limma package in R), factor analysis (e.g. Diffacto) and many others.
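Regarding fold changes on log-transformed data: since log(A/B) = log(A) - log(B), a log fold change is a difference, not a ratio. The sketch below averages the replicates after the log transform, which gives the log of the ratio of geometric means; that is one common choice and an assumption here, not something prescribed in this thread. The intensities are toy values.

```python
import math
from statistics import mean

def log2_fold_change(group_a, group_b):
    """Mean log2 intensity of group_a minus mean log2 intensity of group_b."""
    logs_a = [math.log2(x) for x in group_a]
    logs_b = [math.log2(x) for x in group_b]
    return mean(logs_a) - mean(logs_b)

# Toy replicate intensities for one protein in two conditions.
cond1 = [256.0, 512.0, 1024.0]
cond2 = [64.0, 128.0, 256.0]
lfc = log2_fold_change(cond1, cond2)
ratio = 2 ** lfc  # back-transform to a linear-scale fold change
print(lfc, ratio)
```

For these toy values the log2 fold change is 2, i.e. a four-fold difference on the linear scale.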
If you use a t-test or ANOVA and work with more than a few measurements (in proteomics one usually deals with thousands of proteins), it is absolutely essential to apply multiple-testing correction. Failing to do so will result in a huge overestimation of the number of regulated proteins. Some tools do it by default, but it is always a good idea to check that it is there. The most common and easiest to implement is the Benjamini-Hochberg method; another widely used one is the method of Storey (http://www.pnas.org/content/100/16/9440.full), which is more complicated (if you want to implement it yourself), but there is a nice R package called qvalue that can be used.
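To show how little is needed for the Benjamini-Hochberg method, here is a minimal standard-library sketch. The p-values are hypothetical; in practice they would come from the per-protein tests (for instance scipy.stats.ttest_ind in Python), and an existing implementation such as statsmodels' multipletests with method="fdr_bh" is probably preferable to hand-written code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four proteins.
p = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(p)
print(q)
```

Proteins whose adjusted value falls below the chosen FDR threshold (0.05 is a common choice) would then be reported as regulated.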
I am not an expert in Excel; maybe the latest versions have gained more statistical functions. However, I would strongly recommend investing some time in learning a scripting language, for example R or Python. Both have nice toolboxes for statistics and much more. If you will do this type of analysis more than once in your life, it is definitely worth it.

As you can see, the question you asked is very broad, so I could not cover it in full detail. However, I have tried to indicate the directions you probably want to investigate further.

Best regards,
Vladimir

Johannes Griss

Sep 24, 2017, 10:33:24 AM
to proteo...@googlegroups.com

Hi Niki,

Adding to Vladimir's comment: There's a very recent paper in JPR that discusses all of these issues quite nicely and compares different approaches in analysing isobaric tag based data:

DOI: 10.1021/acs.jproteome.6b01050

This could be a good starting point.

Best regards,

Johannes

--
You received this message because you are subscribed to the Google Groups "ProteomicsQA" group.
To unsubscribe from this group and stop receiving emails from it, send an email to proteomicsqa...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/proteomicsqa/381cc6fb-7179-4a0e-a44c-2d6827222362%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

niki...@gmail.com

Sep 25, 2017, 6:10:51 PM
to ProteomicsQA
Hi Vladimir and Johannes,

Thank you very much for your replies. They are very helpful and have given me good directions.

best wishes,
Niki

niki...@gmail.com

Sep 25, 2017, 6:37:47 PM
to ProteomicsQA
Hi Vladimir,

Could I ask whether you do the log transformation and normalization steps you suggest at the protein level or at the peptide level? I thought of doing them at the protein level using the proteinGroups.txt file from MaxQuant.

Thank you again
Nikiana


Vladimir Gorshkov

Sep 26, 2017, 7:04:36 AM
to ProteomicsQA
Hi, Nikiana,

the normalization can be performed at the peptide or the protein level, as long as one can assume that the majority of the proteins are not regulated.
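If normalization is done at the peptide level, the peptide values still have to be summarized per protein afterwards. The sketch below takes the median of the normalized log2 values of all peptides assigned to a protein, channel by channel. Median summarization is an assumption chosen for illustration, not a method specified in this thread; the protein IDs and values are toy data.

```python
from collections import defaultdict
from statistics import median

def summarize_to_protein(peptide_rows):
    """peptide_rows: iterable of (protein_id, [normalized log2 value per channel])."""
    grouped = defaultdict(list)
    for protein_id, channel_values in peptide_rows:
        grouped[protein_id].append(channel_values)
    proteins = {}
    for protein_id, rows in grouped.items():
        # zip(*rows) regroups the values channel by channel across peptides.
        proteins[protein_id] = [median(col) for col in zip(*rows)]
    return proteins

# Toy peptide table: three peptides for P1, one for P2, two channels each.
rows = [
    ("P1", [0.0, 1.0]),
    ("P1", [0.5, 1.5]),
    ("P1", [1.0, 5.0]),
    ("P2", [-1.0, -1.0]),
]
protein_table = summarize_to_protein(rows)
print(protein_table)
```

The median keeps the outlying third peptide of P1 (5.0 in the second channel) from dominating the protein value, which is one reason it is often preferred over the mean for this step.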

Best regards,
Vladimir

niki...@gmail.com

Oct 20, 2017, 5:52:33 AM
to ProteomicsQA
Hi Vladimir,

I looked at the number of proteins: I can detect/quantify about 5600, of which roughly 300 change statistically significantly per pairwise comparison, based on t-tests with FDR correction. Therefore, I assume that most of the proteins are not regulated. Am I correct in this assumption? Otherwise, how would you define "majority of proteins are not regulated" in numbers?

If I were to normalize at the peptide level, I do not know how to infer the protein intensities from the peptides after median-centre normalization at the peptide level (using the peptides.txt file from MaxQuant). Is there a way to do that in Excel, or is there a script?

Thank you very much for your help.

Nikiana

Johannes Griss

Oct 20, 2017, 4:13:12 PM
to proteo...@googlegroups.com

Hi Nikiana,

This paper http://dx.doi.org/10.1021/acs.jproteome.6b01050 contains a nearly step-by-step instruction on how to go from normalising your peptide intensities to analysing your differentially expressed proteins.

An alternative is MSqRob (Journal of Proteome Research 15(10), 3550-3562) that does the whole analysis on the peptide level. It provides a very nice graphical user interface and works with MaxQuant output. See this page for more information: https://github.com/statOmics/MSqRob

Best regards,

Johannes


niki...@gmail.com

Oct 22, 2017, 10:49:13 AM
to ProteomicsQA
Hi Johannes,

Thank you very much. That seems really nice. MSqRob states that it is currently intended for label-free quantitative proteomics data. My dataset is TMT 10-plex. Would it also work with MSqRob?

Thank you very much for your help.
Nikiana

Johannes Griss

Oct 22, 2017, 10:55:47 AM
to proteo...@googlegroups.com
You are right, I forgot about that! To be honest, I'm not sure whether it will work. Maybe you can contact one of the MSqRob developers and ask?

In the meantime, it is probably better to stick to the description in the paper.

Good luck!

Best regards,
Johannes

Nikianaki s

Oct 22, 2017, 7:06:23 PM
to Johannes Griss, proteo...@googlegroups.com
Yes, I will definitely check with them and also try to follow the description in the paper you sent. That's a great help already.

In case I am not successful with it, do you know whether I am interpreting Vladimir's suggestion correctly, namely that normalization can also be done at the protein level as long as I can assume that the majority of the proteins are not regulated?

My experiment has about 5600 proteins quantified, of which roughly 300 were found to be statistically significantly (FDR approach) differentially abundant per pairwise comparison. So I thought that most of the proteins are not really regulated and that I am OK doing the normalization at the protein level to begin with?

Thank you very much all of you for your replies. This is great learning for me!
Nikiana


Vladimir Gorshkov

Nov 1, 2017, 2:24:02 PM
to ProteomicsQA
Hi, Nikiana,

to be boringly correct, the assumption about whether the majority of proteins change should be made a priori. For example, do you expect the perturbation of the cell to be minor? Do you expect that only one or a few pathways will be affected? In practice, the assumption is usually reasonable as long as one does not compare cells in very different stages.
In your case, if you performed normalization and the test showed 300 of 5600 proteins (roughly 5%) to be regulated, you can assume that the majority of proteins are unchanged.

Best regards,
Vladimir