Isobaric tag analysis with matched pairs

johanne...@gmail.com

Mar 17, 2017, 11:18:10 AM
to ProteomicsQA
Hi all,

I have the following issue: I have proteomics and RNASeq data from four patients, each measured before and after treatment. The biological (between-patient) variance is much larger than the treatment effect.
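
To illustrate why the pairing matters in a situation like this, here is a minimal sketch in base R. The numbers are made up (they are not data from this study): one protein measured in four hypothetical patients whose baselines differ far more than the treatment shift.

```r
# Made-up log2 intensities for one protein in four hypothetical patients.
before <- c(10, 14, 18, 22)
after  <- before + c(0.9, 1.1, 1.0, 1.2)

# Unpaired test: the patient-to-patient spread hides the small shift.
p_unpaired <- t.test(after, before)$p.value

# Paired test: each patient serves as their own control.
p_paired <- t.test(after, before, paired = TRUE)$p.value

print(c(unpaired = p_unpaired, paired = p_paired))
```

With these invented values the paired test is clearly significant while the unpaired one is not, which mirrors the behaviour described above.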

If I do a standard differential expression analysis of "before" vs. "after", I get basically no significantly different IDs. For the RNASeq data (edgeR), though, when I create my design matrix using:

design <- model.matrix(~ read_counts$samples$patient + read_counts$samples$group)

thus taking the different patients into consideration I get nice results.
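
A sketch of what such a design encodes, using base R only. The patient labels and intensity values are invented for illustration. The additive model gives each patient its own baseline, so the group coefficient is estimated from within-patient differences.

```r
# Hypothetical layout: four patients, each measured before and after.
patient <- factor(rep(paste0("P", 1:4), times = 2))
group   <- factor(rep(c("before", "after"), each = 4),
                  levels = c("before", "after"))

# Same structure as the edgeR design: patient blocks plus a group effect.
design <- model.matrix(~ patient + group)
print(colnames(design))

# Made-up log2 intensities for one feature, in the same sample order.
y   <- c(10, 14, 18, 22, 10.9, 15.1, 19.0, 23.2)
fit <- lm(y ~ patient + group)

# Estimate, standard error, t value and p-value for the treatment effect.
group_row <- coef(summary(fit))["groupafter", ]
print(group_row)
```

For a two-level treatment with complete pairs, the test on the group coefficient coincides with a paired t-test on the same values.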

Is there a way to do this for TMT-labeled data as well? I normally use "isobar" to analyse this data, but I am not aware of a way to define such a design matrix (or something similar) there. Are any of the RNASeq / microarray-based packages applicable to tag-based data? I would guess that tag-based data is quite similar to microarray data.

Thanks for the help!

Johannes

johanne...@gmail.com

Mar 19, 2017, 5:44:29 PM
to ProteomicsQA
Hi all,

I might have found a solution to my problem: the "MSstats" Bioconductor package might do the trick. At least it has the notion of a "group" specification (i.e. treatment vs. control) and a "biological replicate" specification. Unfortunately, I find the documentation a little sparse.

If anyone already has experience with this package, I'd be grateful if you could tell me whether both of these factors are considered in the model, similar to "edgeR".

Thanks!

Johannes

dtabb1973

Mar 20, 2017, 1:15:36 AM
to ProteomicsQA
Meena Choi, who created MSstats, is pretty good at handling emails, and I know she has taught tutorials on the package. I think she is very likely the best person to ask!

Thanks,
Dave

Veit Schwaemmle

Mar 20, 2017, 1:24:11 AM
to Johannes Griss, ProteomicsQA
Hi Johannes,

Why don't you just use the limma package? It offers plenty of options for paired and unpaired tests with different designs, and it has already been shown to perform well with iTRAQ/TMT data.

Cheers
Veit

--
You received this message because you are subscribed to the Google Groups "ProteomicsQA" group.
To unsubscribe from this group and stop receiving emails from it, send an email to proteomicsqa+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/proteomicsqa/0681173f-ac21-4945-b5d9-277fb400a105%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Markus Hartl

Mar 20, 2017, 4:24:36 AM
to ProteomicsQA
Hi Johannes,

Just to support Veit's advice: I would also go for limma. You can specify the linear model in a similar way to what you mentioned for edgeR.
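
A minimal sketch of how that could look with limma, assuming the package is installed and that the input is a matrix of normalized log2 protein intensities (rows are proteins, columns are samples). The data below are simulated, and the sample layout, object names and effect sizes are assumptions made purely for illustration.

```r
library(limma)

set.seed(1)
n_prot  <- 200
patient <- factor(rep(paste0("P", 1:4), times = 2))
group   <- factor(rep(c("before", "after"), each = 4),
                  levels = c("before", "after"))

# Simulated log2 intensities: large patient-specific baselines, small noise.
baseline <- matrix(rnorm(n_prot * 4, mean = 20, sd = 2), nrow = n_prot)
log_int  <- cbind(baseline, baseline) +
  matrix(rnorm(n_prot * 8, sd = 0.2), nrow = n_prot)

# The first 10 simulated proteins shift by +1 (log2) after treatment.
log_int[1:10, 5:8] <- log_int[1:10, 5:8] + 1
rownames(log_int) <- paste0("prot", seq_len(n_prot))
colnames(log_int) <- paste(patient, group, sep = "_")

# Same additive design as in the edgeR analysis: patient blocks + treatment.
design <- model.matrix(~ patient + group)

fit <- eBayes(lmFit(log_int, design))
res <- topTable(fit, coef = "groupafter", number = Inf)
print(head(res))
```

The `coef = "groupafter"` argument selects the treatment effect; the patient coefficients only absorb the between-patient differences.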

Cheers,
Markus

johanne...@gmail.com

Mar 20, 2017, 9:25:54 AM
to ProteomicsQA, johanne...@gmail.com
Hi Veit & Markus,

Thanks for the suggestion - actually I also thought of limma first but then came across the MSstats package. I'll now start with limma and will then test MSstats :-)

Based on the documentation, MSstats also supports linear models as used by limma.

Cheers,
Johannes

Veit Schwaemmle

Mar 20, 2017, 9:33:52 AM
to Johannes Griss, ProteomicsQA
Hi Johannes,

Great! Be aware that limma also applies a correction (its moderated statistics) that effectively sets a global error level and discards hits that would otherwise be significant at low fold changes, so there is a bit more to it than just linear modeling.
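
A rough sketch of that difference, assuming limma is installed. It compares ordinary per-protein t-statistics with the moderated ones that `eBayes()` computes after shrinking the variances towards a common value. All data are simulated and the object names are illustrative only.

```r
library(limma)

set.seed(2)
n_prot  <- 100
patient <- factor(rep(paste0("P", 1:4), times = 2))
group   <- factor(rep(c("before", "after"), each = 4),
                  levels = c("before", "after"))

# Simulated log2 intensities with no true treatment effect.
baseline <- matrix(rnorm(n_prot * 4, mean = 20, sd = 2), nrow = n_prot)
log_int  <- cbind(baseline, baseline) +
  matrix(rnorm(n_prot * 8, sd = 0.2), nrow = n_prot)

design <- model.matrix(~ patient + group)
fit    <- lmFit(log_int, design)

# Ordinary t-statistic: uses only each protein's own residual SD.
ordinary_t <- fit$coefficients[, "groupafter"] /
  fit$stdev.unscaled[, "groupafter"] / fit$sigma

# Moderated t-statistic: residual variances are shrunk towards a prior value.
fit_eb      <- eBayes(fit)
moderated_t <- fit_eb$t[, "groupafter"]

# Look at the proteins with the smallest residual SD, where the two differ most.
smallest <- order(fit$sigma)[1:5]
print(cbind(sigma       = fit$sigma[smallest],
            ordinary_t  = ordinary_t[smallest],
            moderated_t = moderated_t[smallest]))
```

Proteins with an accidentally tiny variance are the ones whose ordinary t-statistic can look extreme even at a small fold change; the moderation is meant to dampen exactly those.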

If you want a quick test for changes, you can use my app, http://computproteomics.bmb.sdu.dk/Apps/LimmaRP/, but it only uses paired tests (t-test, limma + rank products).

Nothing better than some self-advertisement :-)

Cheers
Veit

nyaradzo.chig...@uct.ac.za

Sep 8, 2017, 7:52:33 AM
to ProteomicsQA
Hello,

1. I am trying to use MSstats with MaxQuant data. I think I have managed to get the right format as per the manual, but I am getting an error when creating the quant object.

See below:

 > quant <- MaxQtoMSstatsFormat(evidence=infile, annotation=annot, proteinGroups=proteinGroups)
** + Contaminant, + Reverse, + Only.identified.by.site, proteins are removed.
** Peptide and charge, that have 1 or 2 measurements across runs, are removed.
Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
>

2. I would also like to NOT delete the contaminants, because some of my proteins of interest are keratins classified as contaminants. How can I avoid excluding them during data processing?

thanks

Nyari

Vladimir Gorshkov

Sep 14, 2017, 7:24:40 AM
to ProteomicsQA
Hi Nyari,

Regarding your second question: you can overwrite the column indicating contaminants (it contains "+" if the protein is a contaminant and is empty if it is not). That should solve the problem.
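
A small base R sketch of that idea on a toy table. The column names `Potential.contaminant` and `Gene.names` are assumptions about how the MaxQuant output looks after import (the contaminant column is named differently in some MaxQuant versions), and the rows are invented; check the actual headers of your files.

```r
# Toy stand-in for a MaxQuant table read into R; all rows are invented.
protein_groups <- data.frame(
  id                    = paste0("pg", 1:4),
  Gene.names            = c("KRT1", "ALB", "IKBKG", "KRT10"),
  Potential.contaminant = c("+", "+", "", "+"),
  stringsAsFactors      = FALSE
)

# Clear the contaminant flag for keratins only, so they survive the filter.
is_keratin <- grepl("^KRT", protein_groups$Gene.names)
protein_groups$Potential.contaminant[is_keratin] <- ""

print(protein_groups)
```

The same overwrite would have to be applied to whichever table the converter actually reads the flag from, before that table is passed on.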

Best regards,
Vladimir

nyaradzo.chig...@uct.ac.za

Sep 16, 2017, 8:21:28 AM
to ProteomicsQA
Thank you Vladimir for the great suggestion.

Does anyone have an idea how to solve this error?

"Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column"


Nyari

Johannes Griss

Sep 20, 2017, 4:18:42 PM
to proteo...@googlegroups.com

Hi Nyari,

Meena Choi, who develops MSstats, is normally very responsive. Maybe you can ask her directly? (Her e-mail address is on the Bioconductor page.)

Without knowing the details, it seems that your file is formatted differently from what is expected. I guess that one of the column names simply doesn't match.

Maybe you can post the header (i.e. the first 2-3 lines) of the files here.
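
For context, this error message is raised by base R's `merge()` when the column named in `by` is missing from one of the two tables. The sketch below reproduces it with toy data frames and then checks an annotation table for expected columns. The list of required columns is an assumption based on the MSstats documentation for MaxQuant input and should be verified against the installed version.

```r
# Toy tables; the second one has a differently named run column.
evidence  <- data.frame(Raw.file = c("run1", "run2"),
                        Intensity = c(1e6, 2e6))
annot_bad <- data.frame(raw_file = c("run1", "run2"),
                        Condition = c("before", "after"))

# merge() fails because "Raw.file" does not exist in the second table.
err <- tryCatch(merge(evidence, annot_bad, by = "Raw.file"),
                error = function(e) conditionMessage(e))
print(err)

# Assumed set of annotation columns; compare against your own file.
required     <- c("Raw.file", "Condition", "BioReplicate", "Run",
                  "IsotopeLabelType")
missing_cols <- setdiff(required, colnames(annot_bad))
print(missing_cols)
```

Running the `setdiff()` check on the real annotation and evidence tables is a quick way to spot a misspelled or missing header before calling the converter.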

Best regards,
Johannes

niki...@gmail.com

Nov 21, 2017, 8:38:58 AM
to ProteomicsQA
Hi Veit,

I was looking for people who use limma, to find out which package to use, and I saw your app. For the input data, do you mean log intensities at the protein or at the peptide level?

Also, is the limma R package the standard one for microarrays? Which data table do I need as input if I have searched the data with MaxQuant? The peptides.txt file? And how can I then filter out the entries flagged in the contaminant and reverse columns, as well as missing values, for limma?

Thank you very much,
Nikiana


Veit Schwämmle

Nov 22, 2017, 2:00:13 AM
to niki...@gmail.com, ProteomicsQA
Hi Nikiana,

The input for my app is log intensities, and I recommend applying the tests at the protein level. Otherwise, you will need to identify and interpret common significant changes in the peptide abundances.

The app is an interface to limma and rank products, and the limma method is the same as the standard one originally developed for microarray data.

The description in LimmaRP says:
"Data input: Data table (csv file) with optionally row and column names. The values should be log-transformed intensity/abundance values (not ratios) which have been normalized to be comparable. The order of the columns is required to be A1, A2, A3, ..., B1, B2, B3, ..., where 1,2,3 ... are the conditions and A,B,... denote replicates"
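
One way to bring a table into that column order, sketched in base R. The example assumes, hypothetically, that the columns are already named by replicate letter and condition number (A1, B2, ...); the values are random placeholders.

```r
set.seed(3)

# Target order: all conditions for replicate A, then for replicate B.
wanted <- with(expand.grid(condition = 1:3, replicate = c("A", "B")),
               paste0(replicate, condition))
print(wanted)

# Placeholder log intensities with the columns in a shuffled order.
log_int <- matrix(rnorm(12, mean = 20), nrow = 2,
                  dimnames = list(c("prot1", "prot2"), sample(wanted)))

# Reorder the columns by name.
ordered <- log_int[, wanted]
print(colnames(ordered))
```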

You should filter out contaminants and reverse hits beforehand. You don't need to filter out the missing values, as the tests work OK with them, as shown in the original publication.
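
A minimal base R sketch of that preprocessing on a toy table. The column names (`Potential.contaminant`, `Reverse`, `Intensity.*`) are assumptions about how a MaxQuant proteinGroups table looks after import and may differ between versions; the rows are invented. Zero intensities are treated as missing values here.

```r
# Toy stand-in for a proteinGroups table; all values are invented.
pg <- data.frame(
  id                    = paste0("pg", 1:5),
  Potential.contaminant = c("", "+", "", "", ""),
  Reverse               = c("", "", "+", "", ""),
  Intensity.A1          = c(1e6, 5e5, 2e5, 0, 3e6),
  Intensity.B1          = c(2e6, 4e5, 1e5, 8e5, 0),
  stringsAsFactors      = FALSE
)

# Drop contaminants and reverse hits; %in% also copes with NA flags.
keep     <- !(pg$Potential.contaminant %in% "+") & !(pg$Reverse %in% "+")
filtered <- pg[keep, ]

# Build a log2 intensity matrix; zeros become NA instead of -Inf.
int_cols <- grep("^Intensity\\.", colnames(filtered), value = TRUE)
mat      <- as.matrix(filtered[, int_cols])
mat[mat == 0] <- NA
log_mat  <- log2(mat)
rownames(log_mat) <- filtered$id

print(log_mat)
```

The resulting matrix still needs to be normalized before testing, as the LimmaRP description requires.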

Best regards
Veit

jiangj...@gmail.com

Jun 25, 2018, 11:47:57 AM
to ProteomicsQA

I know this is a very old thread, but I got the same error today when using the MSstats package to load my proteomics data:

"Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column"

Did you ever manage to solve the problem? Please let me know.

On Saturday, September 16, 2017 at 8:21:28 PM UTC+8, nyaradzo.chig...@uct.ac.za wrote: