Isoforms and gene results

thileepan sekaran

Mar 5, 2015, 7:05:01 AM
to trinityrn...@googlegroups.com, Brian Haas, Tiago Hori
Hi,

I am working on a non-model organism for which no well-annotated genome is available, and the transcriptome has been assembled using Trinity. I am using this transcriptome to align the reads with align_and_estimate_abundance.pl, using bowtie2 as the aligner.
By combining all the .isoforms.results and .genes.results files using "abundance_estimates_to_matrix.pl", I got .trans.count.matrix, .trans.count.TMM.Fpkm.matrix, .genes.count.matrix and .genes.count.TMM.Fpkm.matrix. When I compared the genes and trans files for the raw count matrix, I found no difference: they are identical. The normalised matrices are identical as well. Am I making a mistake? What is the use of these .trans (isoforms) and .genes files? Which one should I use for differential expression analysis?

I am using trinity mainly for finding the differential gene expression.

Kindly guide me

Regards
Thileepan

Tiago Hori

Mar 5, 2015, 7:08:26 AM
to thileepan sekaran, trinityrn...@googlegroups.com, Brian Haas
Did you actually count the number of lines in each file?
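
One quick way to check this is to compare the row counts and row identifiers of the two matrices. Below is a minimal Python sketch, assuming tab-delimited matrices with a header row of sample names and the feature ID in the first column. The function names and the example rows are made up for illustration and are not part of Trinity.

```python
def row_ids(lines):
    """Return the feature IDs (first column) of a tab-delimited matrix, skipping the header."""
    rows = iter(lines)
    next(rows, None)  # the header row holds the sample names
    return [line.rstrip("\n").split("\t", 1)[0] for line in rows if line.strip()]


def compare_matrices(gene_lines, trans_lines):
    """Summarise how many rows each matrix has and whether the row IDs match."""
    gene_ids = row_ids(gene_lines)
    trans_ids = row_ids(trans_lines)
    return {
        "gene_rows": len(gene_ids),
        "trans_rows": len(trans_ids),
        "identical_ids": gene_ids == trans_ids,
    }


# Hypothetical in-memory example; with real files, pass open file handles instead.
example_genes = ["\tsampleA\tsampleB\n", "c1_g1_i1\t30\t12\n", "c1_g1_i2\t5\t0\n"]
example_trans = ["\tsampleA\tsampleB\n", "c1_g1_i1\t30\t12\n", "c1_g1_i2\t5\t0\n"]
print(compare_matrices(example_genes, example_trans))
```

If both matrices have the same number of rows and the same IDs, each transcript was most likely treated as its own gene.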

T.

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at http://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.

Brian Haas

Mar 5, 2015, 7:50:26 AM
to thileepan sekaran, trinityrn...@googlegroups.com, Tiago Hori
If you don't use the --trinity_mode parameter, then you'll end up with every transcript being treated as if it were its own gene.

If you're working with non-Trinity targets, then you would instead specify a gene-to-transcript mapping file w/ the --gene_trans_map parameter.
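
To illustrate what such a mapping looks like, here is a minimal Python sketch. It assumes Trinity-style transcript names that end in `_i<number>` (for example `c10_g1_i2`), so that dropping the suffix gives the gene name, and it assumes the mapping file holds one tab-separated `gene<TAB>transcript` pair per line. The helper names are hypothetical; for non-Trinity targets the gene names would have to come from your own annotation rather than from the transcript name.

```python
import re

# Assumption: the isoform suffix is "_i" followed by digits at the end of the ID.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def gene_id_for(transcript_id):
    """Derive a gene ID by stripping the isoform suffix; IDs without one are returned unchanged."""
    return ISOFORM_SUFFIX.sub("", transcript_id)


def gene_trans_map_lines(transcript_ids):
    """Build 'gene<TAB>transcript' lines, one per transcript."""
    return [f"{gene_id_for(t)}\t{t}" for t in transcript_ids]


# Hypothetical transcript names, for illustration only.
for line in gene_trans_map_lines(["c10_g1_i1", "c10_g1_i2", "c11_g1_i1"]):
    print(line)
```

Here the first two transcripts share the gene `c10_g1`, so their counts would be reported together at the gene level.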

best,

~brian

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Brian Haas

Mar 5, 2015, 8:07:32 AM
to thileepan sekaran, Tiago Hori, trinityrn...@googlegroups.com
You can do the DE analysis at either the gene or transcript level. I generally recommend doing it both ways, just to be sure you don't miss anything.  Each has its advantages/disadvantages.

best,

~b

On Thu, Mar 5, 2015 at 8:06 AM, thileepan sekaran <dena....@gmail.com> wrote:
Hi,

I will redo "align_and_estimate" with the parameter (--trinity_mode) you specified. If I want to do the differential expression analysis, which file should I use, trans or genes?

Best
Thileepan

Brian Haas

Mar 5, 2015, 8:18:50 AM
to thileepan sekaran, trinityrn...@googlegroups.com
I'd suggest using Trinotate for annotation, and cross-referencing the trinotate data w/ your DE analysis results:



best,

~brian


On Thu, Mar 5, 2015 at 8:16 AM, thileepan sekaran <dena....@gmail.com> wrote:
Hi, the organism I am working on doesn't have any annotations. So every time I find differentially expressed transcripts, I run BLASTx on them to find the closest homolog, if there is one. I mean to say that I have no gene.gtf file containing gene names. In this case, is it wise to work at the transcript level? In the future, I will compare the transcriptome data with translatome data (OMICS profiling), so can I use the transcript level for that?

Regards
Thili

Brian Haas

Mar 5, 2015, 8:37:53 AM
to thileepan sekaran, trinityrn...@googlegroups.com
Hopefully everything you need is described here:


best,

~b

On Thu, Mar 5, 2015 at 8:29 AM, thileepan sekaran <dena....@gmail.com> wrote:
Thanks for your suggestion. I have already annotated my reference transcriptome using local BLASTx and Phobius, and I have the necessary annotation. But I will also use Trinotate.

But for the analysis where I will compare the transcriptome and translatome datasets, can I use the trans.count matrix and trans.TMM_FPKM.matrix?

Kindly guide me.

Brian Haas

Mar 6, 2015, 9:24:16 AM
to Sunny Sun, trinityrn...@googlegroups.com
At the gene level, you'll have more power for DE because you'll be using read counts accumulated across all isoforms for that gene.  Disadvantage - if your isoforms are clustered by 'gene' and aren't actually a single gene (paralogs, etc.) then it can be misleading.  Note, this is less of a problem with the latest versions of Trinity due to the higher precision of the isoform clustering.

At the transcript level, you'll know precisely which transcript is providing the signal, but power will be lower than at the gene level, because the reads are split among the isoforms instead of being pooled.
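
The gene-level accumulation can be pictured as summing the transcript counts that map to the same gene. Below is a minimal Python sketch under that assumption; the function name, the transcript names and the counts are all invented for illustration, and this is not the code Trinity or RSEM actually runs.

```python
from collections import defaultdict


def gene_level_counts(transcript_counts, gene_of):
    """Sum transcript counts per gene.

    transcript_counts: dict of transcript ID -> read count
    gene_of: dict of transcript ID -> gene ID; a transcript missing from
             this mapping is treated as its own gene.
    """
    totals = defaultdict(float)
    for transcript_id, count in transcript_counts.items():
        totals[gene_of.get(transcript_id, transcript_id)] += count
    return dict(totals)


# Invented example: two isoforms of one gene plus a single-isoform gene.
counts = {"c1_g1_i1": 30.0, "c1_g1_i2": 12.0, "c2_g1_i1": 5.0}
mapping = {"c1_g1_i1": "c1_g1", "c1_g1_i2": "c1_g1", "c2_g1_i1": "c2_g1"}

print(gene_level_counts(counts, mapping))  # isoforms of c1_g1 are pooled
print(gene_level_counts(counts, {}))       # no mapping: same as the transcript counts
```

With an empty mapping the gene-level table is identical to the transcript-level one, which is the situation described at the start of this thread.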

best,

~brian

On Fri, Mar 6, 2015 at 8:58 AM, Sunny Sun <sols...@gmail.com> wrote:
I wonder, when you do DGE with genes only, which isoform is chosen, or is there a gene model built to represent all isoforms? Brian, can you explain more about the advantages/disadvantages of doing DGE at both the gene and isoform level? For the moment I have only considered doing it at the gene level, to get an overview of what's being differentially expressed, with no interest in alternative splicing, for example.
S.
