Please can someone explain Trinity output 2.0.6


Darren Obbard

May 1, 2015, 5:15:24 AM
to trinityrn...@googlegroups.com
Hi,

I previously thought I understood Trinity output. It has changed, and now I don't.

According to http://trinityrnaseq.github.io/#trinity_output output sequence names look like this:

>c115_g5_i1 len=247 path=[31015:0-148 23018:149-246]

But the ones I (now) get look nothing like this. Instead they look like this:


>TR1|c0_g1_i1 len=264 path=[487:0-98 488:99-263] [-1, 487, 488, -2]

This means that the description given in http://trinityrnaseq.github.io/#trinity_output

"

The accession encodes the Trinity gene and isoform information. In the example above, the accession c115_g5_i1 indicates Trinity read cluster c115, gene g5, and isoform i1. Because a given run of trinity involves many many clusters of reads, each of which are assembled separately, and because the gene numberings are unique within a given processed read cluster, the gene identifier should be considered an aggregate of the read cluster and corresponding gene identifier, which in this case would be c115_g5.

So, in summary, the above example corresponds to gene id: c115_g5 encoding isoform id: c115_g5_i1.

"

no longer seems to apply. Specifically, TrinityStats.pl tells me:


Total trinity 'genes':  115437
Total trinity transcripts:      133490
Percent GC: 46.50


Whereas

grep -o '\c[0-9]\+_g[0-9]\+' My.Trinity.fasta | sort | uniq | wc -l
201

suggests only 201 unique 'genes' according to the description given on http://trinityrnaseq.github.io/#trinity_output

How do I now find unique identifiers for 'genes' in the old sense?

Thanks,

Darren



Darren Obbard

May 1, 2015, 5:21:31 AM
to trinityrn...@googlegroups.com
OK, found my own answer in https://groups.google.com/forum/#!topic/trinityrnaseq-users/n6mUXYgqpAA

Why on earth is this not in great big letters in the description of the trinity output?
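
For anyone else hitting the same mismatch, here is a minimal sketch of one way to count 'genes' from the newer headers. It assumes (the linked thread is not reproduced here) that the gene identifier is the whole accession with only the trailing _i<N> isoform suffix removed, so the TR<N>| prefix is part of the gene id. The sample file is hypothetical: its first header is the one quoted above, the other two are invented for illustration.

```shell
# Hypothetical sample FASTA. Only the first header comes from this thread;
# the second and third are made up to show the effect of the TR<N>| prefix.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>TR1|c0_g1_i1 len=264 path=[487:0-98 488:99-263] [-1, 487, 488, -2]
ACGT
>TR1|c0_g1_i2 len=300 path=[487:0-98 489:99-299] [-1, 487, 489, -2]
ACGT
>TR2|c0_g1_i1 len=250 path=[12:0-249] [-1, 12, -2]
ACGT
EOF

# Old-style pattern: ignores the TR<N>| prefix, so TR1|c0_g1 and TR2|c0_g1
# collapse into a single "c0_g1".
old_count=$(grep -oE 'c[0-9]+_g[0-9]+' "$fasta" | sort -u | wc -l | tr -d ' ')

# Assumed new-style gene id: first whitespace-delimited token of the header,
# minus the leading '>' and the trailing _i<N> isoform suffix.
gene_count=$(grep '^>' "$fasta" | cut -d' ' -f1 \
  | sed -e 's/^>//' -e 's/_i[0-9]*$//' | sort -u | wc -l | tr -d ' ')

echo "old-style pattern: $old_count"    # prints: old-style pattern: 1
echo "with TR prefix:    $gene_count"   # prints: with TR prefix:    2
rm -f "$fasta"
```

If that assumption holds, running the second pipeline on the real Trinity.fasta should give a count in line with what TrinityStats.pl reports, but I have not checked that against this assembly.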

Brian Haas

May 1, 2015, 11:16:52 PM
to Darren Obbard, trinityrn...@googlegroups.com
because someone needs to go in and update the documentation. ;)

it'll happen shortly.

~b

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 