Trinity SuperTranscripts

Brian Haas

Aug 24, 2017, 2:28:20 PM

to trinityrn...@googlegroups.com

Greetings all.

In the next release, we're planning to have support for 'super transcripts' to facilitate both variant calling and differential transcript usage analysis, as described in this excellent work by Nadia Davidson:

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1284-1

As an alternative to the authors' Lace utility, which leverages BLAT to define sequence segment relationships among isoforms, we have a script that constructs super transcripts based on the Trinity isoform graph structure as encoded in the headers of the Trinity fasta file. For now, you can obtain the script here:

https://github.com/trinityrnaseq/trinityrnaseq/blob/devel/Analysis/DifferentialExpression/Trinity_gene_splice_modeler.py

usage: Trinity_gene_splice_modeler.py [-h] --trinity_fasta TRINITY_FASTA
                                      [--out_prefix OUT_PREFIX]
                                      [--incl_malign] [--debug]

Converts Trinity Isoform structures into a single gene structure
representation

optional arguments:
  -h, --help            show this help message and exit
  --trinity_fasta TRINITY_FASTA
                        Trinity.fasta file (default: )
  --out_prefix OUT_PREFIX
                        output prefix for fasta and gtf outputs (default:
                        trinity_genes)
  --incl_malign         include multiple alignment formatted output file
                        (default: False)
  --debug               debug mode (default: False)

So, all you need as input is your Trinity.fasta file.
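
The fasta file is sufficient because the isoform graph structure travels in each record's header. As a rough illustration only, the sketch below parses one header, assuming a layout like `>TRINITY_DN1000_c115_g5_i1 len=247 path=[31015:0-148 23018:149-246]`. That layout is an assumption (this thread does not show a header), `parse_trinity_header` is a hypothetical helper, and this is not how Trinity_gene_splice_modeler.py itself is implemented.

```python
import re

# Assumed header layout (not shown in this thread):
#   >TRINITY_DN1000_c115_g5_i1 len=247 path=[31015:0-148 23018:149-246]
HEADER_RE = re.compile(r"^>(\S+)\s+len=(\d+)\s+path=\[([^\]]*)\]")
NODE_RE = re.compile(r"(\d+):(\d+)-(\d+)")


def parse_trinity_header(line):
    """Return (accession, length, [(node_id, start, end), ...]) for one header."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized Trinity header: {0!r}".format(line))
    accession, length, path_text = match.groups()
    # Each path entry names a graph node and the transcript span it covers.
    nodes = [(int(n), int(s), int(e)) for n, s, e in NODE_RE.findall(path_text)]
    return accession, int(length), nodes
```

Isoforms of one gene that share a node ID in their paths share that sequence segment, which is the kind of relationship a super transcript collapses into a single linear sequence.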

We're still testing this in comparison to results from Lace to ensure that results are highly similar, but wanted to make it available sooner rather than later in case it turns out to be as helpful as we hope.

best,

~brian

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

rik.ve...@gmail.com

Oct 2, 2017, 6:15:56 AM

to trinityrnaseq-users

Dear all,

I could not find it at Brian's link. Just to be sure: is it this?

https://github.com/trinityrnaseq/trinityrnaseq/blob/devel/Analysis/SuperTranscripts/Trinity_gene_splice_modeler.py

Best wishes,
Rik

Brian Haas

Oct 2, 2017, 7:15:41 AM

to rik.ve...@gmail.com, trinityrnaseq-users

I've been reorganizing things in prep for the upcoming release. You'll now find a SuperTranscripts directory in the Analysis dir that contains the relevant code. I can send more precise info later today.

Best,

-Brian

(by iPhone)

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at https://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.

Brian Haas

Oct 2, 2017, 10:30:14 AM

to Rik Verdonck, trinityrnaseq-users

Looks like you did find it.

Let me know if it gives you trouble.

best,

~b

Antoine Felden

Oct 15, 2017, 7:57:19 PM

to trinityrnaseq-users

Hi all,

What is the difference between Trinity 'genes' (as used in the DE analysis pipeline, for example) and SuperTranscripts?

The terms seem to be used interchangeably in the wiki.

Thanks, also it's really exciting that you integrated variant calling into Trinity, can't wait to try this.

Antoine

Brian Haas

Oct 15, 2017, 9:49:26 PM

to Antoine Felden, trinityrnaseq-users

The nice thing about supertranscripts is that they provide an actual 'gene' sequence (as opposed to the more general Trinity 'gene' concept, which only treats a gene as a set of isoforms and doesn't provide any sequence representation for it). It also opens up the use of methods such as DEXSeq for doing differential transcript usage analysis. Definitely check out the Davidson et al. paper:

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1284-1
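
To make the "gene as a set of isoforms" idea concrete, here is a minimal sketch that groups transcript accessions by gene. It assumes Trinity-style accessions in which the isoform is marked by a trailing `_i<N>` (for example `TRINITY_DN1_c0_g1_i2` belonging to gene `TRINITY_DN1_c0_g1`); `gene_id` and `isoforms_by_gene` are hypothetical helper names, not part of Trinity.

```python
import re
from collections import defaultdict

# Assumption: the isoform number is a trailing "_i<N>" on the accession.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def gene_id(transcript_id):
    """Strip the trailing _i<N> isoform suffix to get the gene identifier."""
    return ISOFORM_SUFFIX.sub("", transcript_id)


def isoforms_by_gene(fasta_lines):
    """Map each gene identifier to the list of its isoform accessions."""
    genes = defaultdict(list)
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if fields:
            genes[gene_id(fields[0])].append(fields[0])
    return dict(genes)
```

The result is only a grouping of names; it carries no gene-level sequence, which is the gap a super transcript fills.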

Note, I'm going to be putting out a Trinity patch release sometime soon (this week or next) that will have updated methods for the supertranscript DEXSeq analysis, leveraging the faster featureCounts software from the Subread package as opposed to the unbearably slow method currently employed for doing the feature counting. This doesn't impact supertranscript construction or anything to do with variant calling, if that's your main interest.

best,

~brian

Piotr Janicki

Oct 31, 2017, 1:07:04 PM

to trinityrnaseq-users

Hi Brian,

Thank you for incorporating Lace into Trinity.

However, when I tried to run the script on my previous Trinity.fasta file (created with version 2.1.1 of Trinity), I got the following formatting error message:

  File "/ri/shared/modules/trinity/2.5.1/Analysis/SuperTranscripts/Trinity_gene_splice_modeler.py", line 870, in <module>
    main()
  File "/ri/shared/modules/trinity/2.5.1/Analysis/SuperTranscripts/Trinity_gene_splice_modeler.py", line 826, in main
    logger.info("Processing Gene: {} having {} isoforms".format(gene_name, len(node_path_obj_list)))
ValueError: zero length field name in format
srun: error: hpccomp10: task 0: Exited with exit code 1

Has anybody experienced this problem before?

Thanks for any help.

Peter

Brian Haas

Oct 31, 2017, 1:11:35 PM

to Piotr Janicki, trinityrnaseq-users

I’ve seen this error when old versions of Python are used. Py2.7 or 3 should work OK.
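
For context: `ValueError: zero length field name in format` is what Python 2.6 raises for bare `{}` fields, because automatic field numbering in `str.format` only arrived in Python 2.7 and 3.1. The sketch below is illustrative, not code from the script: `supports_auto_numbered_format` is a hypothetical helper and `"gene_A"` is a made-up value.

```python
import sys


def supports_auto_numbered_format(version_info=sys.version_info):
    """True if str.format accepts bare '{}' fields (Python 2.7, or 3.1 and later)."""
    major, minor = version_info[0], version_info[1]
    return (major == 2 and minor >= 7) or (major == 3 and minor >= 1) or major > 3


# Explicit indices work on Python 2.6 as well as on newer interpreters.
message = "Processing Gene: {0} having {1} isoforms".format("gene_A", 3)

# Stop early with a readable message instead of a ValueError deep in the run.
if not supports_auto_numbered_format():
    sys.exit("Python 2.7 or 3 is needed; found {0}.{1}".format(*sys.version_info[:2]))
```

The error therefore depends only on the interpreter running the script, not on the contents of the Trinity.fasta file.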

-Brian

(by iPhone)

Piotr Janicki

Oct 31, 2017, 2:08:43 PM

to trinityrnaseq-users

Thanks Brian for your fast response.

The problem, however, is that this is a huge fasta file that took several days to compute, and I would rather not repeat the entire Trinity run. Do you have any other ideas for how I can reformat the Trinity.fasta file that was obtained using the older Python version?

Thanks

Peter

Brian Haas

Oct 31, 2017, 2:27:00 PM

to Piotr Janicki, trinityrnaseq-users

I don't think you should need to rerun Trinity. I think you just need to run the supertranscript-generating script using Py2.7 or Py3. I hope this clarifies...
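
One way to pin the interpreter is to launch the script through it explicitly. The following is a minimal sketch under stated assumptions: `build_command` and `run_modeler` are hypothetical helpers, the interpreter name and paths are placeholders, and only the `--trinity_fasta` and `--out_prefix` options from the usage message above are used.

```python
import subprocess


def build_command(python_exe, script_path, trinity_fasta, out_prefix="trinity_genes"):
    """Assemble the argument list for running the script under a chosen interpreter."""
    return [python_exe, script_path,
            "--trinity_fasta", trinity_fasta,
            "--out_prefix", out_prefix]


def run_modeler(python_exe, script_path, trinity_fasta, out_prefix="trinity_genes"):
    """Run the script and raise CalledProcessError if it exits non-zero."""
    subprocess.check_call(build_command(python_exe, script_path, trinity_fasta, out_prefix))


# Example call (interpreter and paths are placeholders; substitute your own):
# run_modeler("python2.7", "/path/to/Trinity_gene_splice_modeler.py", "Trinity.fasta")
```

The existing Trinity.fasta is passed through unchanged; only the interpreter differs.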

best,

~b

Piotr Janicki

Oct 31, 2017, 4:53:26 PM

to trinityrnaseq-users

Hi Brian,

It works great now. Thank you kindly for your help.

Peter

On Thursday, August 24, 2017 at 2:28:20 PM UTC-4, Brian Haas wrote:

razieh

Jul 8, 2018, 8:53:33 AM

to trinityrnaseq-users

Hi

I have a problem with heavy redundancy in my Trinity assembly output. The fasta file contains 302,000 contigs. To remove redundancy and obtain unigenes I ran CD-HIT, but its output still contains 240,000 contigs, so a lot of redundancy remains. I am working on a diploid plant, so 240,000 unigenes is not plausible. Is there any way to remove this redundancy? Can I use Trinity_gene_splice_modeler.py to remove it and obtain unigenes? Please advise me on how to solve this problem.

Ken Field

Jul 8, 2018, 9:34:09 AM

to rahmati....@gmail.com, trinityrnaseq-users

Have you tried DRAP? http://www.sigenae.org/drap/

It would mean starting your assembly over, but it produces a more compact assembly than running Trinity alone, at least for mammals!

-Ken

--

Ken Field, Ph.D.

Professor of Biology

Associate Chair of Biology

Program in Cell Biology/Biochemistry

Bucknell University

Room 208 Biology Building
