Trinity SuperTranscripts

Brian Haas

Aug 24, 2017, 2:28:20 PM
to trinityrn...@googlegroups.com
Greetings all.

In the next release, we're planning to have support for 'super transcripts' to facilitate both variant calling and differential transcript usage analysis, as described in this excellent work by Nadia Davidson:


As an alternative to the authors' Lace utility, which uses BLAT to define sequence segment relationships among isoforms, we have a script that constructs super transcripts from the Trinity isoform graph structure encoded in the headers of the Trinity fasta file. For now, you can obtain the script here:

https://github.com/trinityrnaseq/trinityrnaseq/blob/devel/Analysis/DifferentialExpression/Trinity_gene_splice_modeler.py


usage: Trinity_gene_splice_modeler.py [-h] --trinity_fasta TRINITY_FASTA
                                      [--out_prefix OUT_PREFIX]
                                      [--incl_malign] [--debug]

Converts Trinity Isoform structures into a single gene structure
representation

optional arguments:
  -h, --help            show this help message and exit
  --trinity_fasta TRINITY_FASTA
                        Trinity.fasta file (default: )
  --out_prefix OUT_PREFIX
                        output prefix for fasta and gtf outputs (default:
                        trinity_genes)
  --incl_malign         include multiple alignment formatted output file
                        (default: False)
  --debug               debug mode (default: False)



So, all you need as input is your Trinity.fasta file.
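
As a rough illustration of why the fasta file alone is enough, the sketch below pulls the node path out of each header and merges the paths of a gene's isoforms into one linear order. It is a simplified toy, not the logic of Trinity_gene_splice_modeler.py: the header layout (path=[node:start-end ...]), the _i<N> isoform suffix, and the helper names parse_header and merge_paths are assumptions made for this example, and it assumes the isoform graph has no cycles.

```python
import re
from collections import defaultdict

# Assumed header layout (not taken from this thread):
#   >TRINITY_DN1_c0_g1_i1 len=300 path=[10:0-99 11:100-199 13:200-299]
HEADER_RE = re.compile(r"^>?(?P<iso>\S+_i\d+)\s.*?path=\[(?P<path>[^\]]*)\]")


def parse_header(header):
    """Return (gene_id, isoform_id, [node ids]) for one fasta header line."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError("unrecognized header: " + header)
    isoform_id = match.group("iso")
    # Hypothetical convention: the gene id is the isoform id minus "_i<N>".
    gene_id = isoform_id.rsplit("_i", 1)[0]
    nodes = [token.split(":")[0] for token in match.group("path").split()]
    return gene_id, isoform_id, nodes


def merge_paths(paths):
    """Linearize isoform node paths with a topological sort (no cycles)."""
    successors = defaultdict(set)
    indegree = {}
    first_seen = []  # keeps the output order deterministic
    for path in paths:
        for node in path:
            if node not in indegree:
                indegree[node] = 0
                first_seen.append(node)
        for a, b in zip(path, path[1:]):
            if b not in successors[a]:
                successors[a].add(b)
                indegree[b] += 1
    ready = [node for node in first_seen if indegree[node] == 0]
    merged = []
    while ready:
        node = ready.pop(0)
        merged.append(node)
        for nxt in sorted(successors[node], key=first_seen.index):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(merged) != len(indegree):
        raise ValueError("cycle detected; this toy sketch cannot handle it")
    return merged
```

With two isoforms whose paths are 10-11-13 and 10-12-13, merge_paths returns the order 10, 11, 12, 13, so each alternative segment appears once in the linearized gene.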

We're still comparing its output against Lace to confirm that the results are highly similar, but we wanted to make it available sooner rather than later in case it turns out to be as helpful as we hope.

best,

~brian



--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

rik.ve...@gmail.com

Oct 2, 2017, 6:15:56 AM
to trinityrnaseq-users
Dear all,


I could not find it at Brian's link. Just to be sure: is it this?

https://github.com/trinityrnaseq/trinityrnaseq/blob/devel/Analysis/SuperTranscripts/Trinity_gene_splice_modeler.py


Best wishes,
Rik

Brian Haas

Oct 2, 2017, 7:15:41 AM
to rik.ve...@gmail.com, trinityrnaseq-users
I've been reorganizing things in prep for the upcoming release. You'll now find a SuperTranscripts directory under the Analysis directory that contains the relevant code. I can send more precise info later today.

Best,

-Brian
(by iPhone)

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at https://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.

Brian Haas

Oct 2, 2017, 10:30:14 AM
to Rik Verdonck, trinityrnaseq-users
Looks like you did find it.

Let me know if it gives you trouble.

best,

~b

Antoine Felden

Oct 15, 2017, 7:57:19 PM
to trinityrnaseq-users
Hi all,

What is the difference between Trinity 'genes' (as used in the DE analysis pipeline, for example) and SuperTranscripts?
The terms seem to be used interchangeably in the wiki.

Thanks. It's also really exciting that you integrated variant calling into Trinity; I can't wait to try this.
Antoine

Brian Haas

Oct 15, 2017, 9:49:26 PM
to Antoine Felden, trinityrnaseq-users
The nice thing about supertranscripts is that they provide an actual 'gene' sequence, as opposed to the more general Trinity 'gene' concept, which only treats a gene as a set of isoforms and does not provide any sequence representation for it. It also opens up the use of methods such as DEXSeq for doing differential transcript usage analysis. Definitely check out the Davidson et al. paper:

Note, I'm going to be putting out a Trinity patch release sometime soon (this week or next) that will have updated methods for the supertranscript DEXSeq analysis, leveraging the faster featureCounts software from the Subread package as opposed to the unbearably slow method currently employed for the feature counting. This doesn't impact supertranscript construction or anything to do with variant calling, if that's your main interest.

best,

~brian



Piotr Janicki

Oct 31, 2017, 1:07:04 PM
to trinityrnaseq-users
Hi Brian,
Thank you for incorporating Lace into Trinity.
However, when I tried to run the script on my previous Trinity.fasta file (created with version 2.1.1 of Trinity), I got the following formatting error message:

  File "/ri/shared/modules/trinity/2.5.1/Analysis/SuperTranscripts/Trinity_gene_splice_modeler.py", line 870, in <module>
    main()
  File "/ri/shared/modules/trinity/2.5.1/Analysis/SuperTranscripts/Trinity_gene_splice_modeler.py", line 826, in main
    logger.info("Processing Gene: {} having {} isoforms".format(gene_name, len(node_path_obj_list)))
ValueError: zero length field name in format
srun: error: hpccomp10: task 0: Exited with exit code 1


Has anybody experienced this problem before?

Thanks for any help.

Peter 

Brian Haas

Oct 31, 2017, 1:11:35 PM
to Piotr Janicki, trinityrnaseq-users
I've seen this error when old versions of Python are used. Python 2.7 or 3 should work OK.
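
For anyone curious about the cause: the bare "{}" placeholders in that logger.info call are auto-numbered format fields, which str.format only accepts on Python 2.7 and 3.1 or later; Python 2.6 raises exactly this ValueError. The sketch below shows the explicitly indexed form, which also works on 2.6, plus a small version check. The function names are made up for this example and are not part of the Trinity script.

```python
import sys


def format_gene_message(gene_name, isoform_count):
    """Build the log message using explicit field indices.

    "{0}" and "{1}" are accepted by Python 2.6, 2.7 and 3.x, whereas the
    bare "{}" form needs Python 2.7 or 3.1+.
    """
    return "Processing Gene: {0} having {1} isoforms".format(
        gene_name, isoform_count
    )


def supports_auto_numbered_fields(version_info=sys.version_info):
    """Return True if the given interpreter version accepts bare "{}"."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor >= 7
    if major == 3:
        return minor >= 1
    return major > 3
```

Running the script with a newer interpreter is the simpler fix; the indexed form only matters if you are stuck on an old Python and are willing to patch the script locally.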

-Brian
(by iPhone)


Piotr Janicki

Oct 31, 2017, 2:08:43 PM
to trinityrnaseq-users
Thanks, Brian, for your fast response.
The problem, however, is that this is a huge fasta file which took several days to compute, and I would rather not repeat the entire Trinity run. Do you have any other ideas for how I can reformat the Trinity.fasta file that was obtained using the older Python version?
Thanks
Peter

Brian Haas

Oct 31, 2017, 2:27:00 PM
to Piotr Janicki, trinityrnaseq-users
I don't think you should need to rerun Trinity. Your existing Trinity.fasta file should be fine as it is; I think you just need to run the supertranscript-generating script using Python 2.7 or Python 3. I hope this clarifies things.
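
One way to do that without changing the system default Python is to call the script through an explicitly chosen interpreter. The following is a minimal sketch: the --trinity_fasta and --out_prefix flags come from the usage text earlier in this thread, while the helper names, the file paths, and the "python3" interpreter name are placeholders to adapt to your own setup.

```python
import subprocess
import sys


def build_command(script_path, trinity_fasta,
                  out_prefix="trinity_genes", interpreter=sys.executable):
    """Assemble the argument list for the supertranscript script."""
    return [
        interpreter,
        script_path,
        "--trinity_fasta", trinity_fasta,
        "--out_prefix", out_prefix,
    ]


def run_splice_modeler(script_path, trinity_fasta, **kwargs):
    """Run the script on an existing assembly; raises if it exits non-zero."""
    subprocess.check_call(build_command(script_path, trinity_fasta, **kwargs))


# Example call (placeholder paths, not executed here):
# run_splice_modeler("Trinity_gene_splice_modeler.py", "Trinity.fasta",
#                    interpreter="python3")
```

Only the script is rerun here; the Trinity.fasta file from the earlier assembly is used unchanged.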

best,

~b


Piotr Janicki

Oct 31, 2017, 4:53:26 PM
to trinityrnaseq-users
Hi Brian,
It works great now. Thank you kindly for your help.
Peter


razieh

Jul 8, 2018, 8:53:33 AM
to trinityrnaseq-users
Hi,
I have a problem with a lot of redundancy in the Trinity output assembly. I got a fasta file containing 302000 contigs. To remove redundancy and get unigenes, I used CD-HIT. After running CD-HIT, I got an output containing 240000 contigs, which still shows a lot of redundancy: I am working on a diploid plant, and 240000 unigenes is not plausible. Is there any way to remove this redundancy? Can I use Trinity_gene_splice_modeler.py to remove the redundancy and get unigenes? Please give me some advice on how to solve this problem.

Ken Field

Jul 8, 2018, 9:34:09 AM
to rahmati....@gmail.com, trinityrnaseq-users
Have you tried DRAP? http://www.sigenae.org/drap/

It would mean starting your assembly over, but it produces a more compact assembly than running Trinity alone, at least for mammals!

-Ken




--
Ken Field, Ph.D.
Professor of Biology
Associate Chair of Biology
Program in Cell Biology/Biochemistry
Bucknell University
Room 208 Biology Building