Trinotate - Issues with empty eggnog, Kegg, and gene_ontology_blast columns

jvi...@cub.uca.edu

Feb 7, 2017, 5:36:49 PM
to trinityrnaseq-users
After a Trinotate run using a boilerplate SQLite database, the eggnog, Kegg, and gene_ontology_blast columns in my annotation report are empty. Attached is a screenshot of the XML annotation report. Pfam GO seems to be fine, however.

Thanks,

- James
Screenshot from 2017-02-07 16-31-40.png

Brian Haas

Feb 8, 2017, 9:26:22 AM
to jvi...@cub.uca.edu, trinityrnaseq-users
Hi James,

Did you build the boilerplate database from scratch and run your searches against the uniprot_sprot.pep file that the build generates?

    e.g. $TRINOTATE_HOME/admin/Build_Trinotate_Boilerplate_SQLite_db.pl Trinotate

~brian
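
If the empty columns come from BLAST hits whose IDs do not match the Swiss-Prot entries in the boilerplate database (a plausible cause here, though not confirmed in this thread), one rough check is to compare the subject IDs in the BLAST output against the sequence IDs in the generated uniprot_sprot.pep. The sketch below is only illustrative: the function name and the file names in the usage comment are assumptions, and it assumes tabular (-outfmt 6) BLAST output with the subject ID in column 2.

```shell
# count_unmatched_hits BLAST_OUTFMT6 SPROT_PEP
# Prints the number of distinct subject IDs (column 2 of the tabular BLAST
# output) that do not appear as sequence IDs in the FASTA file.
count_unmatched_hits() {
    blast_out=$1
    sprot_pep=$2
    awk '
        # First file (FASTA): record each header ID, i.e. the text after ">"
        # up to the first whitespace.
        FILENAME == ARGV[1] {
            if (substr($1, 1, 1) == ">") known[substr($1, 2)] = 1
            next
        }
        # Second file (BLAST table): count each unknown subject ID once.
        NF >= 2 && !($2 in known) && !seen[$2]++ { missing++ }
        END { print missing + 0 }
    ' "$sprot_pep" "$blast_out"
}

# Hypothetical usage (file names are assumptions):
#   count_unmatched_hits blastx.outfmt6 uniprot_sprot.pep
```

A result of 0 means every hit ID is present in the boilerplate's uniprot_sprot.pep; a large count suggests the searches were run against a differently formatted Swiss-Prot file.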


--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-users+unsub...@googlegroups.com.
To post to this group, send email to trinityrnaseq-users@googlegroups.com.
Visit this group at https://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.



--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

jvi...@cub.uca.edu

Feb 8, 2017, 10:11:32 AM
to trinityrnaseq-users, jvi...@cub.uca.edu
That is the mistake I made. I separately created a swiss_prot database. I see you released Trinity version 2.4.0, which I am excited to try out, so I'll make sure to use the swiss_prot db that the boilerplate sqlite build creates. Thank you for this great tool!

A few questions:

Will Trinity version 2.4.0 improve assembly quality? So far, I have found that each time I reassemble with a new release I get a larger assembly. I am having difficulty with quality assessment, so I'm not sure whether the assemblies are actually improving.

I have been running Trinity with and without Trimmomatic. Will Trimmomatic typically improve assembly quality? If so this could save me a lot of potentially wasted time assembling untrimmed reads.

I found this tool called PASA, which can combine genome-guided and de novo assemblies. Will this preserve the de novo derived transcripts?

Lastly, when using TransDecoder, will retaining a single best protein, including Pfam and BLASTP hits, improve translation accuracy, or am I wasting time? I'm working with an Intel i5 CPU and 32 GB of RAM, so things aren't moving as fast as I'd like.

Attached is my Trinotate script, which might include flaws I'm unaware of (new to BASH scripting, but I'm loving it).

Thank you,

James

Trinotate_Run.sh

Brian Haas

Feb 8, 2017, 10:43:46 AM
to jvi...@cub.uca.edu, trinityrnaseq-users
responses below

On Wed, Feb 8, 2017 at 10:11 AM, <jvi...@cub.uca.edu> wrote:
> That is the mistake I made. I separately created a swiss_prot database. I see you released Trinity version 2.4.0, which I am excited to try out, so I'll make sure to use the swiss_prot db that the boilerplate sqlite build creates. Thank you for this great tool!
>
> A few questions:
>
> Will Trinity version 2.4.0 improve assembly quality? So far, I have found that each time I reassemble with a new release I get a larger assembly. I am having difficulty with quality assessment, so I'm not sure whether the assemblies are actually improving.

We always strive to improve assembly quality, but there has actually been little difference in bulk assembly statistics across recent releases. Most improvements have targeted certain edge cases that are negligible in the bulk stats. My hope is that when we get to releasing Trinity v3, it'll truly push the envelope.

 

> I have been running Trinity with and without Trimmomatic. Will Trimmomatic typically improve assembly quality? If so, this could save me a lot of potentially wasted time assembling untrimmed reads.

It depends on how 'bad' the reads are. It's never a bad idea to do light quality trimming. We have documentation and references on this in the Trinity documentation (MacManes).
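
As a rough illustration of what light trimming could look like, Trinity can run Trimmomatic itself via its --trimmomatic option, with the Trimmomatic steps passed through --quality_trimming_params. The sketch below only prints a command rather than running it. The helper name, the threshold values, the memory and CPU settings (chosen with a 32 GB, 4-core machine in mind), and the output directory are all assumptions for illustration, not recommendations from this thread.

```shell
# Illustrative light-trimming steps for Trimmomatic (assumed values).
LIGHT_TRIM_PARAMS="SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25"

# trinity_trim_cmd LEFT_FQ RIGHT_FQ
# Prints (does not run) a Trinity command that enables built-in
# Trimmomatic trimming with the parameters above.
trinity_trim_cmd() {
    printf 'Trinity --seqType fq --left %s --right %s --max_memory 28G --CPU 4 --trimmomatic --quality_trimming_params "%s" --output trinity_out\n' \
        "$1" "$2" "$LIGHT_TRIM_PARAMS"
}

# Hypothetical usage (read file names are assumptions):
#   trinity_trim_cmd reads_1.fq reads_2.fq
```

Printing the command first makes it easy to review the settings before committing to a long assembly run.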
 

> I found this tool called PASA, which can combine genome-guided and de novo assemblies. Will this preserve the de novo derived transcripts?

Yes, it'll preserve the de novo derived transcripts. If they're not integrated into genome-alignment-based models, they'll persist in the comprehensive transcriptome database output.
 

> Lastly, when using TransDecoder, will retaining a single best protein, including Pfam and BLASTP hits, improve translation accuracy, or am I wasting time? I'm working with an Intel i5 CPU and 32 GB of RAM, so things aren't moving as fast as I'd like.

Nope. It just addresses what some folks feel is an annoying feature: individual transcripts sometimes encode multiple ORFs. ;-)
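
To make the distinction concrete, here is a minimal sketch that prints (but does not run) a TransDecoder.Predict command with and without the single-best restriction. The helper name and the input file names (pfam.domtblout, blastp.outfmt6) are assumptions. The flag is written as --single_best_orf, which is how releases from around the time of this thread named it; newer releases are understood to call it --single_best_only, so check the usage message of your installed version.

```shell
# transdecoder_predict_cmd TRANSCRIPTS_FASTA [SINGLE_BEST_FLAG]
# Prints (does not run) a TransDecoder.Predict command that retains ORFs
# with Pfam or BLASTP support. If a second argument is given, it is
# appended as the flag that limits output to one ORF per transcript.
transdecoder_predict_cmd() {
    cmd="TransDecoder.Predict -t $1 --retain_pfam_hits pfam.domtblout --retain_blastp_hits blastp.outfmt6"
    if [ -n "${2:-}" ]; then
        cmd="$cmd $2"
    fi
    printf '%s\n' "$cmd"
}

# Hypothetical usage:
#   transdecoder_predict_cmd Trinity.fasta                    # may report several ORFs per transcript
#   transdecoder_predict_cmd Trinity.fasta --single_best_orf  # one ORF per transcript
```

Both forms use the same homology evidence; the extra flag only changes how many ORFs are reported per transcript.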

 

> Attached is my Trinotate script, which might include flaws I'm unaware of (new to BASH scripting, but I'm loving it).
>
> Thank you,

best of luck!

~brian

> James
