Hidden ORFs in Trinity genes not reported by TransDecoder


Cesar Hernandez

Feb 14, 2020, 6:07:50 PM
to TransDecoder-users
Hi Brian,

I have a question about some of the TransDecoder output.

I have this Trinity transcript:

>TRINITY_DN177_c1_g1_i4 len=9235 path=[0:0-7518 3:7519-7786 6:7787-7812 7:7813-7825 9:7826-7828 11:7829-7849 12:7850-7893 14:7894-7901 18:7902-7948 20:7949-8286 22:8287-9133 31:9134-9165 33:9166-9234]
GAGATAATTATGGTCATATTATGAGATAATTATGGTCATATTATGAGATAATTATGGTCATATTATGCTACGAATCTGTGTCTATATTGGTGAATTTACCATGAAAAAGTGATATTTCCGGTACATGCCATTGAACGGCTTGGCTTACCTTCTCAATTATCGTGCTTGGTTTAAACGTTTCTTTTGTTCCGCTTCTATTTTGTTGTACTTTTCGCGCGAGGAACAAGGTTTTTTTCCTTTGCCTAAATATTTGCCTTTGGGTTTTGGTCCTCCAGAGAATATCACGTACTATGGCAGCGAAAGGAGCTTTAAGGTTTTAATTACCCCATAGCCATAGATTCTACTCGGTCTATCTATCATGTAACACTCCGTTGATGCGTACTAGAAAATGACAACGTACCGGGCTTGAGGGACATACAGAGACAATTACAGTAATCAAGAGTGTACCCAATTTTAACGAACTCAGTAAAAAATAAGGAATGTCGACATCTTAATTTTTTATATAAAGCGGTTTGGTATTGATTGTTTGAAGAATTTTCGGGTTGGTGTTTCTTTCTGATGCTACATAGAAGAACATCAAACAACTAAAAAAATATTATAATATGAAAAATATCATTTCATTGGTAAGCAAGAAGAAGGCTGCCTCAAAAAATGAGGATAAAAACATTTCTGAGTCTTCAAGAGATATTGTAAACCAACAGGAGGTTTTCAATACTGAAAATTTTGAAGAAGGGAAAAAGGATAGTGCCTTTGAGCTAGACCACTTAGAGTTCACCACCAATTCAGCCCAGTTAGGAGATTCTGACGAAGATAACGAGAATATGATTAATGAGATGAACGCTACTGATGAAGCAAATGAAGCTAACAGCGAGGAAAAAAGCATGACTTTAAAGCAGGCGTTGCTAAAATATCCAAAAGCAGCCCTGTGGTCCATATTAGTGTCTACTACCCTGGTTATGGAAGGTTATGATACCGCACTACTGAACGCACTGTATGCCCTGCCAGTGTTTCAGAGAAAATTCGGTACTTTGAACGGGGAGGGTTCTTACGAAATTACTTCCCAATGGCAGATTGGTTTAAACATGTGTGTCCAATGTGGTGAGATGATTGGTTTGCAAATCACGACTTATATGGTTGAATTTATGGGGAATCGTTATACGATGATTACAGCACTTGGTTTGTTAACTGCTTATATCTTTATCCTCTACTACTGTAAAAGTTTAGCTATGATTGCTGTGGGACAAGTTCTCTCAGCTATGCCATGGGGTTGTTTCCAGGGTTTGACTGTTACTTATGCTTCGGAAGTTTGCCCTTTAGCATTAAGATATTACATGACCAGTTACTCCAACATTTGTTGGTTATTTGGTCAAATCTTCGCCTCTGGTATTATGAAAAACTCACAAGAGAATTTAGGGAACTCTGACTTGGACTATAAATTGCCATTTGCTTTACAATGGATTTGGCCTGCTCCTTTAATGATCGGTATCTTTTTCGCTCCTGAGTCGCCCTGGTGGTTGGTGAGAAAGGATAGGGTCGCTGAGGCAAGAAAATCTTTAAGCAGAATTTTGAGTGGTAAAGGCGCCGAGAAGGACATTCAAGTTGATCTTACTTTAAAGCAGATTGAATTGACTATTGAAAAAGAAAGACTTTTAGCATCTAAATCAGGATCATTCTTTGATTGTTTCAAGGGAGTTAATGGAAGAAGAACGAGACTTGCATGTTTAGCTTGGGTAGCTCAAAATACTAGCGGTGCCTGTTTACTTGGTTACTCGACATATTTTTTTTGAAAGAGCAGGTATGGCCACCGACAAGGCGTTTACTTTTTCTGTAATTCAGTACTGTCTTGGGTTAGCGGGTACACTTTGCTCCTGGGTAATATCTGGCCGTGTTGGTAGATGGACAATACTGACCTATGGTCTTGCATTTCAAATGGTCTGCTTATTTGTTATTGGTGGAATGGGTTTTGGTTCTGGAAGCGGCGCTAGTAATGGTGCCGGTGG
TTTATTGCTGGCTTTATCATTCTTTTACAATGCTGGTATCGGTGCAGTTGTTTACTGTATCGTAGCTGAAATTCCATCAGCGGAGTTGAGAACTAAGACTATAGTGCTGGCCCGTATTTGCTACAATCTCATGGCCGTTATCAACGCTATATTAACGCCCTATATGCTAAACGTGAGCGATTGGAACTGGGGTGCCAAAACTGGTCTATACTGGGGTGGTTTCACAGCAGTCACTTTAGCTTGGGTCATCATCGATCTGCCTGAGACAACTGGTAGAACCTTCAGTGAAATTAATGAACTTTTCAACCAAGGGGTTCCTGCCAGAAAATTTGCATCTACTGTGGTTGATCCATTCGGAAAGGGAAAAACTCAACATGATTCGCTAGCTGATGAGAGTATCAGTCAGTCCTCAAGCATAAAACAGCGAGAATTAAATGCAGCTGATAAATGTTAAGTAAAAGGGTTGTTTTTTTTTTTTGGAAGAAATAAGGAATCCCTTTGACTGCTCCCAAAACCCTCAGCTAGCTCGAGATTTTATATTTATACATTTTTTATTTTTCTGTAAAACATTTATATTTACCATTTTTTAAGCAAAATATTGTTAGTAGTTAGTTAAAATAGCCCAAGCAGCAATCAAGCAAATATGAGAGTATTTTTTCTTTAGCACCTGGTACTTGTGCCTGGATATTGATTCGAACAACATGCCAGGTCAACCGTATTCTCAATTAACTGTTACTTTAAATGTCCATACACTTAATAAAAAGAAGGAGAGAATCACGCAAAAATCACATTAAACTATATTATAATGTTTTATGAAGTATTTTTGAAGCATTAATGTGAGAGCTTATAGAGATACGGCATTCATAATTCAATGCTTTATTGGGATATACTATGCATATTCTATATGGTTCATGAAATATCTCAAAACATATCGCATAGTTTCACAATCGCTTGGCAAATAATTTTCCAATTCTGAGGATATAAACAACCATTTCAAAGAATTATTCTTTCTTGGTGACATGAAAATAATAATAAATTGCATTTGATATACAAAAGCGATTAGTAAAAAAAATCAAACCCATGTGGTCGTCATCCTCACAAACAATTAGGTAAATTTCGTGTTAGCAGTATGTGGTGAAGGTGTTCTAATTAGAAATTGTGATTCAAAAATCTATCACTTTGAATTTGAATATAATGGAATTGTTTTATAAATATTACCTAATTAGCCGCCACTGATATGGCTTCTTTGATCTATTTTTCAGTATAAAGTCAAACAAGATTTCAAAAAACCGGAGGAAGAATATGGGAAATCTACTAGTGTACAAGTTCGTCAAGTTTTCAATGGTCTGACAGCCCTCATCTGCCTTGTGAGAGGCTATAAAGTCGTTAGTCTTCATGTGGCGTATTACTTTTCTTCAGTAGATATTGTTATTATTACTAAGAATAGAGATATATATTGCATTATTGTGAAAAACATCGCTTCTTTTCTAATTTCCCCGCGTCAAGGGCTACGTGTTTTCTGTGAGATGTAAACCATATATCTAAAACAAGTATCATCACGCTTCTTTTATTTGTAGTATTTTTTCTTCGGTTTTCACTGGGGAGCCTGCCCAAGGCGAGGTGCAGTGTTCTTTCTACGGACTTTTTAGGAAGGAAATGAATTAAGCTACGCAGAAAGGACATCTCTTTTCAAAAGTTCTAGACATTTTCAGCCTTCATAGTTCCTAGAGATATGTCCGTATCTAGTAACTACTATAGTTAAATTTAGATCGGATTCTATTAATGTTATGTTCTGCCATAAAAATGTTGCTTTAAATGGCAAACATAAATGACAATAATATCTTGCCTCCTAAACTTAACTGGCATTTATCTATCTTCTCTGTATTGAATAAACGAATGGTGATTCACTCAAATATTCAAATCCCAGAGGTAGCAACATGGTAGCAGAAAGTACCGCCCCTTTCTTACCAATGCTGAATACACCATTTCTAAGTAATATT
ATCAACAAACCATAGCACTTATCTGCACTCAGCTGTTGTATTTCACTTGTTCCTCTACCCTTCAACTAGTGCTCCATTATAACGGTCGAGCAATACTTCATATCATTCATTTGAGCGTCAGTGCTGACGAAAGTATGATTTCATTAGCTACAATGTGGAAATATAATACTTCTCAAACACTGAAAAATTTCAAAACTTTATTATACAGTTCTTCAACAATAGTATGAGGTAATTACATGGCACTAAGTGCTTCAACGCGCTGTATATAAAGTCATCTTGATTATGCTAATTAAATCACAAAAGCGAAAGGATGAGAGTGAAAGTATCCATATTGCAGCTAGCCTATTCAACCTAAACTTGTCTCCCATCAGTACACAATCATCTCATCTAATATGGCTCTTAGCACTCAGTAAAAAAGTGTATATATACTGAAAACTGTTAGTATAGGAATTCTGGGAAGAATGCTAACGTTAGCGTGCCTAAAAGTATTTTATTAAATATAATTTTCAATAGGGAGACATCATGGTAAGAAGCAGTCTTATACTTGTAAAAGTAGACCTCCAAATAGAAAATTAATTACTCTGTTCATGAAAAGGGGTCGCAATTAGCAGTGAATTTTACCCTTGTGTTGCGGAAACACAGTTTCTACAATGGAAAGCTATCACTTACCAGGAAGTTGAGATGGACAAAAATGAAAATTAATCAAGGGTCTATGTCTTCATTATCCTTGGGATAACCATCCAATTGTAAAGGTTTAGAAATGGGCAGAGTAATTAGGGCATTCTGGCATTTGGTGGAAAATCTCTTCAGCATATCATTGTTATAGTGGTTTAAAGAAAAAGCAAATTTGCATACATCATGCAAAACATTCCATGCTTCTAATTTCATGTTATGATCATACTGGCCTACAATGTCCACCAAAGCAGTAGCTATTTCTAATGTTTTCACTGGTACCCCGGGACCATGGACATCGTAAAGATTCTCTGGGGTTAAGTAAGTATCTATTAACATATCTCTCGCAATTTCTATTGGTATTTGCCCATTGTTAGCGTTCAATGAAAAGTTGCTACCCCTCATTTGATAAGCTAGCTTCCATGCTAGTATCCTAATCCAACGCCGGGAAAATGAGATGTCTATGTAACCGTTAGACCATGGCTCTATTTCCGAGGAAGTTGTGTGGAGTTCGTTCCATATCTTTTTCAATGACTCTTCAGTGCAAGAAGCATCTGTAGAGTCAGCGGCTAAAGCATCGAAGAAACATTTTCCCGGCACGGTAAATATCCTAATCATTTCAAGGAAACTGTCCATAGAAAGCTGAGGATCAGTTACAAGTTCAAGTTGCGGTGGTGCTATTGTGGCATCCAGGCTCGTCGCACAATGAAGATATATGGCATAGTATCTCTCCGTCATGAGAAGCAAGTAATAAAGTTTCCGTCTAAGTTGCTGTTCTTCAAATGTAAGGGATCCGTAAGTTTCTTCCCGATGTAACCCTGCTACCGTAATCAGACCGACCGCTTCACAACAGAGTCTGTAAGAAGTTCTTGCGTTCGATATTTGTGCAAAGCTACGATGCAAACAGTAGTACGTCATAATATTGAATATATTGCTGTTATCCAAATCGTCGAATTGCTGACACGATGAGATGCAAAGATTAGATAACTGTTTTCCCGTGAAAGTGACTTCCTCTTCAGATTTTATTTCAGTTTGTAAATCACTGAGGGTGGCCGCTGATAAAGCGGTCAGAAACCAATATACGTAATTGTCATTGTATTATTCCTCCAGAAGTTTGTGAAGGTCATCGTACGAAAGAAGGGGCCATATTACGTATAAATTATCGTGATAGAGCCGCAAGCACTGATCGATTAGCTTTTTGGGAACCCTCTTATATATTACAGGAGCAGTTGCAATGGTGTTAGGACCGCTTTCCCTCTGCACTTCTGCTATTCTTTTCAAGCTCCTCAACCTAATGGACTTCGGACCTCTTTTTCTCGACGGTTGCA
GATAAGTGCAATCCAAACTATTCTGTAGGCAACTGCTACACGGCCTTTTACCATCGCATTTCACTCGACGAATACGACAGCAGTCGCATGCCTGCTTGGCGCATGTTTGCTTAGTTAAAGTCATACTCGAAGCAAAAAAGAGTTCTGGCTAAATTTTCTACTTTTCATTTACTTTAATATATATCGAGATTATTAATTTTCTCTATCTGCGTACTTGAGTTATGAAGAATATAGCAAAATGAGTAATTATGACCGGGGATAAAGAAAACCCATTGTGCTAGCATATGGGGGCTGGTGGTGCAGGAAACCCAGGTCACCCGGACTTTATATGCCAACGTAACCTGCAGTCACCACTATTACTTGTTCAAGATTTTTATTTTGCTCTGCTTATTTCCATCTAAAATGAGAAACCTCAAGATCCGGTTGGAACGGGGCTGTATGTTTATGATTGCTCGAAGACCGTTATAATCACGTTTTTCTTGTTTTTTCGTCAAGAATTCCAGTCAAGTTTTCCATCACCTTGACCCTTAAAGCATCGACTTTTGTGCTCTTGAATGTGTTTCTAAGAATACTTGTAAAGGACACCCTCTAATTTCGTGTGCACTTTTCACATATTATCAAGACAATCGTTCCTGTACTCAGATGCACTGTTACTGTAAAGACTACTATACAACAAGCGAAAAATGATGTTCGAAAACCTTTATTTCTATTTTGAAAGGCATGTGTCTCGAGGTCCTTGCTTTATTGTGGGTGGTCATGCCATTCTGTAAACCTTACGGTACTGCTCCGTCTATATCTTTGAGGTTGTTATTTCCCCACAGATATGCGTTTCTAACCGAATATTCATTCAGTCGGACCGGACAATAGCTCTTAACTTCGTTTACCGGAGTAAATATCGTAAGAATTTGCATGCGGTGAAATACAGGGAAAATAAGAAATTACACCCTAATACAAAAAGAAAACTAAGTTTCACAATACGTAAGGATATTTTAGTGGGGAGAATATTTCGGAGAATAAAGTTTCCAACTCCGCGGTGTGAACAACCGCTCAGCACGCAGCGTTATTCTCGAGAAAAGTGGCCCTGAAATAAGGAGATAAAGTTACTAATGTTTTTTCGCTGTACGATATCAAATGTGACGAAGTAGGCACCCCACGCTATAAACTGGCTACTAAAGTTTATGTCAGTACTTGGGATCGTTGAAATACTCGGATAAATTATGTTCCTTATTTTTCATGGTTTTCGTCATACCACAGTTTACCCCAGAATGAGAAAGGATCTCCTTTTGAAATAAAAAGTACTTAAGGGCAATGATATTGAGTTGCTAGACGTTTGGTTAGACGCCTGTTTTGAAATAAAAAAGCTGTCTCAAATTAATCGAGCAAGCACAGATCAAACAAGATACAAACAAAGCTTTTCAACGTAATATTTACTATCGATGACTATTTCTTCTGCACATCCAGAGACAGAACCAAAGTGGTGGAAAGAGGCCACGTTCTATCAAATTTACCCAGCAAGTTTCAAAGACTCTAATGACGATGGCTGGGGTGATATGAAGGGTATTTCCTCCAAGTTGGAGTATATCAAGGAGCTTGGTGTCGATGCCATTTGGATCTCACCATTCTACGACTCGCCACAAGATGATATGGGTTACGATATTGCCAACTACGAAAAGGTCTGGCCAACCTACGGTACGAACGAGGACTGCTTTGCCTTGATCGAAAAGACACATAAGCTCGGTATGAAATTTATCACTGACTTGGTCATCAATCACTGTTCCAGCGAACATGAATGGTTCAAAGAGAGCAGATCCTCAAAAACCAATCCAAAACGTGACTGGTTCTTCTGGAGACCTCCTAAGGGTTATGACGCCGAAGGCAAGCCAATTCCTCCAAACAATTGGAGGTCTTACTTCGGTGGTTCTGCATGGACGTTTGATGAAAAGACACAAGAGTTTTACTTGCGTTTGTTTTGCTCCACCCAACCTGATCTCAACTGGGAA
AACGAAGACTGCAGAAAGGCAATCTACGAAAGCGCTGTTGGATACTGGTTAGACCACGGTGTTGACGGTTTCAGAATCGATGTGGGGAGTTTGTACTCCAAAGTTGTTGGTCTACCAGATGCTCCTGTGATTGACAAGAACACAAAATGGCAGGCCAGTGATGCTTTCACAATGAATGGACCACGTATTCATGAATTCCACCAAGAAATGAACAAGTTCATGAGAGACAGGGTCAAAGATGGAAGAGAAATCATGACGGTTGGTGAAATGCAACATGCTTCCGACGAGACCAAGAAGTTGTACACGAGTGCCTCAAGACACGAGCTTAGTGAGCTGTTCAACTTTTCTCACACTGATGTTGGAACTTCTCCCAAGTTCCGCCAAAACTTGGTCCCATTTGAATTGAAGGATTGGAAAGTTGCTCTTGCCGAGCTGTTCAGTTATGTTAATGGAACTGATTGTTGGTCGACTATTTACCTAGAAAATCATGACCAACCTCGTTCGATTACGAGATTTGGTGACGACTCGCCCAAGAATCGTGTCGTTTCTGGTAAGTTGCTGTCTGTATTGTTAGTGTCACTGACCGGTACTTTGTATGTGTACCAGGGACAGGAGCTGGGCCAAATCAATTTCAAAAACTGGCCTATTGAAAAGTACGAGGACGTTGAAATCAGAAACAACTACAAGGCGATCAAGGAAGAGCATGGGGAAAACTCGAGCGAGATGAAAAAGTTTTTAGAAGCCATTGCTCTCATCTCCAGAGACCATGCCAGAACACCTATGCAATGGTCGTGCGAAGAGCCAAATGCTGGCTTCTCTGGGCCCACCGGCAAGCCATGGTTCTACTTGAACGAATCTTTCAGAGAGGGAATCAATGTCGAAGACGAGCAAAAGGATCCAAACTCTGTGCTGGCTTTCTGGAAGGAAGCTTTGAGATTTAGAAAGGCACACAAGGATATTACTGTTTATGGCTATGATTTTGAGTTCATCGACCTGGACAATAAAAAGCTGTTTAGCTTTACCAAGAAGTACGAGAACAAGACTTTGTTTGCAGCCTTGAACTTTAGCTCTGATAACGTGGACTTTACGATTCCAGATAACAGTACCTCGTTCAAGTTGGAGTTTGGAAACTTCCCAAAGAAGGAGGTAGATGCCTCTTCCAGAACTTTGAAGCCATGGGAGGGTAGAATTTACGTATCTGAATGATTGACGATTGTGACAGATCGGAAGAGCAC

The annotation indicates that the longest ORF corresponds to a particular gene, but in the same frame there is a second ORF that ends at a stop codon and so probably encodes a truncated protein. That truncated protein is of great interest to me.
My question is: how should I interpret these results? Is the second, truncated ORF meaningful, or is it just an artifact?

Brian Haas

Feb 14, 2020, 8:23:53 PM
to Cesar Hernandez, TransDecoder-users
There could be a frameshift in there. You might load the sequence into IGV, align the reads to it, and examine the read support around the position of the potential frameshift. If it is an assembly error, the read alignments should indicate this.

Attached is a screenshot of the NCBI ORFfinder output, just for reference (from https://www.ncbi.nlm.nih.gov/orffinder/).
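
For anyone who wants to see roughly what such an ORF listing contains without the web tool, here is a minimal Python sketch of a six-frame ORF scan. It assumes the standard genetic code and ATG-to-stop ORFs only, so ORFs that run off the end of the transcript are not reported. The function names (`reverse_complement`, `find_orfs`) and the `min_codons` parameter are made up for this illustration; this is not how TransDecoder or ORFfinder is implemented. ORFs that sit next to each other on the same strand but in different frames are the places worth checking in IGV for a possible frameshift.

```python
# Minimal six-frame ORF scan (illustrative sketch only).
# Assumptions: standard genetic code, ORF = ATG ... first in-frame stop codon.

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Return complete ORFs found in all six frames, longest first.

    Coordinates are 0-based, end-exclusive, and relative to the scanned
    strand (for '-' they refer to the reverse-complemented sequence).
    'codons' counts the ATG but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # first ATG seen since the previous stop codon
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append({
                            "strand": strand,
                            "frame": frame,
                            "start": start,
                            "end": i + 3,
                            "codons": n_codons,
                        })
                    start = None
    return sorted(orfs, key=lambda o: -o["codons"])


if __name__ == "__main__":
    # Toy sequence: two short ORFs on the forward strand in different frames.
    toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G" + "ATG" + "AAA" * 4 + "TGA"
    for orf in find_orfs(toy, min_codons=3):
        print(orf)
```

To use it on a real transcript, read the sequence from the FASTA file (joining the wrapped lines) and call `find_orfs` with a threshold such as the default of 100 codons, then compare the frames and coordinates with what TransDecoder reported.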



--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 
Screen Shot 2020-02-14 at 8.21.38 PM.png