Incomplete longest possible ORF, why is it happening?

Guilherme Gainett
Dec 2, 2019, 3:34:38 PM
to TransDecoder-users
I have the following transcript:
>TRINITY_DN154468_c2_g2_i2
GTCAATGATTTCATAATATGTAATTGTTTAAATTATTTTTCAGTACAGGAGGATAGAAAGAGGACAATGCAACAGAACGAAGAGATTTGTTTTGAACTGACAAATGTGGAGACACAGAAGGACTTTTCGTAGACAGAGGAGCTGTAACTGTCAGCGGGGGTAAGCCGGGATACTTGCCGCCTCTCGGGTCATAACCTGGACGACGCCAACTGGCCTTTGTTGGGAAGAAATGCGACGGCCTACAGAAAAGAACAGGAAAACTGAGTCCTATCATGAAGCCTTTTATGATATGTCCCTCGTCTCCTGATGGAATACAGTAAATTTGAAAATAATGAAAAGGTTGTTGTTTTTTAAAGCCGTTTATTTCGGCGATAGAAGTTGTTATAGAAGTGAAACAGTGACTTCATGGCGAGAAAAACCTCTTTCCTAAGGTGATTCCAATGCCCACCACGAGATGGCGATAGTTGTAGTATTGCCCTCTGATGCGGCTGAACGAAGAAAACATTTCCTCTAACCCAGTCACTCGGTGTTTCACATGATCTTGAAACAACTGGCGACCGCTTGCTGCATTCCTCATGGCATCTACGTGTGAAGGCCAAGGAAGAATATCGCAAGGATAATTTAACATCAATGAAACTTCAATAATAACCTCGGAAGCAGCAGACTGGAGAAAAATGATGCCTTGAAAACAAAGTGCGAACGGCCTGAGGCGAAGGGGTCGACAGACGTACACTCGCTACCAAACCCTGGAGCTGGAGAAGGAATTCCACACCAACCACTACTTGACCCGACGGCGTAGGATCGAGATGGCCCACGCTCTGTGTCTCACTGAACGGCAGATCAAAATCTGGTTCCAGAACCGGCGGATGAAGCTGAAGAAGGAGATCCAGGCCATCAAGGAGCTGAACGAGCAGGAGCGCCAAGCTCAAGCAGCCAAAATCAACACTAGCCAGAACCAGACGCAAACTCAGAGCCAACCGCCGTCGCAGCCGGCCACCAAGACCCAGCAACCGTCGTCCTCTTCGGCGACCCCCGCCAACGAGTCAGCGGCGGGGAAAACGTAAGGTTGATCTTCTTCTTCCTCTCTTCCCGTGTCCCAGAGGACTTCCAGAGGTCAGAGGTCGCAGGGTCATGATCGGTGTGCGAGGAGGTCTTTCCTTAATGGTCTCAACCAAAACATTTTACCCACAGATGCAGCGGGGATTATATATACATTTAGCTTTAGTTCCTCAAGAATGTACCGATACTCTGTGGGTTCCACTTCCCCTTTGAATCGATCCAAATATCAGGATTTGACGTCACGGCGGTCCACCCCTATTTTCACTAATTTACCTTTAATTCAAAAATGAATAATGTTTGACATAATCTATTATGAAAGTTGTTGACAGATGTCTGTACAGTCGTGATCTCTTCCCTAATCTCAAAACCTGTCAAAACTTGTAACGTTTCATAGTTGAAGAGAAAAACTTTAGAGAAACTTAATCTGGAAGCTACCTTTAGTTCGTGTTCTTTCTCCTCCTATTAGTATTATTTTGGAAAGATTGGAAACTCAGGCAAAAGAAGTGTCGAAGGAAAAGTGCAAACTTTTACTCTTAATAAAATGGTGTAATTAAATTACAGCTGTTTAGCTCTGTGGTTGTCCAACTTCAGCTTGAACAAATCTGTTTGAAATATGACGGACTTCCGCAAGATGTAGAAAATAGAAAAACACGCTAGGAACCTCAAAACTCGATTTAGGAGCTCAGAAACGAAAGATTATACAGGGTTACTATATAAAAAAAAAATAGATACTTTCAACGTTTTATAGAAACGAAACCATTACTGATATAGTCAATACTTACATGGTTTTTTAGTATATGTCTTTAAGTTCGTGTGACTAAATCACAAATGTTGTATGTGTGAGTCTTTCGTCAGGGCTGCACATGTAGGTCTTACTAACACGAGCCACTTGCTCCTCTGATACAGGTGGTCTCCCATAACTTTTCCCCTTGGCG

This is the Hox gene Ubx in the species I work with.
In TransDecoder I can recover part of the correct ORF by setting the minimum length threshold to -m 80.
This is what I get:
>TRINITY_DN154468_c2_g2_i2.p2 type:complete len:86 gc:universal TRINITY_DN154468_c2_g2_i2:807-1064(+)
MAHALCLTERQIKIWFQNRRMKLKKEIQAIKELNEQERQAQAAKINTSQNQTQTQSQPPSQPATKTQQPSSSSATPANESAAGKT*
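
The coordinates in that header can be cross-checked against the reported length. Below is a minimal Python sketch, assuming the header layout is exactly as in the record above (ORF id, then key:value tags, then transcript:start-end(strand)). `parse_transdecoder_header` is a hypothetical helper written for this illustration, not part of TransDecoder.

```python
import re

def parse_transdecoder_header(line):
    """Parse a TransDecoder .pep header of the form shown in this thread.

    Assumed layout: >ORF_ID key:value ... TRANSCRIPT:START-END(STRAND)
    """
    fields = line.lstrip(">").split()
    tags = dict(f.split(":", 1) for f in fields[1:-1])
    m = re.fullmatch(r"(.+):(\d+)-(\d+)\(([+-])\)", fields[-1])
    if m is None:
        raise ValueError("unexpected coordinate field: " + fields[-1])
    return {
        "orf_id": fields[0],
        "type": tags.get("type"),
        "len": int(tags["len"]),
        "transcript": m.group(1),
        "start": int(m.group(2)),
        "end": int(m.group(3)),
        "strand": m.group(4),
    }

header = (">TRINITY_DN154468_c2_g2_i2.p2 type:complete len:86 gc:universal "
          "TRINITY_DN154468_c2_g2_i2:807-1064(+)")
rec = parse_transdecoder_header(header)

span_nt = rec["end"] - rec["start"] + 1   # nucleotides covered by the ORF
codons = span_nt // 3                     # codons in that span
print(span_nt, codons, rec["len"])
```

For this record the span 807-1064 covers 258 nt, which is 86 codons, so len:86 here matches the codon count including the terminal stop (85 residues plus the `*`).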

It is a partial protein sequence, but this is fine considering it is from a de novo transcriptome assembly.
Nonetheless, if I manually choose an ORF, or search in NCBI ORFfinder allowing any sense codon as the start, I can retrieve a longer partial ORF, which is correct.
This is a Hox gene and the remaining piece is the rest of the homeodomain.

I added spaces in the sequence below to mark where the TransDecoder ORF begins:
>lcl|ORF24
MQSANGLRRRGRQTYTRYQTLELEKEFHTNHYLTRRRRIE           MAHALCLTER
QIKIWFQNRRMKLKKEIQAIKELNEQERQAQAAKINTSQNQTQTQSQPPS
QPATKTQQPSSSSATPANESAAGKT

This additional 5' fragment is real, as it contains a highly conserved part of the homeodomain (ELEKEF).

Could you please help me identify why TransDecoder will not output that longer partial ORF?

Thank you very much for your time!
Best,
Guilherme

Brian Haas
Dec 2, 2019, 4:45:09 PM
to Guilherme Gainett, TransDecoder-users
Hi,

The start codon refinement operation is probably relocating the start codon to one that looks like it has a better sequence context. While this generally improves things globally, it apparently makes the wrong choice here. You can turn off the start refinement: there should be an option like --no_start_refinement, in case that helps.

best,

~b

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Brian Haas
Dec 2, 2019, 5:52:23 PM
to Guilherme Gainett, TransDecoder-users
Hi Guilherme,

I looked into it further using https://www.ncbi.nlm.nih.gov/orffinder/

It turns out that the ORF you've selected is disrupted: there's an in-frame stop before the alternative start codon that you prefer. Attached is a screenshot showing this.
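
A check of this kind can also be scripted. The following is a rough Python sketch, not TransDecoder's or ORFfinder's own logic: it walks upstream from a chosen codon, one codon at a time in the same frame, and reports the nearest in-frame stop. `nearest_upstream_stop` is a hypothetical helper, the standard stop codons are assumed, and the excerpt is a short stretch copied from the transcript in the first post, so positions are relative to the excerpt rather than to the full transcript. Whether the stop it finds is the same one shown in the screenshot is an assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def nearest_upstream_stop(seq, codon_start):
    """Return the 1-based position of the closest in-frame stop codon
    upstream of codon_start (1-based), or None if the frame stays open
    all the way to the 5' end of seq."""
    seq = seq.upper()
    pos = codon_start - 1 - 3          # 0-based index of the previous codon
    while pos >= 0:
        if seq[pos:pos + 3] in STOP_CODONS:
            return pos + 1
        pos -= 3                       # step one codon further upstream
    return None

# Excerpt copied from the transcript in the first post; CAA AGT GCG AAC GGC CTG
# encodes the QSANGL residues near the start of the longer ORF.
excerpt = "CCTTGAAAACAAAGTGCGAACGGCCTGAGG"
q_codon = excerpt.index("CAAAGTGCG") + 1   # 1-based position within the excerpt

print(nearest_upstream_stop(excerpt, q_codon))  # 4: a TGA two codons upstream
```

If the function returns a position, the reading frame is closed upstream of the chosen codon; if it returns None, the frame runs open to the 5' end of the sequence supplied.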

If you're curious whether this is likely to be due to an assembly artifact, the best thing to do is to align the reads back to your Trinity assembly and examine the read coverage and alignments using IGV.

best,

~brian

Screen Shot 2019-12-02 at 5.48.37 PM.png

Guilherme Gainett
Dec 2, 2019, 5:57:21 PM
to Brian Haas, TransDecoder-users
Dear Brian,

That makes a lot of sense!
Thank you very much for your time and I will follow your suggestion.
Best,
Guilherme