I have the following transcript:
>TRINITY_DN154468_c2_g2_i2
GTCAATGATTTCATAATATGTAATTGTTTAAATTATTTTTCAGTACAGGAGGATAGAAAGAGGACAATGCAACAGAACGAAGAGATTTGTTTTGAACTGACAAATGTGGAGACACAGAAGGACTTTTCGTAGACAGAGGAGCTGTAACTGTCAGCGGGGGTAAGCCGGGATACTTGCCGCCTCTCGGGTCATAACCTGGACGACGCCAACTGGCCTTTGTTGGGAAGAAATGCGACGGCCTACAGAAAAGAACAGGAAAACTGAGTCCTATCATGAAGCCTTTTATGATATGTCCCTCGTCTCCTGATGGAATACAGTAAATTTGAAAATAATGAAAAGGTTGTTGTTTTTTAAAGCCGTTTATTTCGGCGATAGAAGTTGTTATAGAAGTGAAACAGTGACTTCATGGCGAGAAAAACCTCTTTCCTAAGGTGATTCCAATGCCCACCACGAGATGGCGATAGTTGTAGTATTGCCCTCTGATGCGGCTGAACGAAGAAAACATTTCCTCTAACCCAGTCACTCGGTGTTTCACATGATCTTGAAACAACTGGCGACCGCTTGCTGCATTCCTCATGGCATCTACGTGTGAAGGCCAAGGAAGAATATCGCAAGGATAATTTAACATCAATGAAACTTCAATAATAACCTCGGAAGCAGCAGACTGGAGAAAAATGATGCCTTGAAAACAAAGTGCGAACGGCCTGAGGCGAAGGGGTCGACAGACGTACACTCGCTACCAAACCCTGGAGCTGGAGAAGGAATTCCACACCAACCACTACTTGACCCGACGGCGTAGGATCGAGATGGCCCACGCTCTGTGTCTCACTGAACGGCAGATCAAAATCTGGTTCCAGAACCGGCGGATGAAGCTGAAGAAGGAGATCCAGGCCATCAAGGAGCTGAACGAGCAGGAGCGCCAAGCTCAAGCAGCCAAAATCAACACTAGCCAGAACCAGACGCAAACTCAGAGCCAACCGCCGTCGCAGCCGGCCACCAAGACCCAGCAACCGTCGTCCTCTTCGGCGACCCCCGCCAACGAGTCAGCGGCGGGGAAAACGTAAGGTTGATCTTCTTCTTCCTCTCTTCCCGTGTCCCAGAGGACTTCCAGAGGTCAGAGGTCGCAGGGTCATGATCGGTGTGCGAGGAGGTCTTTCCTTAATGGTCTCAACCAAAACATTTTACCCACAGATGCAGCGGGGATTATATATACATTTAGCTTTAGTTCCTCAAGAATGTACCGATACTCTGTGGGTTCCACTTCCCCTTTGAATCGATCCAAATATCAGGATTTGACGTCACGGCGGTCCACCCCTATTTTCACTAATTTACCTTTAATTCAAAAATGAATAATGTTTGACATAATCTATTATGAAAGTTGTTGACAGATGTCTGTACAGTCGTGATCTCTTCCCTAATCTCAAAACCTGTCAAAACTTGTAACGTTTCATAGTTGAAGAGAAAAACTTTAGAGAAACTTAATCTGGAAGCTACCTTTAGTTCGTGTTCTTTCTCCTCCTATTAGTATTATTTTGGAAAGATTGGAAACTCAGGCAAAAGAAGTGTCGAAGGAAAAGTGCAAACTTTTACTCTTAATAAAATGGTGTAATTAAATTACAGCTGTTTAGCTCTGTGGTTGTCCAACTTCAGCTTGAACAAATCTGTTTGAAATATGACGGACTTCCGCAAGATGTAGAAAATAGAAAAACACGCTAGGAACCTCAAAACTCGATTTAGGAGCTCAGAAACGAAAGATTATACAGGGTTACTATATAAAAAAAAAATAGATACTTTCAACGTTTTATAGAAACGAAACCATTACTGATATAGTCAATACTTACATGGTTTTTTAGTATATGTCTTTAAGTTCGTGTGACTAAATCACAAATGTTGTATGTGTGAGTCTTTCGTCAGGGCTGCACATGTAGGTCTTACTAACACGAGCCACTTGCTCCTCTGATACAGGTGGTCTCCCATAACTTTTCCCCTTGGCG
This is the Hox gene Ubx in the species I work with.
With TransDecoder, I can recover part of the correct ORF by lowering the minimum protein length to 80 (-m 80).
This is what I get:
>TRINITY_DN154468_c2_g2_i2.p2 type:complete len:86 gc:universal TRINITY_DN154468_c2_g2_i2:807-1064(+)
MAHALCLTERQIKIWFQNRRMKLKKEIQAIKELNEQERQAQAAKINTSQNQTQTQSQPPSQPATKTQQPSSSSATPANESAAGKT*
It is a partial protein sequence, but this is fine considering it is from a de novo transcriptome assembly.
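As a quick consistency check on the header above, here is a minimal sketch in plain Python (standard library only). The regular expression is my own assumption about the header layout, not a TransDecoder API. It only confirms that the 807-1064 span is 258 nt, i.e. 86 codons, which matches the 85 residues shown plus the stop codon.

```python
import re

# Header and peptide copied from the TransDecoder output above.
header = (">TRINITY_DN154468_c2_g2_i2.p2 type:complete len:86 gc:universal "
          "TRINITY_DN154468_c2_g2_i2:807-1064(+)")
peptide = ("MAHALCLTERQIKIWFQNRRMKLKKEIQAIKELNEQERQAQAAKINTSQNQTQTQSQPPS"
           "QPATKTQQPSSSSATPANESAAGKT*")

# Assumed header layout: "len:<n>" followed later by "<id>:<start>-<end>(<strand>)".
match = re.search(r"len:(\d+).*:(\d+)-(\d+)\(([+-])\)", header)
reported_len, start, end = (int(match.group(i)) for i in (1, 2, 3))

span_nt = end - start + 1   # coordinates treated as 1-based and inclusive
codons = span_nt // 3       # whole codons covered by the span

print(span_nt, codons, reported_len, len(peptide))
```

All three length figures agree at 86, so the reported ORF runs from the ATG at position 807 through its stop codon.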
Nonetheless, if I choose an ORF manually, or search in NCBI ORFfinder with the "any sense codon" start option, I can retrieve a longer partial ORF, which is correct.
This is a Hox gene and the remaining piece is the rest of the homeodomain.
I added a space in the sequence below to mark where the TransDecoder ORF begins:
>lcl|ORF24
MQSANGLRRRGRQTYTRYQTLELEKEFHTNHYLTRRRRIE MAHALCLTER
QIKIWFQNRRMKLKKEIQAIKELNEQERQAQAAKINTSQNQTQTQSQPPS
QPATKTQQPSSSSATPANESAAGKT
This additional 5' fragment is real, as it contains a highly conserved part of the homeodomain (ELEKEF).
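To make the difference between the two ORF calls concrete, here is a minimal Python sketch (standard library only). It is not TransDecoder's or ORFfinder's actual algorithm. The names `translate` and `orf_views` are hypothetical helpers. `FRAGMENT` is an excerpt of the transcript above, starting at the in-frame TGA just upstream of the longer ORF. The sketch translates the excerpt and compares the stop-to-stop reading (my assumption of what an "any sense codon" search reports) with the reading that starts at the first ATG.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Excerpt of the transcript above: begins at the in-frame TGA that precedes
# the longer ORF and ends ten codons into the TransDecoder ORF.
FRAGMENT = (
    "TGAAAACAAAGTGCGAACGGCCTGAGGCGAAGGGGTCGACAGACGTACACTCGCTACCAAACC"
    "CTGGAGCTGGAGAAGGAATTCCACACCAACCACTACTTGACCCGACGGCGTAGGATCGAG"
    "ATGGCCCACGCTCTGTGTCTCACTGAACGG"
)


def translate(nt):
    """Translate frame 0 of nt with the standard code; unknown codons become X."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def orf_views(protein):
    """Return (stop_to_stop, atg_initiated) for the last stop-free segment.

    stop_to_stop begins right after the nearest upstream stop (assumed to be
    what an "any sense codon" search reports); atg_initiated begins at the
    first M inside that segment, or is empty if there is none.
    """
    segment = protein.split("*")[-1]
    first_met = segment.find("M")
    atg_initiated = segment[first_met:] if first_met != -1 else ""
    return segment, atg_initiated


protein = translate(FRAGMENT)
stop_to_stop, atg_initiated = orf_views(protein)
print(protein)
print(len(stop_to_stop), stop_to_stop.find("M"), atg_initiated)
```

On this excerpt the stop-free stretch is 50 codons long and the first M sits at offset 40, so the ATG at position 807 is the first methionine after an in-frame TGA. The first codon after that stop is AAA (K under the standard code), whereas the ORFfinder record above shows M at that position. Taking the header coordinates at face value, the extension would begin at position 687, with the TGA at 684-686.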
Could you please help me identify why TransDecoder will not output that longer partial ORF?
Thank you very much for your time!
Best,
Guilherme