Hello,
I ran TransDecoder on usegalaxy and it produced 5 output files, two of which are "longest ORFs (PEP/FASTA)" and "Results (PEP/FASTA)".
When I look up the same transcript in these two files, I see different ORFs. As an example, I have copied the ORF results for the same transcript from the two files below.
- Why are they different, and what is the difference? I found the following definitions, but it is still not clear to me why the sequences (PEP/FASTA) in the two files differ.
  - longest ORFs (PEP/FASTA): all ORFs meeting the minimum length criteria, regardless of coding potential
  - Results (PEP/FASTA): peptide sequences for the final candidate ORFs; all shorter candidates within longer ORFs were removed
- How is coding potential defined?
- Why is "longest ORFs (PEP)", rather than "Results (PEP/FASTA)", the file chosen to identify ORFs with homology to known proteins?
  - The result "longest ORFs (PEP)" can be used to identify ORFs with homology to known proteins via BlastP or Pfam searches.
Results (PEP/FASTA):
>MSTRG.10001.1.p2 GENE.MSTRG.10001.1~~MSTRG.10001.1.p2 ORF type:complete len:150 (-),score=27.33 MSTRG.10001.1:588-1037(-)
MQNARLDEVQAGIKITRRNINNLRYAHDTTIMAESKEELKSLLMNVKEESEKVGLKLNIQ
KTKIMASSPITSWQIDGVTMETVRNFIFLGSKIPADGDCSHEIKKCLLLGRKATTNLDSI
LKSRNITLPTEVCLVKAMIFPVVMYGCES*
>MSTRG.10001.1.p1 GENE.MSTRG.10001.1~~MSTRG.10001.1.p1 ORF type:complete len:295 (-),score=54.53 MSTRG.10001.1:2992-3876(-)
MGLVVIGGTQRGVGRGIGRIWSKSTNFQSTTQPGEAGRPSPPGLCPRPRSRRSPGAETPS
ARLPPAHRSREPGSRCPPAGPCPGAPSRSRARWRHALLQAIRALQKGTRLLLRKSPFCRL
QLSHPTSLPSLPLPRETVSSSPTAETRRAVPVVSKLPAADTSHQLGPKSKHQERRTEVKI
HGDSRRIYLLLSLLFLSTPPPIPAPSPAREIRGKFTRGVDFSWQTQALLALQEAAELSSK
DAQLARRIRGIQGHGRAPGTRPCLWWIPEPLPVARTRSRGYKVLPALAVSEQSV*
Longest ORFs (PEP/FASTA):
>MSTRG.1001.1.p1 type:3prime_partial len:494 gc:Universal MSTRG.1001.1:901-2379(+)
MLQSSPARAVLRGREPASCEGLCGRGAGADGGGGGDGYGSLRPGWPVARGQGAPAEDDGEDVRGVLKRRVETRQHTEEAVRQQEVEQLDFRDLLGKKVSTKTLSEEDLKEIPAEQLDFRDLLGKKVSTKTLSEEDLKEIPAEQMDFRANLQRQVKPKTLSEEERKVHGPQQVDFRSVLAKKGTPKTPVPEKVPPPKPATPDFRSVLGSKKKLPTENGSNNTEALNAKAAEGLKPVGNAQPSGFLKPVGNAKLADTPKPLSSTKPAETPKPLGNVKPAETPKPLGSTKPAETPKPLGSTKPAETPKPLGNVKPAETPKPLGNIKPTETPKPLGSTKPAETPKPLGSTKPAETPKPLGNVKPAETPKPLGNVKPAETPKPLGNVKPAETPKPVSNAKPAETLKPVGNAKPAETPKPLSNVKPAETPKLVGNAKPAETSKPLDNAKPAEAPKPLGNAKPAEIPKPTGKEELKKEIKNDVNCKKGHAGATDSEKRPE
>MSTRG.1001.1.p2 type:5prime_partial len:456 gc:Universal MSTRG.1001.1:2378-1011(-)
SGLFSLSVAPAWPFLQFTSFLISFLSSSFPVGFGISAGLALPKGLGASAGLALSKGLEVSAGLALPTSLGVSAGLTLLKGLGVSAGLALPTGLRVSAGLALLTGLGVSAGLTLPKGLGVSAGLTLPKGLGVSAGLTLPKGFGVSAGLVLPKGLGVSAGLVLPKGLGVSVGLMLPKGLGVSAGLTLPKGFGVSAGLVLPKGLGVSAGLVLPKGLGVSAGLTLPKGLGVSAGLVLLKGLGVSASLALPTGFRNPEGWALPTGLRPSAALAFKASVLLLPFSVGNFFLLPNTERKSGVAGFGGGTFSGTGVLGVPFLARTERKSTCWGPCTFLSSSDRVFGFTCRCRLARKSICSAGISFRSSSDKVFVLTFLPRRSRKSSCSAGISFRSSSDKVLVLTFLPRRSRKSSCSTSCWRTASSVCCRVSTRLLSTPRTSSPSSSAGAPCPLATGQPGLRLP*
>MSTRG.1001.1.p3 type:5prime_partial len:253 gc:Universal MSTRG.1001.1:2379-1621(-)
LWSLFTICGPCMALLAVHIILNLLLKFFFPSGFWDFSRLGIAQGFGGLSWLGVIQGFRGLSWLGIAHKFGGFSWLDVTQGFGGLSWLGVAHRFEGFGWLGIAHRFRGLSWLDITQGFGGLSWLDITQGFGGLGWLDITQGFWGLCWLGAAQGFGGFCWLGAAQGFGGLSWLDVTQGFGGLSWLDVTQGFWGLCWLGAAQGFGGFCWLGAAQGFGGLSWLDITQGFGGLCWLGAAQGFGGISQLGVAHGFQEP*
>MSTRG.1001.1.p4 type:complete len:119 gc:Universal MSTRG.1001.1:1718-2074(+)
MSSQLRPPNPWAAPSQQKPPNPWAAPSQQRPQNPWVTSSQLRPPNPWVTSSQLRPPNPWAAPSQQKPPNPWAAPSQQRPQNPWVMSSQPRPPNPWVMSSQLRPPNPWVMSSQLRPLNL*
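
In case it helps anyone compare the two files side by side, here is a minimal Python sketch that groups the ORF headers by transcript ID, so it is easy to check which ORFs each file reports for a given transcript. The regular expression is only an assumption based on the header lines pasted above (it is not an official TransDecoder format definition), and the function names `parse_header` and `group_by_transcript` are made up for this illustration.

```python
import re
from collections import defaultdict

# Matches both header styles pasted in this post. Only the fields common to
# both files are captured; the pattern is an assumption based on those lines.
HEADER_RE = re.compile(
    r"^>(?P<orf_id>\S+).*?\btype:(?P<orf_type>\S+)\s+len:(?P<length>\d+)"
    r".*?\s(?P<transcript>\S+):(?P<start>\d+)-(?P<end>\d+)\((?P<strand>[+-])\)\s*$"
)


def parse_header(line):
    """Return a dict of ORF fields for one FASTA header line, or None."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    start, end = int(m["start"]), int(m["end"])
    return {
        "orf_id": m["orf_id"],
        "transcript": m["transcript"],
        "orf_type": m["orf_type"],
        "length": int(m["length"]),
        # Store coordinates low-to-high so minus-strand ORFs written as
        # high-low can be compared with those written as low-high.
        "start": min(start, end),
        "end": max(start, end),
        "strand": m["strand"],
    }


def group_by_transcript(lines):
    """Group parsed ORF headers by transcript ID, skipping sequence lines."""
    groups = defaultdict(list)
    for line in lines:
        if line.startswith(">"):
            rec = parse_header(line)
            if rec is not None:
                groups[rec["transcript"]].append(rec)
    return dict(groups)


# Header lines copied from the two files above (sequences omitted).
results_headers = [
    ">MSTRG.10001.1.p2 GENE.MSTRG.10001.1~~MSTRG.10001.1.p2 ORF type:complete len:150 (-),score=27.33 MSTRG.10001.1:588-1037(-)",
    ">MSTRG.10001.1.p1 GENE.MSTRG.10001.1~~MSTRG.10001.1.p1 ORF type:complete len:295 (-),score=54.53 MSTRG.10001.1:2992-3876(-)",
]
longest_headers = [
    ">MSTRG.1001.1.p1 type:3prime_partial len:494 gc:Universal MSTRG.1001.1:901-2379(+)",
    ">MSTRG.1001.1.p2 type:5prime_partial len:456 gc:Universal MSTRG.1001.1:2378-1011(-)",
    ">MSTRG.1001.1.p3 type:5prime_partial len:253 gc:Universal MSTRG.1001.1:2379-1621(-)",
    ">MSTRG.1001.1.p4 type:complete len:119 gc:Universal MSTRG.1001.1:1718-2074(+)",
]

results = group_by_transcript(results_headers)
longest = group_by_transcript(longest_headers)

# List, per transcript, the ORFs found in each file.
for transcript in sorted(set(results) | set(longest)):
    print(transcript)
    for label, groups in (("Results", results), ("longest ORFs", longest)):
        for rec in groups.get(transcript, []):
            print(f"  {label}: {rec['orf_id']} {rec['orf_type']} "
                  f"{rec['start']}-{rec['end']}({rec['strand']}) len={rec['length']}")
```

To run it on the full files, the same `group_by_transcript` function can be given an open file handle instead of a list, since it only needs an iterable of lines.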
Thank you.
Best,
Baris