Difference between "Results (PEP/FASTA)" and "Results Longest ORFs (PEP/FASTA)"


Barış Yaşar

Oct 7, 2021, 4:45:05 AM
to TransDecoder-users
Hello,

I ran TransDecoder on usegalaxy and it produced five output files, two of which are "longest ORFs (PEP/FASTA)" and "Results (PEP/FASTA)".

When I look up the same transcript in these two files, I see different ORFs. As an example, I have pasted below the ORF entries for a randomly chosen transcript from each of the two files.

  1. Why are they different, and what is the difference? I found the following definitions, but it is still not clear to me why the sequences (PEP/FASTA) in the two files differ.
  • longest ORFs (PEP/FASTA): all ORFs meeting the minimum length criteria, regardless of coding potential
  • Results (PEP/FASTA): peptide sequences for the final candidate ORFs; all shorter candidates within longer ORFs were removed
  2. How is coding potential defined?
  3. Why is "longest ORFs (PEP)", rather than "Results (PEP/FASTA)", the file suggested for identifying ORFs with homology to known proteins?
  • The result "longest ORFs (PEP)" can be used to identify ORFs with homology to known proteins via BlastP or Pfam searches.

Results (PEP/FASTA):
>MSTRG.10001.1.p2 GENE.MSTRG.10001.1~~MSTRG.10001.1.p2 ORF type:complete len:150 (-),score=27.33 MSTRG.10001.1:588-1037(-)
MQNARLDEVQAGIKITRRNINNLRYAHDTTIMAESKEELKSLLMNVKEESEKVGLKLNIQ
KTKIMASSPITSWQIDGVTMETVRNFIFLGSKIPADGDCSHEIKKCLLLGRKATTNLDSI
LKSRNITLPTEVCLVKAMIFPVVMYGCES*
>MSTRG.10001.1.p1 GENE.MSTRG.10001.1~~MSTRG.10001.1.p1 ORF type:complete len:295 (-),score=54.53 MSTRG.10001.1:2992-3876(-)
MGLVVIGGTQRGVGRGIGRIWSKSTNFQSTTQPGEAGRPSPPGLCPRPRSRRSPGAETPS
ARLPPAHRSREPGSRCPPAGPCPGAPSRSRARWRHALLQAIRALQKGTRLLLRKSPFCRL
QLSHPTSLPSLPLPRETVSSSPTAETRRAVPVVSKLPAADTSHQLGPKSKHQERRTEVKI
HGDSRRIYLLLSLLFLSTPPPIPAPSPAREIRGKFTRGVDFSWQTQALLALQEAAELSSK
DAQLARRIRGIQGHGRAPGTRPCLWWIPEPLPVARTRSRGYKVLPALAVSEQSV*

Results Longest ORFs (PEP/FASTA):
>MSTRG.1001.1.p1 type:3prime_partial len:494 gc:Universal MSTRG.1001.1:901-2379(+)
MLQSSPARAVLRGREPASCEGLCGRGAGADGGGGGDGYGSLRPGWPVARGQGAPAEDDGEDVRGVLKRRVETRQHTEEAVRQQEVEQLDFRDLLGKKVSTKTLSEEDLKEIPAEQLDFRDLLGKKVSTKTLSEEDLKEIPAEQMDFRANLQRQVKPKTLSEEERKVHGPQQVDFRSVLAKKGTPKTPVPEKVPPPKPATPDFRSVLGSKKKLPTENGSNNTEALNAKAAEGLKPVGNAQPSGFLKPVGNAKLADTPKPLSSTKPAETPKPLGNVKPAETPKPLGSTKPAETPKPLGSTKPAETPKPLGNVKPAETPKPLGNIKPTETPKPLGSTKPAETPKPLGSTKPAETPKPLGNVKPAETPKPLGNVKPAETPKPLGNVKPAETPKPVSNAKPAETLKPVGNAKPAETPKPLSNVKPAETPKLVGNAKPAETSKPLDNAKPAEAPKPLGNAKPAEIPKPTGKEELKKEIKNDVNCKKGHAGATDSEKRPE
>MSTRG.1001.1.p2 type:5prime_partial len:456 gc:Universal MSTRG.1001.1:2378-1011(-)
SGLFSLSVAPAWPFLQFTSFLISFLSSSFPVGFGISAGLALPKGLGASAGLALSKGLEVSAGLALPTSLGVSAGLTLLKGLGVSAGLALPTGLRVSAGLALLTGLGVSAGLTLPKGLGVSAGLTLPKGLGVSAGLTLPKGFGVSAGLVLPKGLGVSAGLVLPKGLGVSVGLMLPKGLGVSAGLTLPKGFGVSAGLVLPKGLGVSAGLVLPKGLGVSAGLTLPKGLGVSAGLVLLKGLGVSASLALPTGFRNPEGWALPTGLRPSAALAFKASVLLLPFSVGNFFLLPNTERKSGVAGFGGGTFSGTGVLGVPFLARTERKSTCWGPCTFLSSSDRVFGFTCRCRLARKSICSAGISFRSSSDKVFVLTFLPRRSRKSSCSAGISFRSSSDKVLVLTFLPRRSRKSSCSTSCWRTASSVCCRVSTRLLSTPRTSSPSSSAGAPCPLATGQPGLRLP*
>MSTRG.1001.1.p3 type:5prime_partial len:253 gc:Universal MSTRG.1001.1:2379-1621(-)
LWSLFTICGPCMALLAVHIILNLLLKFFFPSGFWDFSRLGIAQGFGGLSWLGVIQGFRGLSWLGIAHKFGGFSWLDVTQGFGGLSWLGVAHRFEGFGWLGIAHRFRGLSWLDITQGFGGLSWLDITQGFGGLGWLDITQGFWGLCWLGAAQGFGGFCWLGAAQGFGGLSWLDVTQGFGGLSWLDVTQGFWGLCWLGAAQGFGGFCWLGAAQGFGGLSWLDITQGFGGLCWLGAAQGFGGISQLGVAHGFQEP*
>MSTRG.1001.1.p4 type:complete len:119 gc:Universal MSTRG.1001.1:1718-2074(+)
MSSQLRPPNPWAAPSQQKPPNPWAAPSQQRPQNPWVTSSQLRPPNPWVTSSQLRPPNPWAAPSQQKPPNPWAAPSQQRPQNPWVMSSQPRPPNPWVMSSQLRPPNPWVMSSQLRPLNL*
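
To check per transcript which ORFs from the longest-ORFs file also appear in the final results, something like the Python sketch below could be used. It is only a sketch under a few assumptions: each FASTA header sits on its own line and ends with a `transcript:start-end(strand)` token, as in the entries pasted here, and ORF IDs such as `.p1` are shared between the two files. The function names `orfs_by_transcript` and `compare` are illustrative and not part of TransDecoder. Run on the headers pasted here, it also shows that the two examples carry different transcript IDs (MSTRG.10001.1 and MSTRG.1001.1).

```python
import re
from collections import defaultdict

# Assumption: the last header token looks like "MSTRG.1001.1:901-2379(+)".
COORD_RE = re.compile(r"(\S+):(\d+)-(\d+)\(([+-])\)\s*$")


def orfs_by_transcript(fasta_text):
    """Map transcript ID -> {ORF ID: (start, end, strand)} using header lines only."""
    orfs = defaultdict(dict)
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence lines are ignored
        orf_id = line[1:].split()[0]
        match = COORD_RE.search(line)
        if match is None:
            continue  # header does not follow the assumed layout
        transcript, start, end, strand = match.groups()
        orfs[transcript][orf_id] = (int(start), int(end), strand)
    return dict(orfs)


def compare(longest_text, final_text):
    """For each transcript, list ORF IDs kept, dropped, or seen only in the final file."""
    longest = orfs_by_transcript(longest_text)
    final = orfs_by_transcript(final_text)
    report = {}
    for transcript in sorted(set(longest) | set(final)):
        candidates = set(longest.get(transcript, {}))
        kept = set(final.get(transcript, {}))
        report[transcript] = {
            "kept": sorted(candidates & kept),
            "dropped": sorted(candidates - kept),
            "only_in_final": sorted(kept - candidates),
        }
    return report


# Headers copied from the entries in this post (sequence lines omitted).
LONGEST = """\
>MSTRG.1001.1.p1 type:3prime_partial len:494 gc:Universal MSTRG.1001.1:901-2379(+)
>MSTRG.1001.1.p2 type:5prime_partial len:456 gc:Universal MSTRG.1001.1:2378-1011(-)
>MSTRG.1001.1.p3 type:5prime_partial len:253 gc:Universal MSTRG.1001.1:2379-1621(-)
>MSTRG.1001.1.p4 type:complete len:119 gc:Universal MSTRG.1001.1:1718-2074(+)
"""
FINAL = """\
>MSTRG.10001.1.p2 GENE.MSTRG.10001.1~~MSTRG.10001.1.p2 ORF type:complete len:150 (-),score=27.33 MSTRG.10001.1:588-1037(-)
>MSTRG.10001.1.p1 GENE.MSTRG.10001.1~~MSTRG.10001.1.p1 ORF type:complete len:295 (-),score=54.53 MSTRG.10001.1:2992-3876(-)
"""

if __name__ == "__main__":
    for transcript, status in compare(LONGEST, FINAL).items():
        print(transcript, status)
```

For the full files, the two downloaded outputs could be read with `open(path).read()` and passed to `compare` in place of `LONGEST` and `FINAL`.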

Thank you.

Best,
Baris
