Is this protein sequence in the output from EVidenceModeler wrong?


John Martinson

Jul 3, 2020, 2:22:56 PM
to EVidenceModeler-users
Brian,

Below is some output from EVidenceModeler (actually, it's output from PASA after running my EVM output models back through PASA).


# ORIGINAL: evm.model.scaf3.106 original gene structure, not modified by PASA
scaf3   EVM     gene    3210655 3210974 .       +       .       ID=evm.TU.scaf3.106;Name=EVM%20prediction%20scaf3.106
scaf3   EVM     mRNA    3210655 3210974 .       +       .       ID=evm.model.scaf3.106;Parent=evm.TU.scaf3.106;Name=EVM%20prediction%20scaf3.106
scaf3   EVM     exon    3210655 3210974 .       +       .       ID=evm.model.scaf3.106.exon1;Parent=evm.model.scaf3.106
scaf3   EVM     CDS     3210655 3210974 .       +       2       ID=cds.evm.model.scaf3.106;Parent=evm.model.scaf3.106


#PROT evm.model.scaf3.106 evm.TU.scaf3.106      REEVRVFRVFEEIHAQRSPGQAHKDAPEQEGRRRSRGVQCGRSHVLGQHHHGGRHHAHPHQHPGRLHAGPRHRQRHRQHLQLAGPAGLSRDPAAVSVRVHRGRGRV

From what I can tell, the protein sequence above is what you would get if the phase on the CDS line were "1" rather than "2" (that is, if translation skipped one leading base instead of two). So, is the protein sequence above possibly incorrect?

The corresponding transcript that goes with this is:

>evm.model.scaf3.106
GAGAGAAGAAGTTCGTGTGTTCCGAGTGTTCGAAGAGATTCATGCGCAGCGATCACCTGGCCAAGCACATAAAGACGCACCAGAACAAGAAGGGAGGAGGCGGAGCCGTGGTGTCCAGTGTGGGCGGAGCCATGTCCTCGGACAGCATCATCACGGCGGGCGGCACCACGCTCATCCTCACCAACATCCAGGCCGGCTCCATGCAGGGCCTCGCCACCGTCAACGCCACCGTCAACACCTCCAGCTCGCAGGACCAGCTGGGCTCAGCCGAGATCCCGCTGCAGTTAGTGTCCGTGTCCACCGGGGACGGGGTCGAGTGA
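
To make the phase comparison concrete, here is a minimal Python sketch that translates the first 30 bases of the transcript above at each GFF3 phase (the number of bases skipped before the first complete codon). The `translate` helper and the codon-table construction are illustrative only, not part of EVM or PASA, and the sketch assumes a single-exon, plus-strand CDS like this one.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, phase=0):
    """Translate seq after skipping `phase` leading bases; unknown codons become X."""
    seq = seq.upper()[phase:]
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


# First 30 bases of evm.model.scaf3.106, copied from the transcript above.
PREFIX = "GAGAGAAGAAGTTCGTGTGTTCCGAGTGTT"

for phase in (0, 1, 2):
    print(phase, translate(PREFIX, phase))
# 0 ERRSSCVPSV
# 1 REEVRVFRV   (matches the start of the #PROT line)
# 2 EKKFVCSEC   (what the reported phase of 2 implies)
```

Only the phase-1 translation reproduces the start of the reported protein, which is the discrepancy described above.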

Thanks,
John
 

Brian Haas

Jul 3, 2020, 5:59:53 PM
to John Martinson, EVidenceModeler-users
Hi John,

It's probably wrong.  In cases where there are multiple uninterrupted translations (frames with no in-frame stop codon), it's probably taking the first uninterrupted one and assuming that's the correct one.  In this case, that logic fails. You can log it as a bug on the PASA GitHub issues, or I can.
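
For illustration only, here is a sketch of the kind of "first uninterrupted frame" heuristic described above. It is a guess at the behavior, not PASA's actual code, and `first_open_frame` and `translate` are hypothetical names. On the first 75 bases of the transcript, frame 0 runs into a TAA stop, so a heuristic like this would settle on frame 1 even though the CDS line reports phase 2.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, offset=0):
    """Translate seq after skipping `offset` leading bases; unknown codons become X."""
    seq = seq.upper()[offset:]
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def first_open_frame(seq):
    """Return the first offset (0, 1, 2) with no internal stop codon, else None."""
    for offset in range(3):
        protein = translate(seq, offset)
        if "*" not in protein[:-1]:  # a stop in the final position is allowed
            return offset
    return None


# First 75 bases of evm.model.scaf3.106, copied from the transcript in this thread.
PREFIX = ("GAGAGAAGAAGTTCGTGTGTTCCGAGTGTTCGAAGAGATTCATGCGCAGC"
          "GATCACCTGGCCAAGCACATAAAGA")

print(translate(PREFIX, 0))      # ERRSSCVPSVRRDSCAAITWPST*R (internal stop)
print(first_open_frame(PREFIX))  # 1
```

Frames 1 and 2 are both free of internal stops here, so picking the first one ignores the phase recorded in the GFF3.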

best,

~b

--
You received this message because you are subscribed to the Google Groups "EVidenceModeler-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to evidencemodeler-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/evidencemodeler-users/97b753a7-354e-44bf-8cf3-9afb49b6250bo%40googlegroups.com.


--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

John Martinson

Jul 7, 2020, 1:05:57 PM
to EVidenceModeler-users
Brian,

I guess I'll let you log the bug. Thanks for the timely response.

Do you think there is a quick and dirty way to screen for this type of error in the output? 

John

 
Brian Haas

Jul 7, 2020, 1:21:43 PM
to John Martinson, EVidenceModeler-users
Sure, I'll log it.

You can just retranslate it from the output gff3 file using a tool that's more careful about the codon phase assignments.  I'll aim to tackle this for the next release.
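
One quick-and-dirty screen along these lines, sketched in Python: read the phase from the CDS line, translate the transcript at each of the three offsets, and flag the model when the reported protein does not match the translation at the GFF3 phase. This is only a sketch under simplifying assumptions (a single-exon, plus-strand CDS whose transcript starts at the CDS start, and whitespace-separated GFF3 columns); the function names are hypothetical and nothing here is PASA or EVM code. The example uses the first 30 bases of the transcript and the first 9 residues of the reported protein.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, phase=0):
    """Translate seq after skipping `phase` leading bases; unknown codons become X."""
    seq = seq.upper()[phase:]
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def cds_phase(gff_line):
    """Return the phase (column 8) of a GFF3 CDS line as an int."""
    return int(gff_line.split()[7])


def matching_offsets(transcript, reported_protein):
    """Offsets (0, 1, 2) whose translation equals the reported protein, stops trimmed."""
    target = reported_protein.rstrip("*")
    return [p for p in range(3) if translate(transcript, p).rstrip("*") == target]


def phase_mismatch(gff_line, transcript, reported_protein):
    """True when the reported protein is not the translation at the GFF3 phase."""
    return cds_phase(gff_line) not in matching_offsets(transcript, reported_protein)


CDS_LINE = ("scaf3 EVM CDS 3210655 3210974 . + 2 "
            "ID=cds.evm.model.scaf3.106;Parent=evm.model.scaf3.106")
PREFIX = "GAGAGAAGAAGTTCGTGTGTTCCGAGTGTT"

print(matching_offsets(PREFIX, "REEVRVFRV"))          # [1]
print(phase_mismatch(CDS_LINE, PREFIX, "REEVRVFRV"))  # True, so flag this model
```

Multi-exon or minus-strand models would need the CDS segments spliced and reverse-complemented first, which is where a dedicated GFF3-aware translation tool is the sturdier choice.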


 
