EVM does not produce the consensus gene structure of a gene with very clear hints


Jon Lerga Jaso

Mar 12, 2019, 6:28:18 AM
to EVidenceModeler-users
Hello,

I am trying to annotate the genome of a Drosophila species. I ran three ab initio gene predictors (Augustus, geneID, SNAP) and two protein-homology predictors (exonerate, geneblastg), and reconstructed transcripts from RNA-seq with PASA.
The input files are correctly formatted. However, there is very clear evidence for a specific gene (from exonerate and RNA-seq), yet EVM does not output any structure for it. I thought this was happening because, wherever the ab initio predictors do not predict a gene, they contribute a 'not a gene' type of score, and in this case no ab initio predictor predicts a gene at this locus. But the exonerate and RNA-seq hints are quite clear to me. Moreover, even if I exclude the ab initio predictors or give a very large weight (100000000000) to exonerate and PASA, it still does not work.

I am attaching the files for the scaffold in question. The gene should be predicted between coordinates 48614 and 49031.
Judging from the alignments "ID=geneblastG.33300;Target=FBpp0088783", "ID=geneblastG.33301;Target=FBpp0309291", "ID=exonerate.53823;Target=FBpp0309291" and "ID=exonerate.53824;Target=FBpp0088783" in protein_alignments.gff3, and from the PASA transcript "ID=align_395915;Target=asmbl_12405", the consensus structure should be clear.

Any ideas why EVM does not give me any consensus structure here?

Thanks,
Ion


D.buzzatii_Freeze1_Scaffolds.fa
gene_predictions.gff3
protein_alignments.gff3
transcript_alignments.gff3

Brian Haas

Mar 12, 2019, 6:57:07 AM
to Jon Lerga Jaso, EVidenceModeler-users
Hi Jon,

EVM won't model complete genes based on transcript and protein alignments alone.  These provide sources for internal exon types, but not initial or terminal exons, which are derived from ab initios.  If you run TransDecoder on the PASA data, you could include the TransDecoder predictions as an 'OTHER_PREDICTION' class, and this would be a way to integrate transcript-based or transcript-only complete gene structures at loci where there are no ab initios.

best,

~b
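
For readers following up on the suggestion above, here is a minimal sketch of how the weights file might look once TransDecoder predictions are added under the OTHER_PREDICTION class. It assumes the usual three-column, tab-delimited EVM weights layout (evidence class, source, weight), where the source has to match column 2 of the corresponding GFF3 file. The source labels ("pasa", "transdecoder", and so on), the weight values, and the helper names are illustrative assumptions, not taken from the files in this thread; check column 2 of your own GFF3 files for the real labels.

```python
from typing import Iterable, List, Set, Tuple

# (evidence class, source, weight). Sources and weights are illustrative
# assumptions; each source must equal column 2 of the matching GFF3 file.
WEIGHTS: List[Tuple[str, str, int]] = [
    ("ABINITIO_PREDICTION", "augustus", 1),
    ("ABINITIO_PREDICTION", "geneid", 1),
    ("ABINITIO_PREDICTION", "snap", 1),
    ("PROTEIN", "exonerate", 5),
    ("PROTEIN", "geneblastG", 5),
    ("TRANSCRIPT", "pasa", 10),
    # Complete ORF-based gene structures derived from the PASA assemblies.
    ("OTHER_PREDICTION", "transdecoder", 10),
]


def format_weights(rows: Iterable[Tuple[str, str, int]]) -> str:
    """Render the rows as tab-delimited text, one evidence source per line."""
    return "".join(f"{cls}\t{src}\t{weight}\n" for cls, src, weight in rows)


def gff3_sources(lines: Iterable[str]) -> Set[str]:
    """Collect the distinct values of column 2 (source) from GFF3 lines."""
    sources: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            sources.add(fields[1])
    return sources


def unweighted_sources(rows: Iterable[Tuple[str, str, int]],
                       lines: Iterable[str]) -> Set[str]:
    """Return GFF3 sources that have no entry in the weights table."""
    weighted = {src for _, src, _ in rows}
    return gff3_sources(lines) - weighted


if __name__ == "__main__":
    # Hypothetical GFF3 records at the locus discussed in this thread.
    example = [
        "##gff-version 3",
        "\t".join(["scaffold1", "transdecoder", "CDS", "48614", "49031",
                   ".", "+", "0", "ID=cds.example;Parent=mRNA.example"]),
        "\t".join(["scaffold1", "exonerate", "nucleotide_to_protein_match",
                   "48614", "49031", ".", "+", ".",
                   "ID=exonerate.53823;Target=FBpp0309291"]),
    ]
    print(format_weights(WEIGHTS), end="")
    print("sources missing from weights:", unweighted_sources(WEIGHTS, example))
```

The check at the end is there because a source label that appears in a GFF3 file but not in the weights file is an easy mistake to make when adding a new evidence type.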

--
You received this message because you are subscribed to the Google Groups "EVidenceModeler-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to evidencemodeler-...@googlegroups.com.
To post to this group, send email to evidencemo...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/evidencemodeler-users/b1344778-64ab-4bc7-a551-ed01b773d8ac%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Jon Lerga Jaso

Mar 12, 2019, 7:16:32 AM
to EVidenceModeler-users
Excellent, I will try that.
Many thanks!

