Troubleshooting tandem fused genes


falinor181

Oct 10, 2017, 4:44:53 PM
to EVidenceModeler-users
Hi there,

I'm troubleshooting a gene prediction issue in EVidenceModeler (EVM) where two nearby homologous genes are being fused into one extra-long gene with a long intronic spacer in between. My weights are as follows:
PROTEIN protein2genome  4
TRANSCRIPT blat-Ppyr1.3_Pasa_v1 5
TRANSCRIPT gmap-Ppyr1.3_Pasa_v1 5
OTHER_PREDICTION        maker   1
ABINITIO_PREDICTION transdecoder 20

The "transdecoder" features are derived from the PASA pipeline. Overall, my logic for these weights is to rely strongly on the transcript-derived gene models, while relying on the ab initio predictions only a little, to help capture some more genes and reduce fragmented genes.

For this particular locus, the transdecoder GFF seems to have captured the two loci properly, so I think the issue may be the protein/transcript evidence I am providing. Does EVM chain together the transcript/protein evidence based on the "Target" attribute of the GFF features? I'm noticing that some of my protein evidence from exonerate spans both genes. Although I have filtered out the "protein_match" features that span both genes, the "match_part" features on either gene still share the same "Target" attribute...

All the best,
-Tim

Brian Haas

Oct 10, 2017, 8:03:04 PM
to falinor181, EVidenceModeler-users
Hi Tim,

I think EVM groups the features by their Target value, so it treats the space between the two genes as a long intron rather than as intergenic space.

best,

~b
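
One way to look for alignments that could cause this kind of chaining is to group the alignment features by Target ID and flag any ID whose parts are separated by an unusually large gap on the same contig. The following is a minimal sketch, assuming a 9-column GFF3 file whose alignment features carry a `Target=` attribute; the function names, the 20 kb `max_gap` threshold, and the optional `feature_type` filter are illustrative assumptions, not part of EVM.

```python
from collections import defaultdict


def target_id(attributes):
    """Return the ID part of a GFF3 Target attribute, or None if absent."""
    for field in attributes.split(";"):
        key, _, value = field.strip().partition("=")
        if key == "Target" and value:
            # GFF3 Target values look like "<id> <start> <end> [strand]".
            return value.split()[0]
    return None


def targets_with_large_gaps(gff_lines, max_gap=20000, feature_type=None):
    """Map (seqid, Target ID) to its largest internal gap when that gap exceeds max_gap.

    max_gap is a hypothetical threshold; choose it from the intron length
    distribution of the genome being annotated.
    """
    parts = defaultdict(list)
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue
        # Optionally restrict to one feature type, e.g. "match_part".
        if feature_type is not None and cols[2] != feature_type:
            continue
        target = target_id(cols[8])
        if target is not None:
            parts[(cols[0], target)].append((int(cols[3]), int(cols[4])))

    flagged = {}
    for key, spans in parts.items():
        spans.sort()
        reach = spans[0][1]  # rightmost coordinate covered so far
        largest = 0
        for start, end in spans[1:]:
            largest = max(largest, start - reach - 1)
            reach = max(reach, end)
        if largest > max_gap:
            flagged[key] = largest
    return flagged
```

If the file also contains parent features such as "protein_match" that cover the whole alignment, pass `feature_type="match_part"` so that the parent span does not hide the gap. Flagged Target IDs can then be inspected by hand, or their features on each side of the gap can be given distinct Target IDs before rerunning EVM.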

--
You received this message because you are subscribed to the Google Groups "EVidenceModeler-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to evidencemodeler-users+unsub...@googlegroups.com.
To post to this group, send email to evidencemodeler-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/evidencemodeler-users/9d6115fb-7368-4fb1-bef7-bd808d9bd8ed%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

falinor181

Oct 11, 2017, 9:21:31 AM
to EVidenceModeler-users
Hi Brian,

I've removed the protein evidence, and I'm still ending up with the fused genes. As best I can tell, the only evidence supporting the fusion is a single gene prediction track from SNAP (via maker, so it is included in the "OTHER_PREDICTION" weighting).

Do you have a sense of whether the EVM scoring function strongly favors a longer gene or a longer protein product? It seems odd to me that the longer maker gene track (with weight 1) can win out against two shorter transdecoder-derived gene tracks (with weight 20). The maker gene track even seems to remove an exon from one of the genes, which I would not expect to be supported.

All the best,
-Tim



Brian Haas

Oct 11, 2017, 11:25:31 AM
to falinor181, EVidenceModeler-users
Hi Tim,

The goal of EVM is mainly to leverage several ab initio predictions, and then to layer the protein, transcript, and other evidence on top to help guide it to the best solution. It looks like you're giving it just the transdecoder data as the ab initio input, when it should really be multiple ab initio predictors (genemark.hmm, augustus, glimmerHMM, snap, etc.). The OTHER_PREDICTION category is really reserved for the small subset of predictions in which you have high confidence, and which provide evidence for introns and exons but are not predictive of the noncoding regions (essentially, they add no information for regions of the genome where they do not show up).

hope this helps,

~b
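
The advice above can be checked mechanically before a run. Below is a minimal sketch, assuming the three-column weights layout shown earlier in this thread (evidence class, source name, weight); the helper names and the minimum of two ab initio sources are illustrative assumptions, and the second weights example uses hypothetical source names and values rather than recommended settings.

```python
from collections import defaultdict


def parse_weights(text):
    """Parse weights text into {evidence_class: {source: weight}}."""
    weights = defaultdict(dict)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        evidence_class, source, weight = fields
        weights[evidence_class][source] = float(weight)
    return dict(weights)


def check_abinitio(weights, minimum=2):
    """Return (ok, sources): whether at least `minimum` ab initio sources are listed."""
    sources = sorted(weights.get("ABINITIO_PREDICTION", {}))
    return len(sources) >= minimum, sources


# Weights from the first message in this thread.
ORIGINAL = """\
PROTEIN protein2genome  4
TRANSCRIPT blat-Ppyr1.3_Pasa_v1 5
TRANSCRIPT gmap-Ppyr1.3_Pasa_v1 5
OTHER_PREDICTION        maker   1
ABINITIO_PREDICTION transdecoder 20
"""

# Hypothetical layout with several ab initio predictors; names and values are made up.
SUGGESTED = """\
ABINITIO_PREDICTION augustus 1
ABINITIO_PREDICTION snap 1
ABINITIO_PREDICTION genemark 1
OTHER_PREDICTION transdecoder 10
"""

if __name__ == "__main__":
    print(check_abinitio(parse_weights(ORIGINAL)))
    print(check_abinitio(parse_weights(SUGGESTED)))
```

The source names in the second column need to match the source column of the corresponding GFF files, so the hypothetical names above would have to be replaced with whatever the prediction files actually use.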



Tim Fallon

Oct 18, 2017, 8:43:06 PM
to Brian Haas, EVidenceModeler-users
Hi Brian,

I kept troubleshooting, and EVM seemed to do more or less exactly what I wanted in the end! The problem I was originally troubleshooting turned out to be that the transdecoder gene models had multiple tracks for a single gene (from Trinity alternative splice variants), and either the wrong one was being selected or EVM just seemed to get confused and wouldn’t predict any gene. I settled on these weights in the end:
PROTEIN protein2genome 4
TRANSCRIPT blat-Ppyr_RefGenome_v3_Pasa_v1 5
TRANSCRIPT gmap-Ppyr_RefGenome_v3_Pasa_v1 5
OTHER_PREDICTION        maker   2
ABINITIO_PREDICTION transdecoder 10

The “OTHER_PREDICTION” section has both SNAP and augustus, whereas ABINITIO_PREDICTION has the transdecoder gene models (via the cdna_alignment_orf_to_genome_orf.pl utility script). Based on my reading of the EVM paper, it seemed like the ABINITIO_PREDICTION and OTHER_PREDICTION sections would be used in pretty much the same way? It seemed that the ABINITIO_PREDICTION class had an additional scoring function for when something is “intergenic”, but it was unclear whether that comes from another feature in the GFF (neither augustus nor SNAP has such a feature), or whether the absence of a feature in the GFF gets a region scored as intergenic.

All the best,
-Tim
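
The thread does not say how the redundant transdecoder models were resolved. One possible approach, sketched here purely as an assumption, is to cluster models that overlap on the same contig and strand and keep only the one with the longest CDS. The tuple layout and function name are hypothetical, and loading the models from the GFF is left out.

```python
def pick_longest_per_locus(models):
    """Keep one model ID per cluster of overlapping models.

    models: iterable of (model_id, seqid, strand, start, end, cds_length).
    Models on the same seqid and strand are chained into a cluster while
    each one overlaps the region covered so far (single-linkage overlap).
    """
    kept = []
    cluster, cluster_key, reach = [], None, 0
    for model in sorted(models, key=lambda m: (m[1], m[2], m[3], m[4])):
        _, seqid, strand, start, end, _ = model
        if cluster and (seqid, strand) == cluster_key and start <= reach:
            cluster.append(model)
            reach = max(reach, end)
            continue
        if cluster:
            # Close the previous cluster, keeping its longest-CDS model.
            kept.append(max(cluster, key=lambda m: m[5])[0])
        cluster, cluster_key, reach = [model], (seqid, strand), end
    if cluster:
        kept.append(max(cluster, key=lambda m: m[5])[0])
    return kept
```

Note that overlap-based clustering would itself merge two tandem genes if any single isoform spanned both, so the resulting clusters are worth a visual check at loci like the one discussed here.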

Brian Haas

Oct 18, 2017, 9:50:38 PM
to Tim Fallon, EVidenceModeler-users
Hi Tim,

For ABINITIO_PREDICTION, the intergenic regions are inferred as the regions between predictions of that prediction type.

Glad to hear it's doing what you wanted. In general, we try to keep the ABINITIO_PREDICTION tier for all of the ab initio predictors and use the OTHER_PREDICTION type sparingly, for the subset of models that provide quality gene predictions but don't provide information about the regions where they don't reside.

best,

~b
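
To illustrate the idea of inferring intergenic regions from the gaps between predictions of one type, here is a minimal sketch. It shows only the concept, not EVM's actual implementation; the function name and the coordinates in the example are made up.

```python
def intergenic_regions(gene_spans, contig_length):
    """Return 1-based inclusive intervals of a contig not covered by any gene span.

    gene_spans: (start, end) pairs for the gene predictions of ONE
    prediction source on one contig.
    """
    regions = []
    cursor = 1  # first position not yet known to be genic
    for start, end in sorted(gene_spans):
        if start > cursor:
            regions.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= contig_length:
        regions.append((cursor, contig_length))
    return regions


if __name__ == "__main__":
    # Two separate tandem gene models: the spacer between them is intergenic.
    print(intergenic_regions([(1000, 2000), (5000, 6000)], 8000))
    # One fused model over the same locus: the spacer is not intergenic.
    print(intergenic_regions([(1000, 6000)], 8000))
```

Under this view, a predictor that reports the two tandem genes separately contributes intergenic evidence for the spacer between them, while a predictor that reports one fused model does not.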
