[maker-devel] Genemark-es evidence


Rob Syme

Aug 23, 2011, 8:15:56 PM
to maker...@yandell-lab.org
Can Maker use evidence from the self-training genemark-es?

maker_opts.ctl lets you specify a gmhmm file, but I don't think
genemark-es supplies a pre-built HMM (being a self-training
algorithm).

I've supplied the location of gmhmme3 in maker_exe.ctl and set
unmask=1 in maker_opts.ctl, but the generated gene models appear to come
solely from SNAP (a SNAP HMM was supplied).

Is it possible to add evidence from genemark-es to the generated gene
models? Or is the best option simply to run genemark-es independently
and then supply its GFF output as pred_gff in maker_opts.ctl?

-r

Rob Syme
PhD Student
Curtin University
Western Australia

_______________________________________________
maker-devel mailing list
maker...@box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Carson Holt

Aug 23, 2011, 9:14:31 PM
to Rob Syme, maker...@yandell-lab.org
Genemark-ES (gm_es.pl) is really just Genemark.hmm (gmhmme3) under the
hood. The gm_es.pl script runs gmhmme3, filters the models based on
domain content, retrains, and then runs gmhmme3 again. This happens a
total of seven times.
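Going only by the description above, the retraining loop in gm_es.pl can be sketched as follows. This is not GeneMark-ES code. `predict`, `keep`, and `retrain` are hypothetical stand-ins for a gmhmme3 run, the domain-content filter, and parameter re-estimation. The default of seven rounds simply mirrors the count given above.

```python
from typing import Callable, List, TypeVar

Model = TypeVar("Model")
Gene = TypeVar("Gene")


def self_train(
    initial_model: Model,
    predict: Callable[[Model], List[Gene]],
    keep: Callable[[Gene], bool],
    retrain: Callable[[List[Gene]], Model],
    rounds: int = 7,
) -> Model:
    """Repeat predict -> filter -> retrain for a fixed number of rounds.

    All three callables are hypothetical placeholders, not GeneMark-ES APIs.
    """
    model = initial_model
    for _ in range(rounds):
        predictions = predict(model)  # e.g. one gene-finding pass with the current HMM
        survivors = [g for g in predictions if keep(g)]  # e.g. keep models with domain support
        model = retrain(survivors)  # re-estimate the HMM from the filtered models
    return model
```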

At the end of a Genemark-ES run there will be a mod directory. Take the
es.mod file in that directory; that is your final HMM. Provide it as the
gmhmm value in maker_opts.ctl.
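Here is a minimal Python sketch of that step. It assumes the `mod/es.mod` layout described above and MAKER-style `key=value #comment` lines in maker_opts.ctl. The helper names `es_mod_path` and `set_ctl_option` are hypothetical, and editing the control file by hand works just as well.

```python
import re
from pathlib import Path


def es_mod_path(genemark_run_dir: str) -> Path:
    """Return the absolute path of mod/es.mod under a GeneMark-ES run directory."""
    mod = Path(genemark_run_dir) / "mod" / "es.mod"
    if not mod.is_file():
        raise FileNotFoundError(f"no final HMM at {mod}")
    return mod.resolve()


def set_ctl_option(ctl_text: str, key: str, value: str) -> str:
    """Set `key=` to `value` in control-file text, keeping any trailing #comment."""
    pattern = re.compile(rf"^({re.escape(key)}=)[^#\n]*(.*)$", re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{value} {m.group(2)}".rstrip(), ctl_text
    )
    if count == 0:
        raise KeyError(f"{key}= not found in control file")
    return new_text
```

For example, `ctl = Path("maker_opts.ctl"); ctl.write_text(set_ctl_option(ctl.read_text(), "gmhmm", str(es_mod_path("genemark_run"))))`. Here `genemark_run` is a placeholder for whatever directory gm_es.pl was run in.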

Depending on what organism you are annotating, you may want to consider
other predictors as well. Genemark works well on fungi and Oomycetes
(organisms with short introns and long exons). It performs poorly
relative to other predictors as introns become longer and exons become
shorter (as in mammals and many insect species). For those genomes
Augustus tends to produce better results, though it is harder to train.
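To see which side of that divide a genome falls on, one rough check is to measure exon and intron lengths in an existing set of gene models. The sketch below assumes GFF3 input with `exon` features linked to transcripts by `Parent=`. It reports medians only and encodes no cutoff, since none is given here.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Optional, Tuple

Exons = Dict[str, List[Tuple[int, int]]]


def exons_by_transcript(gff3_lines: Iterable[str]) -> Exons:
    """Collect (start, end) exon coordinates per Parent ID from GFF3 lines."""
    exons: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((int(cols[3]), int(cols[4])))
    return dict(exons)


def median_exon_intron_lengths(exons: Exons) -> Tuple[Optional[float], Optional[float]]:
    """Median exon length and median intron length (gaps between sorted exons)."""
    exon_lens: List[int] = []
    intron_lens: List[int] = []
    for coords in exons.values():
        ordered = sorted(coords)
        exon_lens += [end - start + 1 for start, end in ordered]
        intron_lens += [nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])]
    return (
        median(exon_lens) if exon_lens else None,
        median(intron_lens) if intron_lens else None,
    )
```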

Thanks,
Carson

Olaf Mueller

Aug 31, 2011, 11:45:12 AM
to maker...@yandell-lab.org
Hi,

Fungal transcriptome assemblies often yield a number of erroneously
joined transcripts, due to high gene densities and overlapping UTRs in
fungi. Passing these fused transcripts to MAKER as EST evidence
sometimes leads to the prediction of a single merged gene where there
are actually two. Is there a way to compensate for this problem in
MAKER? I understand that newly trained predictors like SNAP rely on
est2genome. But could a correction be inferred from BLAST evidence,
where two adjacent similarity regions in one gene model actually
originate from two separate proteins?

Thanks
Olaf
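The check Olaf describes could be approximated outside MAKER. This hypothetical sketch is not a MAKER option. It takes the BLAST hits inside one gene model as `(protein_id, start, end)` genome coordinates, merges each protein's hits into one span, and groups overlapping spans. If the evidence falls into two or more separate groups, no single protein supports the whole model, which may point to a fused transcript.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, int, int]  # (protein_id, start, end) in genome coordinates


def protein_spans(hits: List[Hit]) -> Dict[str, Tuple[int, int]]:
    """Merge all hits from the same protein into one (lo, hi) span."""
    spans: Dict[str, Tuple[int, int]] = {}
    for pid, start, end in hits:
        lo, hi = min(start, end), max(start, end)  # tolerate minus-strand ordering
        if pid in spans:
            s, e = spans[pid]
            spans[pid] = (min(s, lo), max(e, hi))
        else:
            spans[pid] = (lo, hi)
    return spans


def homology_clusters(hits: List[Hit]) -> List[Tuple[int, int, List[str]]]:
    """Group overlapping protein spans; each cluster is (start, end, protein_ids)."""
    clusters: List[Tuple[int, int, List[str]]] = []
    for pid, (s, e) in sorted(protein_spans(hits).items(), key=lambda kv: kv[1]):
        if clusters and s <= clusters[-1][1]:
            cs, ce, ids = clusters[-1]
            clusters[-1] = (cs, max(ce, e), ids + [pid])
        else:
            clusters.append((s, e, [pid]))
    return clusters


def looks_fused(hits: List[Hit]) -> bool:
    """True if the homology evidence splits into two or more disjoint blocks."""
    return len(homology_clusters(hits)) > 1
```

A protein whose hits span both blocks merges them into one cluster, so such a model is not flagged.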

Carson Holt

Aug 31, 2011, 1:45:53 PM
to Olaf Mueller, maker...@yandell-lab.org
If the ESTs are erroneous, there is no guarantee. Bad data will always
decrease annotation quality, and whatever settings you use, you may
still end up with very long UTRs or merged ORFs. You could try not
providing ESTs and calling genes based on protein homology alone, or
you could try more stringent EST assembly options.

--Carson
