[maker-devel] error: training gene models with SNAP and GeneMark & run time to generate AUGUSTUS species file


Kudtarkar, Parul V.

Nov 29, 2016, 12:14:24 PM
to maker...@yandell-lab.org

Dear Maker developers,

1. We use assembled RNA-seq transcripts (from the same species) and protein evidence (from an evolutionarily close species) to generate training gene structures (1st iteration, est2genome=1, protein2genome=1).

2. These are then used to train the ab initio gene predictors SNAP and AUGUSTUS.

3. GeneMark-ES (version: GeneMark-ES/ET v.4.32) is used to produce a training data set with the command:

gmes_petap.pl --sequence pmin_jelly.fa

4. We then plan to predict genes using the results from SNAP, GeneMark, and AUGUSTUS (2nd iteration, est2genome=0, protein2genome=0).
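The two passes above differ only in a handful of maker_opts.ctl keys. The Python below is a minimal sketch of those differences: `iteration_overrides` is a hypothetical helper, `est2genome`, `protein2genome`, `snaphmm`, and `gmhmm` are the keys used in this thread, `augustus_species` is MAKER's key for naming a trained AUGUSTUS species, and the file names and species name are placeholders. All other options are assumed to stay unchanged between passes.

```python
# Sketch: the maker_opts.ctl keys that change between the two MAKER passes
# described in this thread. iteration_overrides is a hypothetical helper;
# the file names and species name below are placeholders.


def iteration_overrides(iteration, snaphmm="", gmhmm="", augustus_species=""):
    """Return 'key=value' lines for pass 1 (evidence-based) or pass 2 (ab initio)."""
    if iteration == 1:
        # Pass 1: build gene models directly from aligned transcripts and proteins.
        opts = {"est2genome": "1", "protein2genome": "1"}
    elif iteration == 2:
        # Pass 2: stop converting evidence straight into models and
        # let the trained predictors produce them instead.
        opts = {
            "est2genome": "0",
            "protein2genome": "0",
            "snaphmm": snaphmm,
            "gmhmm": gmhmm,
            "augustus_species": augustus_species,
        }
    else:
        raise ValueError("only passes 1 and 2 are described here")
    return [f"{key}={value}" for key, value in opts.items()]


if __name__ == "__main__":
    for line in iteration_overrides(
        2, snaphmm="pmin1.hmm", gmhmm="gmhmm.mod", augustus_species="pminiata"
    ):
        print(line)
```

Printing the lines rather than editing the file in place keeps the sketch side-effect free; the output can be pasted over the matching entries in maker_opts.ctl.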

I have a couple of questions relating to GeneMark and AUGUSTUS:

1. AUGUSTUS

We do not have an AUGUSTUS species file for our species of interest or for an evolutionarily close species.

The following command is used to generate the species file:

/autoAug.pl --genome=pmin_jelly.fa --species=pminiata --cdna=pmin_transcripts.fa --trainingset=genome.gff3 --singleCPU -v --useexisting 

AUGUSTUS is taking too long to compute the species file. Is there a solution for this issue? Using a species file from another organism might generate false positives. Is it advisable in such situations not to use an AUGUSTUS model at all?

2. Genemark

I used the gmhmm file generated in the GeneMark output directory; however, I encounter the following error:

-------------------------

STATUS: Parsing control files...
ERROR: You have failed to provide a value for 'gmhmme3' in the control files.
ERROR: You have failed to provide a value for 'probuild' in the control files.

---------------------
FYI

-----

maker_opts.ctl

#-----Gene Prediction
snaphmm=/home/parul/Pmin_new/maker_snap/pmin1.hmm #SNAP HMM file
gmhmm=/home/parul/Pmin_new/maker_snap/gmhmm.mod #GeneMark HMM file

-----

Training with SNAP yields roughly 6,000-7,000 additional genes. The models have good cumulative AED values.

I was hoping that, in addition to SNAP, I could use AUGUSTUS and GeneMark to train gene models that fuse dispersed (fragmented) models, so that the gene count falls within the expected range.
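Gene count and cumulative AED can both be read off MAKER's GFF3 output. A minimal Python sketch, assuming AED is stored as an `_AED=` attribute on mRNA features (as in MAKER's standard GFF3) and that any `##FASTA` section marks the end of the annotations; `read_gene_stats` and `cumulative_aed` are hypothetical helpers and the thresholds are arbitrary.

```python
# Sketch: count genes and compute a cumulative AED distribution from MAKER GFF3.
# Assumes AED is recorded as an "_AED=" attribute on mRNA features.


def read_gene_stats(gff_lines):
    """Return (number of gene features, list of mRNA AED values)."""
    genes = 0
    aeds = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # MAKER may append sequences after the annotations
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        ftype, attrs = cols[2], cols[8]
        if ftype == "gene":
            genes += 1
        elif ftype == "mRNA":
            for field in attrs.split(";"):
                key, _, value = field.partition("=")
                if key == "_AED":
                    aeds.append(float(value))
    return genes, aeds


def cumulative_aed(aeds, thresholds=(0.25, 0.5, 0.75, 1.0)):
    """Fraction of transcripts whose AED is <= each threshold."""
    n = len(aeds)
    return {t: (sum(a <= t for a in aeds) / n if n else 0.0) for t in thresholds}


if __name__ == "__main__":
    demo = [
        "ctg1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1",
        "ctg1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-RA;Parent=g1;_AED=0.10",
        "ctg1\tmaker\tgene\t2000\t2900\t.\t-\t.\tID=g2",
        "ctg1\tmaker\tmRNA\t2000\t2900\t.\t-\t.\tID=g2-RA;Parent=g2;_AED=0.60",
    ]
    n_genes, aed_values = read_gene_stats(demo)
    print(n_genes, cumulative_aed(aed_values))
```

Tracking the gene count next to the AED curve matters here: a predictor that splits genes can keep AED low while inflating the count.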


Thanks and regards,

Parul



Daniel Ence

Nov 29, 2016, 12:33:52 PM
to Kudtarkar, Parul V., maker...@yandell-lab.org
Hi Parul,

Training AUGUSTUS does take a long time, much longer than for the other two predictors you mentioned. Have you tried the WebAUGUSTUS web portal? The team that developed AUGUSTUS runs it and can probably help you troubleshoot their page for creating training sets: http://bioinf.uni-greifswald.de/webaugustus/training/create

The GeneMark error means that MAKER can't find the gmhmme3 and probuild executables. These are specified in the maker_exe.ctl file, not the “opts” file. You need to enter valid paths to those executables for the given parameters. This is usually done during installation of MAKER.
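The check behind this error can be approximated before launching MAKER. A simplified Python sketch, assuming the `key=value #comment` layout of MAKER's .ctl files (a path that itself contains `#` would be cut short); `parse_ctl`, `missing_executables`, and `suggest_paths` are hypothetical helpers, and `shutil.which` only finds the GeneMark programs if their directory is on PATH.

```python
import shutil


def parse_ctl(lines):
    """Parse MAKER-style 'key=value #comment' lines into a dict."""
    opts = {}
    for line in lines:
        body = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in body:
            continue
        key, _, value = body.partition("=")
        opts[key.strip()] = value.strip()
    return opts


def missing_executables(opts, required=("gmhmme3", "probuild")):
    """Return the required keys that are absent or left empty."""
    return [key for key in required if not opts.get(key)]


def suggest_paths(keys):
    """Look each executable up on PATH; None means it was not found."""
    return {key: shutil.which(key) for key in keys}


if __name__ == "__main__":
    sample = [
        "gmhmme3= #location of eukaryotic genemark executable",
        "probuild= #location of genemark probuild executable",
    ]
    for key, path in suggest_paths(missing_executables(parse_ctl(sample))).items():
        print(f"{key}={path or '<not found on PATH>'}")
```

In practice the lines would come from reading maker_exe.ctl; the in-memory sample keeps the sketch self-contained.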

Hope that helps, 
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

_______________________________________________
maker-devel mailing list
maker...@box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Carson Holt

Nov 29, 2016, 12:34:52 PM
to Kudtarkar, Parul V., maker...@yandell-lab.org
How to train AUGUSTUS: http://www.molecularevolution.org/molevolfiles/exercises/augustus/training.html

Step 2 shows how to create an empty species to start training with. Then Step 4 (optimize_augustus.pl) is the step that takes a while.
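Because optimize_augustus.pl repeatedly retrains and re-evaluates on the training genes (with cross-validation), its run time grows with the size of the training set, and the AUGUSTUS training documentation recommends keeping some genes aside as a test set. AUGUSTUS ships `randomSplit.pl` for this; the Python below is only an illustration of the same idea on a list of gene IDs, with `split_training_genes` a hypothetical helper and the seed an arbitrary choice.

```python
import random


def split_training_genes(gene_ids, test_size, seed=42):
    """Shuffle gene IDs reproducibly and hold out `test_size` of them.

    Returns (training_ids, test_ids). Hypothetical helper; AUGUSTUS's own
    randomSplit.pl does the equivalent on GenBank-format training files.
    """
    if not 0 <= test_size <= len(gene_ids):
        raise ValueError("test_size must be between 0 and the number of genes")
    rng = random.Random(seed)  # fixed seed -> same split on every run
    shuffled = list(gene_ids)
    rng.shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


if __name__ == "__main__":
    genes = [f"gene{i}" for i in range(1, 11)]
    train, test = split_training_genes(genes, test_size=3)
    print(len(train), "training genes;", len(test), "held out:", test)
```

A fixed seed makes the split reproducible, so accuracy before and after optimization is measured on the same held-out genes.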

Then for GeneMark, you must set the location of the necessary GeneMark executables in the maker_exe.ctl file.

After training all predictors and running a few contigs, take a moment to review predictor performance by inspecting the predictions manually in a viewer such as Apollo. It is not uncommon for one or more predictors to perform poorly on a given organism (they should all produce similar predictions). If one is significantly off relative to the other predictors and the evidence, drop it. A badly behaving predictor will reduce overall annotation performance.
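Visual review in Apollo remains the recommended check; a crude numeric complement is to ask how often each predictor's genes overlap a same-strand gene from another predictor. A minimal Python sketch, assuming gene loci have already been extracted from each predictor's GFF3; the `Gene` type and `agreement` function are hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    contig: str
    start: int  # 1-based, inclusive, as in GFF3
    end: int
    strand: str


def overlaps(a, b):
    """True if two loci share at least one base on the same contig and strand."""
    return (a.contig == b.contig and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def agreement(pred_a, pred_b):
    """Fraction of genes in pred_a overlapping at least one gene in pred_b."""
    if not pred_a:
        return 0.0
    hits = sum(any(overlaps(a, b) for b in pred_b) for a in pred_a)
    return hits / len(pred_a)


if __name__ == "__main__":
    snap = [Gene("ctg1", 1, 900, "+"), Gene("ctg1", 2000, 2900, "-")]
    augustus = [Gene("ctg1", 100, 1200, "+")]
    print(agreement(snap, augustus), agreement(augustus, snap))
```

The measure is asymmetric on purpose: a predictor that splits one gene into several pieces still scores high against the others, so compare gene counts as well.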

Carson




On Nov 29, 2016, at 10:13 AM, Kudtarkar, Parul V. <par...@caltech.edu> wrote:

Kudtarkar, Parul V.

Nov 29, 2016, 6:41:01 PM
to Carson Holt, maker...@yandell-lab.org, Cameron, Robert A. (Andy)

Dear Carson and Daniel,


Thanks for getting back to me promptly.

Adding the paths to the GeneMark executables in maker_exe.ctl fixed the error.

Hopefully optimize_augustus.pl runs faster than autoAug.pl (which has been running for almost a week now).

We look forward to evaluating which model best matches our expected gene count, gives good AED values, and has recognizable protein domains.

PS. We think BUSCO has helped us evaluate gene model completeness.
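When comparing candidate annotation sets, BUSCO's one-line summary can be tabulated next to gene count and AED. A minimal Python sketch, assuming the summary follows the `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout (complete, single-copy, duplicated, fragmented, missing, and number of BUSCOs searched); `parse_busco_summary` is a hypothetical helper and the example numbers are made up.

```python
import re

# Assumed layout of BUSCO's one-line summary, e.g.
# C:95.0%[S:93.0%,D:2.0%],F:2.5%,M:2.5%,n:978
BUSCO_LINE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(text):
    """Return percentages C, S, D, F, M as floats and n as an int."""
    match = BUSCO_LINE.search(text)
    if match is None:
        raise ValueError("no BUSCO summary line found")
    result = {k: float(v) for k, v in match.groupdict().items() if k != "n"}
    result["n"] = int(match.group("n"))
    return result


if __name__ == "__main__":
    print(parse_busco_summary("\tC:95.0%[S:93.0%,D:2.0%],F:2.5%,M:2.5%,n:978\n"))
```

A rising duplicated (D) or fragmented (F) fraction between candidate sets is one hint that models are being split rather than fused.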


Thanks,

Parul


----

Parul Kudtarkar

Bioinformatician

Biology and Biological Engineering

Office: 278 Beckman Institute

California Institute of Technology

MC 139-74
Pasadena CA 91125

http://www.echinobase.org


From: Carson Holt <cars...@gmail.com>
Sent: Tuesday, November 29, 2016 9:34:31 AM
To: Kudtarkar, Parul V.
Cc: maker...@yandell-lab.org
Subject: Re: error: training genemodel with SNAP and GeneMark & run time to generate AUGUSTUS species file
 