Dear MAKER developers,
1. We use assembled RNA-seq transcripts (from the same species) and protein evidence (from an evolutionarily close species) to generate training gene structures (1st iteration, est2genome=1, protein2genome=1).
2. These are then used to train the ab initio gene predictors SNAP and AUGUSTUS.
3. GeneMark-ES (version: GeneMark-ES/ET v.4.32) is used to produce a training data set with the command:
gmes_petap.pl --sequence pmin_jelly.fa
4. We will then predict genes using the results from SNAP, GeneMark and AUGUSTUS (2nd iteration, est2genome=0, protein2genome=0).
I have a couple of questions relating to GeneMark and AUGUSTUS:
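Switching from the first iteration to the second amounts to changing a handful of keys in maker_opts.ctl. The following is a minimal Python sketch, assuming the usual `key=value #comment` layout of MAKER control files; the helper name `set_ctl_options` is hypothetical, and `augustus_species=pminiata` only makes sense once AUGUSTUS training for that species has actually finished.

```python
import re

# Matches "key=value  #optional comment" lines as they appear in maker_opts.ctl.
_CTL_LINE = re.compile(r"^(?P<key>[A-Za-z0-9_]+)=(?P<value>[^#]*?)(?P<comment>\s*#.*)?$")


def set_ctl_options(ctl_text: str, updates: dict) -> str:
    """Return ctl_text with the listed keys set to new values.

    Comment lines and keys not named in `updates` are kept unchanged.
    Keys that are not already present in the file are NOT added, so a
    typo in a key name is not silently written into the control file.
    """
    out = []
    for line in ctl_text.splitlines():
        m = _CTL_LINE.match(line)
        if m and m.group("key") in updates:
            comment = m.group("comment") or ""  # keep MAKER's inline help text
            line = f"{m.group('key')}={updates[m.group('key')]}{comment}"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical 2nd-iteration settings: evidence-only gene calls switched off,
# ab initio predictor models switched on (paths taken from the ctl excerpt).
ITERATION_2 = {
    "est2genome": "0",
    "protein2genome": "0",
    "snaphmm": "/home/parul/Pmin_new/maker_snap/pmin1.hmm",
    "gmhmm": "/home/parul/Pmin_new/maker_snap/gmhmm.mod",
    "augustus_species": "pminiata",  # assumes autoAug.pl training completed
}
```

Usage would be to read the first-iteration maker_opts.ctl, pass its text and `ITERATION_2` to `set_ctl_options`, and write the result to a new control file for the second run, leaving the original untouched for comparison.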
1. AUGUSTUS
We do not have a species file for our species of interest or for an evolutionarily closer species.
The following command is used to generate the species file:
/autoAug.pl --genome=pmin_jelly.fa --species=pminiata --cdna=pmin_transcripts.fa --trainingset=genome.gff3 --singleCPU -v --useexisting
AUGUSTUS is taking too long to compute the species file; is there a solution for this issue? Using a species file from another organism might generate false positives. Is it advisable in such situations not to use an AUGUSTUS model at all?
2. GeneMark
I used the gmhmm file generated in the GeneMark output directory; however, I encounter the following error:
-----
STATUS: Parsing control files...
ERROR: You have failed to provide a value for 'gmhmme3' in the control files.
ERROR: You have failed to provide a value for 'probuild' in the control files.
-----
maker_opts.ctl
#-----Gene Prediction
snaphmm=/home/parul/Pmin_new/maker_snap/pmin1.hmm #SNAP HMM file
gmhmm=/home/parul/Pmin_new/maker_snap/gmhmm.mod #GeneMark HMM file
-----
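The two errors above appear because gmhmm is set in maker_opts.ctl while the GeneMark executables gmhmme3 and probuild have no value in the control files. A small pre-flight check can catch this before launching MAKER. This is a minimal sketch under the assumption that both files use the `key=value #comment` layout; the function names are hypothetical and it only checks that a value is present, not that the path is a working executable.

```python
import re

# "key=value #comment" -> key, value (comment and surrounding spaces dropped)
_CTL_LINE = re.compile(r"^(?P<key>[A-Za-z0-9_]+)=(?P<value>[^#]*?)\s*(?:#.*)?$")


def parse_ctl(ctl_text: str) -> dict:
    """Collect key/value pairs from a MAKER-style control file."""
    values = {}
    for line in ctl_text.splitlines():
        m = _CTL_LINE.match(line.strip())
        if m:  # lines starting with '#' never match, so headers are skipped
            values[m.group("key")] = m.group("value").strip()
    return values


def missing_genemark_exes(opts_text: str, exe_text: str) -> list:
    """Return maker_exe.ctl keys left blank although gmhmm is in use."""
    opts = parse_ctl(opts_text)
    if not opts.get("gmhmm"):
        return []  # GeneMark not requested, nothing to check
    exes = parse_ctl(exe_text)
    return [key for key in ("gmhmme3", "probuild") if not exes.get(key)]
```

An empty list means both executable paths are filled in; otherwise the returned names are the entries to set in maker_exe.ctl.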
Training the gene model with SNAP yields 6,000-7,000 more genes than expected, although the models have a good cumulative AED distribution.
I was hoping that, in addition to SNAP, I could use AUGUSTUS and GeneMark to train the gene model and fuse dispersed models, so that the gene count falls within the expected range.
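Gene count and cumulative AED can be tracked together when comparing SNAP-only runs with runs that add AUGUSTUS and GeneMark. Below is a minimal Python sketch, assuming the MAKER GFF3 output (for example a merged file from gff3_merge; file name left to the reader) has `gene`/`mRNA` features with source `maker` and stores each transcript's AED in an `_AED=` attribute. The function name and thresholds are illustrative only.

```python
def summarize_maker_gff(lines, thresholds=(0.25, 0.5, 0.75, 1.0)):
    """Count MAKER genes and compute the cumulative AED fraction per threshold.

    Returns (gene_count, mRNA_count_with_AED, {threshold: fraction <= threshold}).
    """
    genes = 0
    aeds = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[1] != "maker":
            continue  # skip evidence alignments and malformed lines
        if cols[2] == "gene":
            genes += 1
        elif cols[2] == "mRNA":
            attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
            if "_AED" in attrs:
                aeds.append(float(attrs["_AED"]))
    # Cumulative distribution: share of transcripts at or below each AED cutoff.
    cdf = {t: (sum(a <= t for a in aeds) / len(aeds) if aeds else 0.0) for t in thresholds}
    return genes, len(aeds), cdf
```

Running it on the output of each configuration (for example `with open(path) as fh: summarize_maker_gff(fh)`) gives a side-by-side view of whether adding predictors lowers the gene count without pushing the AED curve to the right.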
Thanks and regards,
Parul
_______________________________________________
maker-devel mailing list
maker...@box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
On Nov 29, 2016, at 10:13 AM, Kudtarkar, Parul V. <par...@caltech.edu> wrote:
Dear Carson and Daniel,
Thanks for getting back to me promptly.
Adding the paths to the GeneMark executables (gmhmme3 and probuild) in maker_exe.ctl fixes the error.
Hopefully optimize_augustus.pl runs faster than autoAug.pl, which has now been running for almost a week.
We look forward to evaluating which model best matches our expected gene count, gives good AED values and yields proteins with recognizable domains.
P.S. We think BUSCO has been helpful in evaluating gene model completeness.
Thanks,
Parul
Parul Kudtarkar
Bioinformatician
Biology and Biological Engineering
Office: 278 Beckman Institute
California Institute of Technology