[maker-devel] Maker not producing expected output


Kevin Kocot

Oct 17, 2015, 12:10:51 AM
to maker...@yandell-lab.org
Hello,

I've run Maker on a draft invertebrate genome and it seemed to finish
successfully. However, many of the expected output files were not
produced. If I go to, for example, XX_datastore/00/0C/scaffold-334630/,
all I see is:

theVoid.scaffold-334630
run.log
scaffold-334630.gff

In particular, I'm looking for the transcripts and proteins fasta files.
I'm sure I have a configuration setting incorrect or one of the
dependencies not correctly installed, but I can't figure out what the
problem is. Any thoughts on how I can resolve this issue and generate
these files? Ideally I would love to be able to generate these files
without having to run the whole pipeline again. Details on my
configuration settings and the contents of the run.log file from my
example above are pasted below.

Thank you,
Kevin

-----
run.log from the example folder above looks like this:
-----
SHARED_ID d574e9ca9b0019a9fe147ccb9db3588b
CTL_OPTIONS maker_gff
CTL_OPTIONS other_gff
CTL_OPTIONS est test-transcriptome.fa
CTL_OPTIONS est_reads
CTL_OPTIONS altest KK273.fa
CTL_OPTIONS est_gff
CTL_OPTIONS altest_gff
CTL_OPTIONS protein test-AA.fa
CTL_OPTIONS protein_gff
CTL_OPTIONS model_org all
CTL_OPTIONS repeat_protein te_proteins.fasta
CTL_OPTIONS rmlib
CTL_OPTIONS rm_gff
CTL_OPTIONS organism_type eukaryotic
CTL_OPTIONS predictor est2genome,genemark,protein2genome
CTL_OPTIONS est2genome 1
CTL_OPTIONS altest2genome 0
CTL_OPTIONS snaphmm
CTL_OPTIONS gmhmm output/gmhmm.mod
CTL_OPTIONS augustus_species
CTL_OPTIONS fgenesh_par_file
CTL_OPTIONS model_gff
CTL_OPTIONS pred_gff
CTL_OPTIONS max_dna_len 100000
CTL_OPTIONS split_hit 10000
CTL_OPTIONS pred_flank 200
CTL_OPTIONS pred_stats 0
CTL_OPTIONS min_protein 0
CTL_OPTIONS AED_threshold 1
CTL_OPTIONS single_exon 0
CTL_OPTIONS single_length 250
CTL_OPTIONS keep_preds 0
CTL_OPTIONS map_forward 0
CTL_OPTIONS est_forward 0
CTL_OPTIONS correct_est_fusion 0
CTL_OPTIONS alt_splice 0
CTL_OPTIONS always_complete 0
CTL_OPTIONS alt_peptide C
CTL_OPTIONS evaluate 0
CTL_OPTIONS blast_type ncbi+
CTL_OPTIONS softmask 1
CTL_OPTIONS pcov_blastn 0.8
CTL_OPTIONS pid_blastn 0.85
CTL_OPTIONS eval_blastn 1e-10
CTL_OPTIONS bit_blastn 40
CTL_OPTIONS depth_blastn 0
CTL_OPTIONS pcov_rm_blastx 0.5
CTL_OPTIONS pid_rm_blastx 0.4
CTL_OPTIONS eval_rm_blastx 1e-06
CTL_OPTIONS bit_rm_blastx 30
CTL_OPTIONS pcov_blastx 0.5
CTL_OPTIONS pid_blastx 0.4
CTL_OPTIONS depth_blastx 0
CTL_OPTIONS eval_blastx 1e-06
CTL_OPTIONS bit_blastx 30
CTL_OPTIONS pcov_tblastx 0.8
CTL_OPTIONS pid_tblastx 0.85
CTL_OPTIONS eval_tblastx 1e-10
CTL_OPTIONS bit_tblastx 40
CTL_OPTIONS depth_tblastx 0
CTL_OPTIONS ep_score_limit 20
CTL_OPTIONS en_score_limit 20
CTL_OPTIONS enable_fathom 0
CTL_OPTIONS unmask 0
CTL_OPTIONS model_pass 0
CTL_OPTIONS est_pass 0
CTL_OPTIONS altest_pass 0
CTL_OPTIONS protein_pass 0
CTL_OPTIONS rm_pass 0
CTL_OPTIONS other_pass 0
CTL_OPTIONS pred_pass 0
CTL_OPTIONS run genemark
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark
FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.abinit_nomask.0.gmhmm%2Emod.genemark
STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section
FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.pred.raw.section
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
STARTED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section
FINISHED test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/scaffold-334630.0.final.section
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
LOGCHILD /media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.maker.output/test_scaffolds_annotated_as_metazoan_by_MG-RAST_datastore/00/0C/scaffold-334630//theVoid.scaffold-334630/run.log.child.0
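A quick way to confirm that every step in a per-contig run.log actually completed is to pair up its STARTED and FINISHED entries. The sketch below assumes each entry is a keyword followed by its path on the same line (the email client may wrap them); the function names are hypothetical helpers, not part of MAKER. Any step whose last recorded status is not FINISHED is reported.

```python
def step_status(run_log_text):
    """Map each step path to its last STARTED/FINISHED status in a run.log.

    Assumes 'KEYWORD<whitespace>PATH' on one line; other keywords
    (e.g. LOGCHILD, CTL_OPTIONS) are ignored.
    """
    status = {}
    for line in run_log_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in ("STARTED", "FINISHED"):
            status[parts[1].strip()] = parts[0]  # later entries overwrite earlier ones
    return status


def unfinished_steps(run_log_text):
    """Return steps that were started but never recorded as FINISHED."""
    return [step for step, s in step_status(run_log_text).items() if s != "FINISHED"]
```

For example, `unfinished_steps(open("run.log").read())` on the log above should return an empty list, since each STARTED line has a matching FINISHED line.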

-----
maker_opts
-----
#-----Genome (these are always required)
genome=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test_scaffolds_annotated_as_metazoan_by_MG-RAST.fas #genome sequence (fasta file or fasta embedded in GFF3 file)
organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic

#-----Re-annotation Using MAKER Derived GFF3
maker_gff= #MAKER derived GFF3 file
est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
other_pass=0 #pass through anything else in maker_gff: 1 = yes, 0 = no

#-----EST Evidence (for best results provide a file for at least one)
est=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-transcriptome.fa #set of ESTs or assembled mRNA-seq in fasta format
altest=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/KK273.fa #EST/cDNA sequence file in fasta format from an alternate organism
est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
altest_gff= #aligned ESTs from a closely related species in GFF3 format

#-----Protein Homology Evidence (for best results provide a file for at least one)
protein=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/test-AA.fa #protein sequence file in fasta format (i.e. from multiple organisms)
protein_gff= #aligned protein homology evidence from an external GFF3 file

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/usr/local/bin/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

#-----Gene Prediction
snaphmm= #SNAP HMM file
gmhmm=/media/kmkocot/Sclerite/genome_projects/test/Ray_2_assembly/MAKER/output/gmhmm.mod #GeneMark HMM file
augustus_species= #Augustus gene prediction species model
fgenesh_par_file= #FGENESH parameter file
pred_gff= #ab-initio predictions from an external GFF3 file
model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no

#-----Other Annotation Feature Types (features MAKER doesn't recognize)
other_gff= #extra features to pass-through to final MAKER generated GFF3 file

#-----External Application Behavior Options
alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)

#-----MAKER Behavior Options
max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=1 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=200 #flank for extending evidence clusters sent to gene predictors
pred_stats=0 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon' is enabled
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=2 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

--
Kevin M. Kocot, Ph.D.
NSF International Postdoctoral Research Fellow
Degnan Lab
The University of Queensland
School of Biological Sciences
325 Goddard Building 8
St. Lucia, QLD 4072
Australia
Ph: +61 0402 488 430


_______________________________________________
maker-devel mailing list
maker...@box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Daniel Ence

Oct 17, 2015, 11:46:39 AM
to k.k...@uq.edu.au, maker...@yandell-lab.org
Hi Kevin,

I have a couple of clarifying questions, and an explanation that’ll hopefully be helpful.

If you look in the master datastore log, do you see an entry that shows that scaffold finished successfully? It will have the name of the scaffold, then the path to the results directory, and then a status. There should be one that shows that maker started working on it, and one that shows that maker finished it.
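Checking the master datastore log by hand gets tedious with many scaffolds. The sketch below assumes, as described above, that each line holds the scaffold name, the results directory, and a status, separated by tabs; the helper names are hypothetical. It keeps the last status seen for each scaffold and lists those that never reached FINISHED.

```python
def contig_status(index_lines):
    """Map contig ID -> (results directory, last status) from a master datastore index.

    Assumes tab-separated lines: contig ID, directory, status (e.g. STARTED, FINISHED).
    """
    latest = {}
    for line in index_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            latest[fields[0]] = (fields[1], fields[2])  # later lines win
    return latest


def not_finished(index_lines):
    """Return contigs whose most recent status is anything other than FINISHED."""
    return sorted(c for c, (_, s) in contig_status(index_lines).items() if s != "FINISHED")
```

Passing the open index file (for example `not_finished(open(path))`, where `path` points at the master datastore index log in the `.maker.output` directory) should return an empty list for a run in which every scaffold completed.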

Second, what are the files that you’re expecting to see? I think you’re expecting to see a couple of fasta files and a gff3 file that contain all the annotation results gathered together. You can gather those results with the fasta_merge and gff3_merge scripts that come with MAKER.
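The real tools for this are MAKER's own fasta_merge and gff3_merge scripts (typically run as `fasta_merge -d <master_datastore_index.log>`). To illustrate what the FASTA merge does conceptually, here is a rough Python sketch, not a replacement for those scripts. It assumes a tab-separated index (contig, directory, status), directories relative to `base_dir`, and per-contig FASTA files named `<contig>.<suffix>.fasta`; `merge_fastas` is a hypothetical helper. Contigs with no FASTA files simply contribute nothing, which is exactly the situation in Kevin's example directory.

```python
import os
from collections import defaultdict


def merge_fastas(index_lines, base_dir, out_prefix):
    """Concatenate per-contig FASTA files, grouped by file-name suffix.

    Illustrative only; returns {suffix: path of the merged output file}.
    """
    contigs = {}
    for line in index_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            contigs[fields[0]] = (fields[1], fields[2] == "FINISHED")

    chunks = defaultdict(list)
    for contig, (directory, done) in contigs.items():
        contig_dir = os.path.join(base_dir, directory)
        if not done or not os.path.isdir(contig_dir):
            continue  # skip unfinished contigs and missing directories
        for name in sorted(os.listdir(contig_dir)):
            if name.startswith(contig + ".") and name.endswith(".fasta"):
                suffix = name[len(contig) + 1:-len(".fasta")]
                with open(os.path.join(contig_dir, name)) as fh:
                    chunks[suffix].append(fh.read())

    outputs = {}
    for suffix, texts in chunks.items():
        path = f"{out_prefix}.{suffix}.fasta"
        with open(path, "w") as out:
            for text in texts:
                if text and not text.endswith("\n"):
                    text += "\n"  # keep records from running together
                out.write(text)
        outputs[suffix] = path
    return outputs
```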

To explain what you saw in the example results directory you sent: if there weren’t any models or predictions on that scaffold, then there won’t be fasta files in the results directory. You could verify that by looking at the scaffold-334630.gff file. The fasta_merge and gff3_merge scripts will gather the per-scaffold fasta and gff files for all the scaffolds and put them into a few fasta files and one gff3 file, respectively.
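One way to "look at the scaffold-334630.gff file" systematically is to tally its features by source and type. The sketch below assumes a standard 9-column GFF3 file that may end with an embedded `##FASTA` section, and that final MAKER annotations appear as `gene` features; evidence alignments and ab initio predictions show up under their own source/type pairs. The helper names are hypothetical.

```python
from collections import Counter


def feature_counts(gff_lines):
    """Count (source, type) pairs in a GFF3 file, stopping at an embedded ##FASTA section."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # everything after this is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[(cols[1], cols[2])] += 1
    return counts


def has_gene_models(gff_lines):
    """True if any feature in the file is of type 'gene'."""
    return any(ftype == "gene" for (_, ftype) in feature_counts(gff_lines))
```

If `has_gene_models(open("scaffold-334630.gff"))` is False, the absence of transcript and protein FASTA files for that scaffold is expected rather than a sign of a broken install.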

Let me know whether that helps,
Daniel


Daniel Ence
Graduate Student
Eccles Institute of Human Genetics
University of Utah
15 North 2030 East, Room 2100
Salt Lake City, UT 84112-5330

Carson Holt

Oct 17, 2015, 4:25:19 PM
to k.k...@uq.edu.au, maker...@yandell-lab.org

You will only get fasta files for the contig when there are gene models present on that contig. The only ab initio predictor you provided parameters for is GeneMark, and apparently it did not predict any genes for the contig in question. If it had, you would have at least one fasta file in the output containing all predictions made by GeneMark. MAKER doesn’t make gene models itself; rather, it provides hints to other gene predictors based on the evidence alignments and then promotes and polishes the models they make. If they produce no models, then you will get no results.
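To see at a glance which ab initio predictors a maker_opts.ctl actually configures, the key=value lines can be parsed and the predictor fields checked for non-empty values. This is a sketch: it assumes one option per line with `#` starting a comment, and it only checks the four predictor keys shown in the control file above (snaphmm, gmhmm, augustus_species, fgenesh_par_file). `read_ctl` and `configured_predictors` are hypothetical helpers.

```python
PREDICTOR_KEYS = ("snaphmm", "gmhmm", "augustus_species", "fgenesh_par_file")


def read_ctl(lines):
    """Parse key=value lines of a MAKER control file, dropping '#' comments."""
    opts = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            opts[key.strip()] = value.strip()
    return opts


def configured_predictors(lines):
    """Return the predictor keys that have a non-empty value."""
    opts = read_ctl(lines)
    return [k for k in PREDICTOR_KEYS if opts.get(k)]
```

Run on Kevin's maker_opts.ctl, this should report only `gmhmm`, matching the observation that GeneMark is the sole ab initio predictor in play.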

You can try adding additional gene predictors like SNAP (in case GeneMark just isn’t performing well), or you can check the length of your contig (contigs shorter than about 10kb rarely produce any results; they are too short to be annotatable). Try looking at the results from one of the larger contigs, or use fasta_merge to gather all results from all contigs.
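Checking contig lengths against the roughly 10kb rule of thumb can be done directly from the genome FASTA. The sketch below takes the sequence ID as the first whitespace-separated word of each header; the 10,000 bp default simply mirrors the figure above, and the helper names are hypothetical.

```python
def contig_lengths(fasta_lines):
    """Return {sequence ID: length} for a FASTA file given as an iterable of lines."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def short_contigs(fasta_lines, min_len=10_000):
    """IDs of contigs shorter than min_len, which rarely yield annotations."""
    return sorted(n for n, length in contig_lengths(fasta_lines).items() if length < min_len)
```

Comparing `short_contigs(open(genome_path))` with the list of scaffolds lacking FASTA output (where `genome_path` is the `genome=` file from maker_opts.ctl) can show whether most of the "missing" output comes from contigs that are simply too short.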

—Carson

> On Oct 16, 2015, at 10:10 PM, Kevin Kocot <kmk...@gmail.com> wrote:
>
