Re: [maker-devel] Question about MAKER

Carson Hinton Holt

Mar 31, 2021, 2:04:06 PM
to Kyungyong Seong, Maker Mailing List
Hello,

The Google group is an archive only and cannot be posted to. You can, however, send questions to the email list (CCed).

The behavior you see is normal. Exonerate gets rerun because it is a relatively fast step that can produce a lot of output, and saving those results for later use can be very I/O intensive when running under MPI. Some users were actually killing the NFS servers at their institutions. Since the performance gain from archiving exonerate results is small but the I/O cost was large, we don’t archive those results.
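
For reference, a second pass like the one described in the quoted message below would usually change only a few lines of maker_opts.ctl. This is a minimal sketch rather than the exact file used: the HMM path and the species name are hypothetical placeholders, and only the option names come from the control file quoted in this thread. With these settings exonerate is still rerun, for the reason given above.

```
#-----Gene Prediction (second pass; only the changed lines are shown)
snaphmm=/path/to/trained.snap.hmm #hypothetical path to the trained SNAP HMM
augustus_species=my_species #hypothetical name of the trained Augustus species model
est2genome=0 #stop inferring gene predictions directly from ESTs
protein2genome=0 #stop inferring predictions directly from protein homology
```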

—Carson

> On Mar 31, 2021, at 11:51 AM, Kyungyong Seong <s.kyu...@berkeley.edu> wrote:
>
> 
> Dear Carson,
>
> I wanted to post a question to the Google group, but I didn't have permission to do so. I am having a small problem with MAKER and wanted to get some advice from you.
>
> I completed the initial run successfully with the following configuration.
>
> #-----Genome (these are always required)
> genome=/global/scratch/skyungyong/S.habrochaites/1.Final_Assembly/5.Final_data/SH1353.primary.scaffolds.noPlasmid.fa #genome sequence (fasta file or fasta embedded in GFF3 file)
> organism_type=eukaryotic #eukaryotic or prokaryotic. Default is eukaryotic
>
> #-----Re-annotation Using MAKER Derived GFF3
> maker_gff= #MAKER derived GFF3 file
> est_pass=0 #use ESTs in maker_gff: 1 = yes, 0 = no
> altest_pass=0 #use alternate organism ESTs in maker_gff: 1 = yes, 0 = no
> protein_pass=0 #use protein alignments in maker_gff: 1 = yes, 0 = no
> rm_pass=0 #use repeats in maker_gff: 1 = yes, 0 = no
> model_pass=0 #use gene models in maker_gff: 1 = yes, 0 = no
> pred_pass=0 #use ab-initio predictions in maker_gff: 1 = yes, 0 = no
> other_pass=0 #passthrough anything else in maker_gff: 1 = yes, 0 = no
>
> #-----EST Evidence (for best results provide a file for at least one)
> est=/global/scratch/skyungyong/S.habrochaites/2.Annotation/2.Maker/DB/Transcriptome/SH_ALL.superscripts.fa #set of ESTs or assembled mRNA-seq in fasta format
> altest=/global/scratch/skyungyong/S.habrochaites/2.Annotation/2.Maker/DB/ITAG4.1_cDNA.fasta #EST/cDNA sequence file in fasta format from an alternate organism
> est_gff= #aligned ESTs or mRNA-seq from an external GFF3 file
> altest_gff= #aligned ESTs from a closely related species in GFF3 format
>
> #-----Protein Homology Evidence (for best results provide a file for at least one)
> protein=/global/scratch/skyungyong/S.habrochaites/2.Annotation/2.Maker/DB/Prot.evidence.fa #protein sequence file in fasta format (i.e. from multiple organisms)
> protein_gff= #aligned protein homology evidence from an external GFF3 file
>
> #-----Repeat Masking (leave values blank to skip repeat masking)
> model_org=Solanum #select a model organism for RepBase masking in RepeatMasker
> rmlib=/global/scratch/skyungyong/S.habrochaites/2.Annotation/1.RepeatModeler/SH1353-families.fa #provide an organism specific repeat library in fasta format for RepeatMasker
> repeat_protein=/global/scratch/skyungyong/Software/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins for RepeatRunner
> rm_gff= #pre-identified repeat elements from an external GFF3 file
> prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
> softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)
>
> #-----Gene Prediction
> snaphmm=0 #SNAP HMM file
> gmhmm= #GeneMark HMM file
> augustus_species=0 #Augustus gene prediction species model
> fgenesh_par_file= #FGENESH parameter file
> pred_gff= #ab-initio predictions from an external GFF3 file
> model_gff= #annotated gene models from an external GFF3 file (annotation pass-through)
> run_evm=0 #run EvidenceModeler, 1 = yes, 0 = no
> est2genome=1 #infer gene predictions directly from ESTs, 1 = yes, 0 = no
> protein2genome=1 #infer predictions from protein homology, 1 = yes, 0 = no
> trna=0 #find tRNAs with tRNAscan, 1 = yes, 0 = no
> snoscan_rrna= #rRNA file to have Snoscan find snoRNAs
> snoscan_meth= #-O-methylation site file to have Snoscan find snoRNAs
> unmask=0 #also run ab-initio prediction programs on unmasked sequence, 1 = yes, 0 = no
> allow_overlap= #allowed gene overlap fraction (value from 0 to 1, blank for default)
>
> #-----Other Annotation Feature Types (features MAKER doesn't recognize)
> other_gff= #extra features to pass-through to final MAKER generated GFF3 file
>
> #-----External Application Behavior Options
> alt_peptide=C #amino acid used to replace non-standard amino acids in BLAST databases
> cpus=1 #max number of cpus to use in BLAST and RepeatMasker (not for MPI, leave 1 when using MPI)
>
> #-----MAKER Behavior Options
> max_dna_len=100000 #length for dividing up contigs into chunks (increases/decreases memory usage)
> min_contig=1 #skip genome contigs below this length (under 10kb are often useless)
>
> pred_flank=200 #flank for extending evidence clusters sent to gene predictors
> pred_stats=0 #report AED and QI statistics for all predictions as well as models
> AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
> min_protein=0 #require at least this many amino acids in predicted proteins
> alt_splice=0 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
> always_complete=0 #extra steps to force start and stop codons, 1 = yes, 0 = no
> map_forward=0 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
> keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)
>
> split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
> min_intron=20 #minimum intron length (used for alignment polishing)
> single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
> single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
> correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes
>
> tries=3 #number of times to try a contig if there is a failure for some reason
> clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
> clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
> TMP=
>
> For the second run, I trained SNAP and Augustus and provided the paths to the HMMs in maker_opts.ctl. With est2genome=0 and protein2genome=0, I ran MAKER in the same directory, hoping MAKER would reuse the results from the previous run.
>
> The run starts with this warning:
> MAKER WARNING: Changes in control files make re-use of hint based predictions impossible
> Old hint based prediction files will be erased before continuing
>
> And it seems to run exonerate anew:
> running est2genome search.
> #--------- command -------------#
> Widget::exonerate::est2genome:
> /global/scratch/skyungyong/Software/maker/exe/exonerate/bin/exonerate -q /tmp/maker_gdlFj8/53/LA2119%2ETRINITY_DN24321_c0_g1.for.475-1651.53.fasta -t /tmp/maker_gdlFj8/53/scaffold53.475-1651.53.fasta -Q dna -T dna --model est2genome --minintron 20 --maxintron 10000 --showcigar --percent 20 > /tmp/maker_gdlFj8/53/scaffold53.475-1651.LA2119%2ETRINITY_DN24321_c0_g1.e.exonerate
>
> Is this normal?
> Thank you for your help!
> Kyungyong
_______________________________________________
maker-devel mailing list
maker...@yandell-lab.org
http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org