Dear Carson:
(1) Thank you for your explanation. I will try setting max_dna_len to 400 kb for our rodent species, which is a little higher than the value suggested for large vertebrate genomes (the MAKER manual says "300,000 is a good max_dna_len on large vertebrate genomes if memory is not a limiting factor").
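As a rough sanity check on what this setting means for our longest scaffolds, here is a small sketch of the chunk arithmetic. It assumes max_dna_len simply divides a contig into consecutive chunks of that length and uses a round 100 Mb scaffold length; the variable names are only for illustration.

```shell
#!/bin/bash
# Rough chunk count for one long scaffold (integer ceiling division).
# Assumption: chunks are consecutive pieces of max_dna_len bases each.
scaffold_len=100000000   # ~100 Mb scaffold (illustrative round number)
max_dna_len=400000       # the 400 kb value I plan to use
n_chunks=$(( (scaffold_len + max_dna_len - 1) / max_dna_len ))
echo "chunks per scaffold: $n_chunks"
```

With these numbers, a single 100 Mb scaffold would be processed as 250 chunks.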
(2) From reading some of your replies in the MAKER Google group, I noticed that setting the depth_blast parameters to a fixed number can reduce memory use and annotation time, so I changed the following parameters. Will this decrease the quality of the annotation? If it does not affect quality, can I use an even smaller number (e.g., 20) to save more memory and time?
depth_blastn=30 #Blastn depth cutoff (0 to disable cutoff)
depth_blastx=30 #Blastx depth cutoff (0 to disable cutoff)
depth_tblastx=30 #tBlastx depth cutoff (0 to disable cutoff)
bit_rm_blastx=30 #Blastx bit cutoff for transposable element masking
(3) I also have some concerns about speed, especially for the long scaffolds (around 100 Mb). Which step of genome annotation is the most time-consuming (repeat masking, BLAST, or polishing)? In particular, I wonder whether the blastx of the protein evidence takes the majority of the time. I have prepared 99k mammalian Swiss-Prot protein sequences and 340k rodent TrEMBL protein sequences as protein evidence, and I am considering whether I would save much time by using only the 99k mammalian Swiss-Prot sequences.
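For reference, sequence counts like the ones above can be checked with a small helper such as the one below. The function name count_fasta and the file names in the comments are placeholders, not the actual names on our cluster.

```shell
#!/bin/bash
# Count the sequences in a FASTA file by counting header lines,
# i.e. lines that begin with '>'.
count_fasta() {
    grep -c '^>' "$1"
}

# Example usage (hypothetical file names):
#   count_fasta mammalia_swissprot.fasta
#   count_fasta rodentia_trembl.fasta
```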
(4) For some reason, I cannot run MAKER through MPI on our cluster, so I can only start multiple MAKER instances. Is it possible to have multiple MAKER instances annotate the same long scaffold (i.e., start several MAKER processes for a single sequence, without splitting the long sequence into shorter ones)?
**************** the bash file used to submit the maker job
#!/bin/bash
#$ -cwd
#$ -S /bin/bash
#$ -j y
#$ -N makerT2
#$ -l h_vmem=8g
#$ -pe smp 2
module load MAKER/2.31.9/perl.5.22.1
maker --q 2> maker_test.error
Many thanks
Best
Qaunwei