[maker-devel] High memory consumption


Kyungyong Seong

Dec 17, 2021, 12:16:12 PM
to Maker Mailing List
Hi!

MAKER has been running fine on my genome (~1 Gb; 800 contigs), but it is now stuck on ~30 contigs that keep failing because of high memory consumption. I am using MPI, running 20-30 contigs in parallel for annotation, depending on the machine. I started with 64 Gb memory machines but have moved up to 1.5 Tb machines as the jobs kept failing. Unfortunately, all the memory on this machine is also saturated. It looks like tblastx is taking a lot of time and resources. My databases are about 200 Mb for the proteins and 570 Mb for the cDNAs, and max_dna_len is set to 100000 in maker_opts.ctl. Is there a way to improve this? Decreasing the number of MPI jobs slowed down memory saturation, but eventually the same thing happened.

Thank you!
Kyungyong


Kyungyong Seong

Dec 22, 2021, 6:13:29 PM
to Carson Holt, Maker Mailing List
Thank you for the tips! Is there also a way to reduce the time tblastx takes? My cluster has a 3-day run limit. I think what is happening is that MAKER is terminated by out-of-memory kills or the runtime cap, and when MAKER is restarted, tblastx has to start from scratch. Do you think it would be better not to use MPI and set cpus=30? Or would it be okay to use 3 MPI ranks with cpus=10 if I have 30 cores?


On Fri, Dec 17, 2021 at 9:29 AM Carson Holt <cars...@gmail.com> wrote:
1. Make sure your system is not configured with an in memory /tmp directory. If it is, every file written to temporary storage will use RAM.
2. If running under MPI, cpu= in maker_opts.ctl must be set to 1.
3. max_dna_len= should be 100000 (the default)
4. In maker_bopts.ctl, set all the depth_blast= options to something like 10 or 20 (there are 3 depth values you will have to set). The default is to keep everything, and if you have really deep alignments that can use a lot of RAM without any actual benefit for gene prediction.
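
Put together, the settings above would look roughly like this (illustrative values; option names follow the stock MAKER control files, so double-check against your own copies):

```
# maker_opts.ctl
cpus=1              # must be 1 when running under MPI
max_dna_len=100000  # the default

# maker_bopts.ctl
depth_blastn=20     # cap alignment depth kept per locus
depth_blastx=20
depth_tblastx=20
```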

—Carson
> _______________________________________________
> maker-devel mailing list
> maker...@yandell-lab.org
> http://yandell-lab.org/mailman/listinfo/maker-devel_yandell-lab.org

Carson Holt

Jan 3, 2022, 1:24:21 PM
to Kyungyong Seong, Maker Mailing List
Really the only reason to use the altest options is if you don't have protein data but for some reason have transcript data you want to use from a different species. If you have protein data, such as a previous annotation, use that instead, because TBLASTX takes at least 6 times longer than BLASTP and is less sensitive. Other than that, setting depth_tblastx= in the maker_bopts.ctl file can help cap how much is kept.

The tblastx.temp_dir holds partial results that get merged into the final tblastx file. On failure or restart, if a tblastx.temp_dir exists, it gets erased and the search is rerun. If a tblastx file exists, it gets used instead of rerunning.
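
That restart behavior can be sketched as a small decision helper (hypothetical function and file names for illustration; MAKER's actual implementation is in Perl and more involved):

```python
import os
import shutil

def tblastx_restart_action(result_file: str, temp_dir: str) -> str:
    """Decide what a restarted run does with existing tblastx artifacts.

    - A finished .tblastx result file is reused as-is.
    - A leftover .tblastx.temp_dir (partial results) is erased and the
      search is rerun from scratch.
    """
    if os.path.isfile(result_file):
        return "reuse"           # completed search: skip tblastx entirely
    if os.path.isdir(temp_dir):
        shutil.rmtree(temp_dir)  # partial results cannot be resumed
        return "rerun"
    return "run"                 # nothing on disk yet: fresh search
```

This is why a run killed mid-search loses its tblastx progress: only a fully merged result file is reusable.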

—Carson


On Dec 22, 2021, at 8:42 PM, Kyungyong Seong <s.kyu...@berkeley.edu> wrote:

Hi Carson,

Looking at the progress more carefully, I learned that some query and database combinations cause tblastx to run essentially forever. Typically, a tblastx search finishes in a reasonable time (a few hours at most), but for those combinations it has taken days (and is still running) to search a 100 kb query against a 50 Mb database. All CPUs end up tied up by these searches, so MAKER never finishes.

Would it be possible to skip the tblastx search for these query + database combinations? I have intermediate files from a previous MAKER run produced with smaller databases, so I attempted to copy some of those files into the current run folders. For instance, for atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir, which causes the issue,

I first copied atg000169l.12.Solanacea%2Ecds%2Efa.tblastx from the previous run into the proper directory and deleted atg000169l.12.Solanacea%2Ecds%2Efa.tblastx.temp_dir.

Then I modified run.log.child.12 to include FINISHED SH1353.alternative.noPlasmid.maker.output/SH1353.alternative.noPlasmid_datastore/42/CC/atg000169l//theVoid.atg000169l/1/atg000169l.12.Solanacea%2Ecds%2Efa.tblastx

However, it seems that MAKER still starts tblastx over from scratch. Only a small number of contigs are left, so working around this manually is feasible. Is there a way to do this?

Thank you for your help!
Kyungyong


Carson Holt

Jan 3, 2022, 2:53:56 PM
to Kyungyong Seong, Maker Mailing List
You can try mpi = 3 and cpus=10. It might reduce memory usage.
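
With 30 cores, that split would look something like the following (a sketch only; the exact mpiexec flags depend on your MPI implementation and scheduler, and note that cpus > 1 under MPI departs from the usual cpus=1 recommendation):

```
# maker_opts.ctl
cpus=10           # threads available to each rank's BLAST jobs

# launch 3 MPI ranks, 3 x 10 = 30 cores total
mpiexec -n 3 maker
```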

—Carson

Kyungyong Seong

Jan 21, 2022, 5:25:29 PM
to Carson Holt, Maker Mailing List
I found out that the high memory consumption was caused by BLAST or exonerate aligning incompletely masked repetitive contigs against a few annotated repetitive elements in the public annotation data. There were an enormous number of possible matches, and BLAST or exonerate tried to find all of them, inflating the run time and memory usage. I was eventually able to work around this. Thanks for your suggestions!



