estimate memory usage?


waove...@gmail.com

Jul 16, 2018, 9:08:54 AM
to BinSanity
Hello,

I'm working on a fairly large metagenomic dataset. I tried the BinSanity workflow with >2,500 bp contigs (~130,000 contigs) and again with >4,000 bp (~50,000 contigs), and both times I got a MemoryError from the affinity_propagation function. I saw on other forum posts that your lab likes to keep the number of contigs under 100,000. How much memory do you think that would take?

I've hit that error on the 4 kb minimum dataset with both 50 and 80 GB of memory requested.
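For reference, here's the back-of-the-envelope I've been using (my assumption: the affinity_propagation step, like scikit-learn's implementation, holds roughly four dense N x N float64 matrices for similarity, availability, responsibility, and a temporary):

# Rough peak-RAM estimate for affinity propagation on N contigs,
# assuming ~4 dense N x N float64 matrices (8 bytes per value).
def ap_memory_gb(n_contigs, n_matrices=4):
    return n_matrices * n_contigs**2 * 8 / 1e9

for n in (50_000, 100_000, 130_000):
    print(f"{n:>7,} contigs: ~{ap_memory_gb(n):.0f} GB")

That puts my 50,000-contig run at ~80 GB, which would explain why both of my allocations failed.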

In the meantime, I'm running the BinSanity-lc method in parallel on the 2.5 kb dataset, asking for 100 clusters.

Thanks for your time and help!


Best,
Will

edgr...@usc.edu

Jul 31, 2018, 6:53:04 PM
to BinSanity
Hi Will,

I am running some new benchmarks to measure memory usage on larger datasets. To some extent, though, the memory will depend on your samples: with a high-diversity sample it may take the full number of iterations to converge on a solution, which can dramatically increase the memory. You can try reducing the max iterations and convergence iterations, and that will help a little.
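Roughly, those knobs map onto the underlying scikit-learn call like this (a minimal sketch, not BinSanity's exact wrapper code; the random matrix is just a stand-in for real coverage profiles):

import numpy as np
from sklearn.cluster import AffinityPropagation

features = np.random.rand(500, 20)  # stand-in: 500 contigs x 20 samples

ap = AffinityPropagation(
    damping=0.95,          # heavier damping tames oscillation
    max_iter=2000,         # lower this to cap the total number of iterations
    convergence_iter=200,  # lower this to declare convergence sooner
    preference=-3,         # more negative -> fewer, larger clusters
)
labels = ap.fit_predict(features)
print(len(set(labels)), "clusters")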

If you're working on a shared server, it may also be that other users are requesting a lot of memory and their jobs are killing your run.

Memory is an ongoing issue that I'm working to get around. For us, going to about 100,000 contigs from the surface ocean ends up requiring ~250-300 GB to run.

Feel free to let me know if you want any help tuning the best preferences for your sample type. Particularly with BinSanity-lc, I have experimented a lot with what works best while maintaining as much accuracy as possible.

-Elaina

edgr...@usc.edu

Aug 2, 2018, 9:17:05 AM
to BinSanity
Just as a quick follow-up: I just finished running a diverse dataset with 110,000 contigs and 22 samples, and it used around 401 GB of memory.
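If it helps anyone ballpark their own runs: assuming affinity propagation holds about four dense N x N float64 matrices, 110,000^2 x 8 bytes is ~97 GB per matrix, so four of them come to roughly 390 GB, right in line with what I observed.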

Elaina

edgr...@usc.edu

Feb 8, 2019, 2:33:43 PM
to BinSanity
Hi Will, 

I apologize for not getting back to you sooner, but this is all great information. I'm in the process of updating the program with better error-catching mechanisms, so with your permission I'd like to add some of what you found about memory to the FAQs. It would be helpful to know a bit about the machine you're running on for context, though!

I would also be interested in chatting about the optimal preferences you've found. Feel free to email me directly; I was planning to write a short blog post soon about some similar tests I've been running.

-Elaina