Memory usage concerns


nikhila...@gmail.com

Mar 6, 2019, 12:18:01 PM
to BinSanity
Hi Elaina, 

I am working on binning a metagenome with about 110k contigs (all >2500 bp). I am working on a server with ~96 GB of RAM and have run into memory issues. As I see it, I either need more memory, fewer contigs (i.e. a higher minimum contig size), or to partition my contigs into separate chunks; the first option is not feasible for me right now. The fewer-contigs option is a possibility, and so is partitioning the contigs. Am I correct to assume that reducing the number of input contigs by raising the size threshold is the best option? And if I split my ~110k contigs into separate parts, I assume I won't be able to aggregate the binning results afterwards? FYI, I am also going to run a few other binning algorithms and use DAS Tool to generate a final set of bins. I would appreciate your input on this, and thank you for your time.

Best, 

Nikhil 

Elaina Graham

Mar 6, 2019, 12:40:14 PM
to nikhila...@gmail.com, BinSanity
Hello Nikhil,

I would actually recommend using Binsanity-lc first, as this will let you avoid getting rid of contigs. It is similar to Binsanity-wf but with some added steps, and is currently available in the 'test-scripts' folder of the GitHub repository. I have spent the past few months verifying its usage so we can roll it into a new series of scripts for Binsanity2. The one step I am still optimizing is the `-C` parameter, which controls an initial subsetting step that uses a less memory-intensive clustering algorithm. With that many contigs I would recommend setting `-C` to 25; my tests show this should significantly reduce memory consumption.
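
To illustrate the idea (a minimal sketch only, not the actual BinSanity code): the subsetting step first splits contigs into `-C` coarse groups with a cheap clusterer, then runs the memory-hungry affinity propagation within each group, so a full contig-by-contig similarity matrix is never built. Assuming contigs are already summarized as a numeric feature matrix (e.g. coverage/composition profiles) and using scikit-learn, it is roughly:

import numpy as np
from sklearn.cluster import KMeans, AffinityPropagation

def two_stage_cluster(features, n_subsets=25):
    # Stage 1: cheap coarse split into n_subsets groups (what -C controls).
    coarse = KMeans(n_clusters=n_subsets, random_state=0).fit_predict(features)
    bins = {}
    for k in range(n_subsets):
        idx = np.where(coarse == k)[0]
        if len(idx) < 2:
            bins.update({i: (k, 0) for i in idx})
            continue
        # Stage 2: affinity propagation only ever builds a (subset x subset)
        # similarity matrix instead of an (n_contigs x n_contigs) one.
        labels = AffinityPropagation(damping=0.9, random_state=0).fit_predict(features[idx])
        for i, lab in zip(idx, labels):
            bins[i] = (k, lab)
    return bins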

If you run into another memory error, please let me know and I'll work with you to optimize settings so you're able to run everything without sacrificing contigs.

-Elaina


nikhila...@gmail.com

May 13, 2019, 12:22:02 PM
to BinSanity

Hi Elaina, 

Thank you for the input - Binsanity-lc with -C set to 25 worked well for our data. I ran another metagenome with the -C parameter set to 100 (a value I saw on the Binsanity GitHub page) and there was no memory issue. I know you mentioned using -C 25 to reduce memory consumption, and I was wondering if there is some sort of tradeoff with this parameter, i.e. since I am able to run with -C at 100, should I use that value for my analyses? I am not very familiar with k-means clustering and how it may affect the output from Binsanity-lc. Any advice would be greatly appreciated!

Thanks again, 
Nikhil 

Elaina Graham

May 18, 2019, 12:01:30 PM
to nikhila...@gmail.com, BinSanity
Hi Nikhil,

In terms of k-means vs. affinity propagation, the main reason our program uses affinity propagation over something like k-means is how it intrinsically works: whereas k-means can yield different clusters depending on how it is initialized, affinity propagation always yields the same results. The suggestion of `-C 25` was based on your number of contigs. A general rule of thumb I have used in the past is to divide the number of contigs by 10,000; that should be roughly the value to set for -C. I am currently working on `Binsanity2`, which will incorporate Binsanity-LC into the core code and will also remove some of the need to set this parameter manually.
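
For illustration only, that rule of thumb as a tiny hypothetical helper (not something that ships with BinSanity):

def suggest_C(n_contigs, contigs_per_subcluster=10_000):
    # Rule of thumb: -C of roughly n_contigs / 10,000, with a floor of 1.
    return max(1, round(n_contigs / contigs_per_subcluster))

suggest_C(110_000)  # -> 11; in your case -C 25 and -C 100 both ran fine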

Having said that, when I test on the example dataset the results are nearly identical whether I use 25 or 100. In practice I have erred toward the larger number because it improves both the memory usage and the speed at which the program runs, with very little loss in accuracy.
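
As a rough back-of-envelope check (assuming memory is dominated by a dense n-by-n matrix of 8-byte floats, as in scikit-learn's affinity propagation):

n = 110_000
full_gb = n**2 * 8 / 1e9          # ~97 GB for one full-size matrix - already past a 96 GB server
c25_gb  = (n / 25)**2 * 8 / 1e9   # ~0.15 GB per subset with -C 25
c100_gb = (n / 100)**2 * 8 / 1e9  # ~0.01 GB per subset with -C 100

which is why the larger value also runs lighter and faster.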

-Elaina
