bamCoverage on large BAM files


Carlos Guzman

Jul 7, 2016, 10:18:56 AM
to deepTools
I have recently been trying to generate bigWig tracks from some STAR-aligned RNA-seq data, where the BAM files range from 25 GB to 45 GB. However, when I run bamCoverage on the 25 GB BAM file I run out of memory, which causes the program to stop making progress, yet the command never exits. I'm using a desktop with 128 GB of RAM.

Is this simply a problem where I just need more RAM, or is there something I can do to make this more memory efficient?
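For reference, the commands I'm running are basically of this form (the file names below are just placeholders, not my exact paths):

    bamCoverage -b Aligned.sortedByCoord.out.bam -o sample.bw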

Devon Ryan

Jul 7, 2016, 1:30:56 PM
to Carlos Guzman, deepTools
Hi Carlos,

Do you happen to be writing the temporary files to /dev/shm? My guess is that you're doing that and also using a very small bin size (e.g., 1). That, combined with the other running processes, could cause memory issues. In general there's no file-size limitation in bamCoverage; we routinely use it on files around that size.
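If you want to check, watch the tmpfs while bamCoverage runs (the mount point below is the usual one; adjust if your system differs):

    df -h /dev/shm     # how big the tmpfs is and how full it gets
    ls -lh /dev/shm    # do deepTools/Python temporary files show up here?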

Devon



-- 
Devon Ryan, PhD
Bioinformatician / Data manager
Bioinformatics Core Facility
Max Planck Institute for Immunobiology and Epigenetics
Email: dpry...@gmail.com

Carlos Guzman

Jul 7, 2016, 1:46:56 PM
to deepTools
How do I change where temporary files are being stored?

I'm actually using a bin size of 50. What I mean is that when I run bamCoverage it starts eating up the 128 GB of RAM, then starts eating into the 30 GB of temp space as well ... at that point the program hangs indefinitely, and RAM and CPU usage drop off completely.
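(Roughly what I'm seeing while the job runs, watching it with the usual tools:)

    free -h    # physical RAM fills up, then the temp space starts filling too
    top        # after that, bamCoverage drops to ~0% CPU and just sits there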

Devon Ryan

Jul 7, 2016, 5:15:13 PM
to deep...@googlegroups.com
That seems odd. How many threads are you using, and which version of deepTools?
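Also, regarding your question about the temp files: I believe deepTools goes through Python's tempfile module, so (assuming that's the case) pointing TMPDIR at a disk with plenty of free space should keep the temporary files out of /dev/shm. The installed version is reported by --version. For example (paths are placeholders):

    bamCoverage --version
    TMPDIR=/scratch/tmp bamCoverage -b aligned.bam -o coverage.bw --binSize 50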

Devon


Carlos Guzman

Jul 7, 2016, 6:45:38 PM
to deepTools
Using 32 cores and the latest version of deepTools found on the anaconda bioconda channel.

Devon Ryan

Jul 8, 2016, 6:11:28 AM
to Carlos Guzman, deepTools
That seems odd then. What happens if you use fewer cores? That should result in lower memory usage. How many contigs/chromosomes are in the genome? I wonder if the genome chunk size is somehow becoming absurdly large.
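If you want to check quickly (the BAM name is a placeholder):

    samtools idxstats aligned.bam | wc -l    # one line per contig, plus a final '*' line for unmapped reads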

Devon
--
Devon Ryan, Ph.D.
Email: dpr...@dpryan.com
Data Manager/Bioinformatician
Max Planck Institute of Immunobiology and Epigenetics
Stübeweg 51
79108 Freiburg
Germany



Fidel Ramirez

Jul 8, 2016, 6:25:29 AM
to Devon Ryan, Carlos Guzman, deepTools
Hi Carlos,

As Devon says, the memory usage of deepTools should be low unless you save intermediate results into /dev/shm, and even then the files are not that big: their size scales with the genome size and the bin size (smaller bins give larger files).

Another possible issue is that deepTools is trying to load into memory the reads from a region that contains an extremely large number of them. Could it be that some genes are covered by millions of reads in your data? Has this RNA-seq data been filtered for ribosomal RNA?
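A rough way to check is to rank the contigs by how many reads map to them; extreme rRNA pile-ups usually stand out (the BAM name is a placeholder):

    samtools idxstats sample.bam | sort -k3,3nr | head
    # columns: contig, length, mapped reads, unmapped reads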


-fidel

--

Fidel Ramirez

Carlos Guzman

Jul 8, 2016, 10:21:38 PM
to deepTools, dpr...@dpryan.com, cguzma...@gmail.com
I tried using 1 thread and still ran into the same problem, unfortunately.

It could be that some genes are covered with millions of reads. I'm guessing I would have to re-run the mapping step to remove all ribosomal RNA from the GTF file used with STAR?

Devon Ryan

Jul 9, 2016, 2:32:44 AM
to Carlos Guzman, deepTools
If rRNA is annotated for your organism, then put those regions into a BED file and pass that to the --blackListFileName parameter. We've discussed internally a more general fix for this in the code that should be possible; I'll look into that next week.
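Something along these lines should work (a rough sketch; the awk pattern assumes an Ensembl-style GTF with gene_biotype attributes, and the file names are placeholders):

    # pull rRNA gene coordinates out of the annotation into a BED file (0-based starts)
    awk -F'\t' '$3 == "gene" && $9 ~ /gene_biotype "rRNA"/ {print $1"\t"($4-1)"\t"$5}' annotation.gtf > rRNA.bed
    # then mask those regions when computing coverage
    bamCoverage -b aligned.bam -o coverage.bw --binSize 50 --blackListFileName rRNA.bed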

Devon

Sent from my iPhone