bamCoverage with -bl simple.bed

Aevermann, Brian

Aug 23, 2016, 9:56:00 PM
to deep...@googlegroups.com
Hello,

I’ve been using deepTools/bamCoverage extensively in a project, and I appreciate all the work you’ve done creating these tools. Currently, I’m attempting to use a BED file (simple 3-column) as a blacklist when converting from BAM to bigWig. However, the RAM started ballooning when I attempted this (I killed it at 70 GB and climbing). The BED file is ~1.8 GB and lists positions in 1 bp intervals. Has this tool ever been tested for this use case, or were the BED/GTF files assumed to contain larger intervals?

Thanks,

Brian Aevermann

Devon Ryan

Aug 24, 2016, 4:34:09 AM
to Aevermann, Brian, deep...@googlegroups.com
Hi Brian

I'm not sure what the goal of blacklisting individual bases is. We generally use regions such as those from ENCODE. I think the memory blows up because every thread stores the BED file in memory. Can you describe more about your use case?

Devon

Aevermann, Brian

Aug 25, 2016, 1:54:46 PM
to Devon Ryan, Aevermann, Brian, deep...@googlegroups.com
Hey Devon,

Thanks for the quick reply. I am using single-bp coverage (bamCoverage output) from control samples to determine the positions I want to mask. I am currently using bedtools to merge those single-bp positions into intervals to alleviate the RAM bottleneck.
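For illustration, the merging step can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the bedtools implementation: the function name `merge_intervals` is hypothetical, the intervals are assumed to be BED-style 0-based, half-open (chrom, start, end) tuples, and everything is assumed to fit in memory. For a file of the size mentioned above, a streaming tool such as bedtools is the more practical choice.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent BED-style intervals.

    `intervals` is an iterable of (chrom, start, end) tuples using
    0-based, half-open coordinates. Hypothetical helper for illustration.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Same chromosome and the interval overlaps or touches the
            # previous one: extend the previous interval.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Three single-base positions, two of them adjacent.
single_bases = [("chr1", 10, 11), ("chr1", 11, 12), ("chr1", 20, 21)]
print(merge_intervals(single_bases))
```

Runs of consecutive single-base entries collapse into one interval each, which is what shrinks the blacklist file.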

However, this segues nicely into another question. When running bamCoverage in a shared compute environment, I see multiple subprocesses spawned during the execution of one job (leading to every thread loading the -bl file). When running with a 1 bp bin size on a 500 MB BAM file, I see up to 70 subprocesses generated. Since I will be running thousands of these, is there a parameter to limit the number of subprocesses?

Thanks,

Brian Aevermann

Devon Ryan

Aug 25, 2016, 2:00:46 PM
to Aevermann, Brian, deep...@googlegroups.com

Hi Brian,

The -p option controls the number of threads (processes in Python). Anyway, making a bigWig file with 1-base-wide bins doesn't require that the BED file used for blacklisting have single-base intervals. Note also that it's best to add a bit of padding to blacklisted regions (at least 50 bases on either side). The bounds for these sorts of things are always somewhat approximate, and padding prevents some edge effects.
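The padding suggestion can be sketched as follows. This is only an illustrative sketch: `pad_and_merge` is a hypothetical helper, not part of deepTools or bedtools. It assumes 0-based, half-open (chrom, start, end) tuples, clamps starts at 0, and does not clamp ends to chromosome lengths, since no chromosome sizes are given here.

```python
def pad_and_merge(intervals, pad=50):
    """Pad each BED-style interval by `pad` bases on both sides,
    then merge any intervals that overlap or touch afterwards.

    Hypothetical helper for illustration. Ends are NOT clamped to
    chromosome lengths (that would need a chromosome-sizes table).
    """
    # Pad, clamping the start so it never goes below 0.
    padded = sorted(
        (chrom, max(0, start - pad), end + pad)
        for chrom, start, end in intervals
    )
    merged = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


regions = [("chr1", 1000, 1001), ("chr1", 1080, 1090), ("chr2", 10, 11)]
print(pad_and_merge(regions, pad=50))
```

Merging after padding keeps the blacklist compact, because nearby masked positions fuse into a single wider region.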

Devon

-- 
Devon Ryan, PhD
Bioinformatician / Data manager
Bioinformatics Core Facility
Max Planck Institute for Immunobiology and Epigenetics
Email: dpry...@gmail.com