Controlling memory use for coverageBed with RNA-seq BAMs

davi...@gmail.com

unread,

Apr 28, 2016, 6:02:22 PM4/28/16

to bedtools-discuss

Hi bedtools team

First, fantastic work on the software. I’m trying to get per-base coverage information for one gene from BAM files with high-coverage paired-end RNA-seq data.

I have been following Eric Minikel’s helpful example and working through the docs. The problem I have is that on our cluster the maximum RAM I can access is 128GB and many ( ~60%) of the coverageBed jobs I am running are exceeding this. Is there a way to control/reduce coverageBed’s memory use for this task?

I’m using a command like:

coverageBed -split -d -a locus.bed -b sample.bam > sample.bpcoverage.txt

locus.bed is extremely simple, just the following:


5    1250000    1300000

The BAMs are indexed and I think sorted (but I didn’t do either myself).

Any suggestions?

Many thanks
Davis

Aaron Quinlan

unread,

Apr 28, 2016, 6:04:55 PM4/28/16

to davi...@gmail.com, bedtools...@googlegroups.com

Hi Davis,

If your BAM file is sorted, you should use the -sorted option, as it will use fair less memory and should be much faster.

I hope this helps,

Aaron

--
You received this message because you are subscribed to the Google Groups "bedtools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bedtools-discu...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

davi...@gmail.com

unread,

Apr 29, 2016, 8:48:56 AM4/29/16

to bedtools-discuss, davi...@gmail.com

Hi Aaron

Thanks for that suggestion. I'm trying that out now.

In the meantime, I produced a mean-coverage plot across the region from ~100 samples that did work with my inefficient memory usage - you can see this plot attached.

I have indicated known exon positions in orange and see pretty high coverage values across introns. Is this:

a) an artefact of how bedtools computes the per-base coverage, or

b) real?

If a), can I change the settings to get a better idea of coverage in introns compared to exons?

Best

Davis

davi...@gmail.com

unread,

Apr 29, 2016, 11:15:31 AM4/29/16

to bedtools-discuss, davi...@gmail.com

Hi Aaron

The sorted option, of course, does the trick and I'm embarrassed that by using the approach that you built in I went from using >100GB per job to <1GB. Don't tell my sysadmins :/

I did:

```

samtools idxstats sample.bam | cut -f1,2 > ../data/rnaseq_bams_genome_file.tsv"

```

to generate a "genome file" (I don't need to tell you this, but in case it's useful for anyone else like me who's fumbling around with BAMs).

So that's all working really nicely for me now :)

My only remaining query is about the apparent coverage across introns.

Best

Davis

Aaron Quinlan

unread,

Apr 29, 2016, 11:20:08 AM4/29/16

to davi...@gmail.com, bedtools...@googlegroups.com

Good to hear, Davis. As for the intronic coverage issue, this has recently been fixed in the repository: https://github.com/arq5x/bedtools2/issues/354

Unfortunately, this fix and others haven’t yet made their way to a formal release. Sorry about that.

If you clone the repository and rerun I think you should be in business.

Sorry,

Aaron

Davis McCarthy

unread,

Apr 29, 2016, 12:40:14 PM4/29/16

to Aaron Quinlan, bedtools...@googlegroups.com

OK, no worries. Relieved to hear that there was a bug and I'm not flipping out.

I installed from source (completely painless, so kudos for that! If I can install from source on a cluster you know it's easy...) and reproduced my plot - it now looks great!