Slow computation of the mean coverage


Joseph Dhahbi, PhD

Feb 29, 2012, 4:37:41 PM
to bedtools...@googlegroups.com
Hi
I would like to compute the mean coverage as suggested in one of the
discussion posts.
bedtools coverage -abam my.bam -b my.bed -d | sort -k1,1 -k2,2n | groupby -g 1,2,3,4,5,6 -c 8 -o mean > my.txt
This takes a few minutes when my.bed has only 1,000 entries, but with a
my.bed of 1,200,000 entries it has been running for 12 hours and is still
not done.
I am using a Mac Pro with a 2.66 GHz processor and 8 GB of memory.
Is there a way to speed this up?
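
For readers unfamiliar with the last stage of the pipeline above: it averages column 8 (the per-base depth) over all lines that share columns 1-6 (the BED interval). The following is only a rough awk stand-in for that aggregation, not the groupby tool itself. It assumes tab-separated input in which the lines of one interval are consecutive; the function name mean_by_interval is made up for this sketch.

```shell
#!/bin/sh
# Average column 8 over consecutive lines that share columns 1-6.
# Reads tab-separated text on stdin, writes one line per interval.
mean_by_interval() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        key = $1 OFS $2 OFS $3 OFS $4 OFS $5 OFS $6
        if (NR > 1 && key != prev) {   # interval changed: flush the previous one
            print prev, sum / n
            sum = 0; n = 0
        }
        prev = key; sum += $8; n++
    }
    END { if (n > 0) print prev, sum / n }'
}
```

It would be used as the final stage of a pipe, for example `... | sort -k1,1 -k2,2n | mean_by_interval > my.txt`. Only one interval is held in memory at a time, but the number formatting may differ from what groupby prints.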
Thanks
Joseph


Regards,
Joseph

Joseph M. Dhahbi, PhD
Childrens Hospital Oakland Research Institute
5700 Martin Luther King Jr. Way
Oakland, CA 94609
USA
Ph.(510)428-3885 EXT.5743
Cell.(702)335-0795
Fax (510)450-7910
jdh...@chori.org
CONFIDENTIALITY NOTICE: This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.

Aaron Quinlan

Feb 29, 2012, 5:52:22 PM
to bedtools...@googlegroups.com
Hi Joseph,

I suspect that your issue is that you are running out of memory and are "swapping". CoverageBed can use a substantial amount of memory when there are many intervals in the B file that have deep coverage from the A file.

Have you checked to see what the memory usage is by inspecting "top" or the hokey activity monitor on Mac?

Best,
Aaron

Assaf Gordon

Feb 29, 2012, 6:12:08 PM
to bedtools...@googlegroups.com, Joseph Dhahbi, PhD
Hello Joseph,


While not a direct solution, perhaps splitting your data by chromosome would speed things up (both in terms of memory, and in the ability to parallelize things).

To split your BAM file by chromosome, you can use "bamtools split" (bamtools is here: https://github.com/pezmaster31/bamtools).

To split your BED file by chromosome, this simple AWK script will create a "my.chrNNN.bed" file for each chromosome:
awk '{ print >> "my." $1 ".bed" }' my.bed

This way you can run your pipeline (bedtools coverage + sort + groupby) on each chromosome independently, which should make things faster.
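
As a sketch of what running the pipeline per chromosome could look like: the function below prints one command line for each my.<chrom>.bed file in the current directory, so the commands can be reviewed before anything runs. The BAM names (my.REF_<chrom>.bam) are an assumption about how "bamtools split -reference" names its output, and the function name per_chrom_commands is made up; adjust both to your files.

```shell
#!/bin/sh
# Print one pipeline command per per-chromosome BED file (my.<chrom>.bed).
# Assumption: the matching BAM is named my.REF_<chrom>.bam; check the
# actual names produced by your bamtools version before running anything.
per_chrom_commands() {
    for bed in my.*.bed; do
        [ -e "$bed" ] || continue      # glob matched nothing
        chrom=${bed#my.}               # strip leading "my."
        chrom=${chrom%.bed}            # strip trailing ".bed"
        printf '%s\n' "bedtools coverage -abam my.REF_${chrom}.bam -b ${bed} -d | sort -k1,1 -k2,2n | groupby -g 1,2,3,4,5,6 -c 8 -o mean > my.${chrom}.txt"
    done
}

# Show the commands; nothing is executed here.
per_chrom_commands
```

Once the printed commands look right, `per_chrom_commands | sh` would run them one after another. Running several at once is possible too, as long as the combined memory use fits on the machine.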

-gordon

Joseph Dhahbi, PhD

Feb 29, 2012, 6:31:56 PM
to bedtools...@googlegroups.com
The activity monitor shows 503 MB of Real Mem for bedtools.


Regards,
Joseph


Aaron Quinlan

Feb 29, 2012, 6:37:43 PM
to bedtools...@googlegroups.com
Is that for the job that has been running for 12 hours, or for a newly started job? If the latter, is 503 MB a stable state, or does it continue to grow? If the former, how big is the BAM file?
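
One way to tell whether memory is stable or still growing, besides watching "top", is to sample the process from a terminal. This is a minimal sketch: the function name rss_mb and the PID in the usage comment are made up, and it relies on "ps -o rss=" reporting resident memory in kilobytes, which is the case on macOS and Linux as far as I know.

```shell
#!/bin/sh
# Print the resident memory (RSS) of a process in MB.
# Assumes "ps -o rss=" reports kilobytes.
rss_mb() {
    ps -o rss= -p "$1" | awk '{ printf "%.1f\n", $1 / 1024 }'
}

# Example: sample a running job every 10 seconds until it exits
# (12345 is a placeholder PID):
#   while kill -0 12345 2>/dev/null; do rss_mb 12345; sleep 10; done
```

A value that keeps climbing toward the amount of physical RAM would be consistent with the swapping explanation; a flat value would point elsewhere.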

Thanks,
Aaron

Joseph Dhahbi, PhD

Feb 29, 2012, 7:07:08 PM
to bedtools...@googlegroups.com
Hi Aaron
This is a newly started job, which has now reached and stabilized at 5.7 GB.
The BAM file is only 10 MB.


Regards,
Joseph


Aaron Quinlan

Feb 29, 2012, 7:14:54 PM
to bedtools...@googlegroups.com
Hi Joseph,

That is indeed surprising. If you could post your files somewhere and privately let me know where to find them, I can try to take a look.

Best,
Aaron

Joseph Dhahbi, PhD

Feb 29, 2012, 7:22:45 PM
to bedtools...@googlegroups.com
Hi Gordon
I am trying your suggestion; the bed file I am using is called TFBS.bed.
I got an error:

awk '{ print >> "TFBS." $1 ".bed" }' TFBS.bed
awk: syntax error at source line 1
context is
{ print >> "TFBS." >>> $ <<< 1 ".bed" }
awk: illegal statement at source line 1


Regards,
Joseph


Joseph Dhahbi, PhD

Feb 29, 2012, 7:26:21 PM
to bedtools...@googlegroups.com
Hi Aaron
I don't mind sharing the files. Is there a way to upload them somewhere? I
don't have access to a server or an FTP site.


Regards,
Joseph


Assaf Gordon

Feb 29, 2012, 7:33:27 PM
to bedtools...@googlegroups.com
Are you using a Mac? If so, your "awk" might be BSD awk rather than GNU awk, and (I guess) BSD awk doesn't support this syntax.

Please try the following:
awk '{ file = "TFBS." $1 ".bed" ; print >> file }' TFBS.bed
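
A slightly more defensive variant, offered as a sketch rather than a tested recipe for every awk: parenthesizing the redirection target removes the ambiguity that some awk implementations reject when a concatenation follows ">>", and closing the previous file when the chromosome changes keeps the number of open files small. The function name split_bed_by_chrom and the prefix argument are made up for illustration.

```shell
#!/bin/sh
# Split a BED file into one file per chromosome: <prefix>.<chrom>.bed
# Usage: split_bed_by_chrom TFBS.bed TFBS
split_bed_by_chrom() {
    awk -v prefix="$2" '
        {
            file = prefix "." $1 ".bed"
            if (file != last) {              # chromosome changed
                if (last != "") close(last)  # release the previous file
                last = file
            }
            print >> (file)                  # parenthesized target is unambiguous
        }
    ' "$1"
}
```

Because ">>" appends, output files left over from an earlier run should be deleted before running it again; otherwise the entries end up duplicated.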

Joseph Dhahbi, PhD

Feb 29, 2012, 7:43:59 PM
to bedtools...@googlegroups.com
Yes, I am using a Mac. Thank you, it works now.


Regards,
Joseph
