Regards,
Joseph
Joseph M. Dhahbi, PhD
Childrens Hospital Oakland Research Institute
5700 Martin Luther King Jr. Way
Oakland, CA 94609
USA
Ph.(510)428-3885 EXT.5743
Cell.(702)335-0795
Fax (510)450-7910
jdh...@chori.org
I suspect that you are running out of memory and the machine is "swapping". The coverageBed tool can use a substantial amount of memory when many intervals in the B file have deep coverage from the A file.
Have you checked the memory usage with "top" or the hokey Activity Monitor on Mac?
Best,
Aaron
While not a direct solution, perhaps splitting your data by chromosome would speed things up (both in terms of memory, and in the ability to parallelize things).
To split your BAM file by chromosome, you can use "bamtools split" (bamtools is here: https://github.com/pezmaster31/bamtools).
To split your BED file by chromosome, this simple AWK script will create one "my.chrNNN.bed" file for each chromosome:
awk '{ print >> "my." $1 ".bed" }' my.bed
This way you can run your pipeline (bedtools coverage + sort + groupby) on each chromosome independently, which should make things faster.
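As a rough sketch of the split-then-parallelize idea: the snippet below builds a tiny made-up my.bed, splits it by chromosome, and starts one background job per chromosome file. The sample data, the count.*.txt output names, and the line-counting step are all placeholders for illustration; the counting step only stands in for the real per-chromosome pipeline, whose exact command line is not shown here.

```shell
# Hypothetical demo input in a scratch directory; substitute your own BED file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr1\t100\t200\nchr2\t50\t80\nchr1\t300\t400\n' > my.bed

# Split by chromosome (column 1). The output name is built in a variable
# first, then used as the redirection target.
awk '{ out = "my." $1 ".bed"; print >> out }' my.bed

# Start one background job per chromosome file, then wait for all of them.
# Counting lines is only a stand-in for the real per-chromosome pipeline.
for bed in my.chr*.bed; do
    chrom=${bed#my.}       # strip the "my." prefix
    chrom=${chrom%.bed}    # strip the ".bed" suffix
    awk 'END { print NR }' "$bed" > "count.$chrom.txt" &
done
wait
```

Each background job holds its own share of memory, so on a machine that is already swapping it may be better to run only a few chromosomes at a time.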
-gordon
Thanks,
Aaron
On Wed, 29 Feb 2012 18:37:43 -0500
That is indeed surprising. If you could post your files somewhere and privately let me know where to find them, I can try to take a look.
Best,
Aaron
awk '{ print >> "TFBS." $1 ".bed" }' TFBS.bed
awk: syntax error at source line 1
context is
{ print >> "TFBS." >>> $ <<< 1 ".bed" }
awk: illegal statement at source line 1
On Wed, 29 Feb 2012 19:14:54 -0500
Please try the following:
awk '{ file = "TFBS." $1 ".bed" ; print >> file }' TFBS.bed
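Some background on why the variable helps: in some awk implementations (presumably including the one that produced the syntax error above), an unparenthesized string concatenation after ">>" is ambiguous to the parser, and POSIX recommends parenthesizing such expressions. A minimal sketch of the parenthesized form, which should behave the same as the variable version; the three-line TFBS.bed here is made up for illustration:

```shell
# Hypothetical sample data in a scratch directory; use your real TFBS.bed instead.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr1\t10\t20\nchrX\t5\t9\nchr1\t30\t40\n' > TFBS.bed

# The parentheses make the whole concatenation the redirection target,
# so each line is appended to TFBS.<chromosome>.bed.
awk '{ print >> ("TFBS." $1 ".bed") }' TFBS.bed
```

Because ">>" appends, delete any TFBS.chr*.bed files left over from an earlier attempt before rerunning, or the intervals will be duplicated.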
On Wed, 29 Feb 2012 19:33:27 -0500