genomecoverage histogram

1,827 views
Skip to first unread message

genbio64

unread,
Mar 5, 2010, 1:51:24 PM3/5/10
to bedtools-discuss
When I run the genomecoverage tool in BEDTools it tells me that it
makes a histogram. Do I need to pipe the output to a different
program?

- Nick

Pratap, Abhishek

unread,
Mar 5, 2010, 2:02:03 PM3/5/10
to bedtools...@googlegroups.com
Aaron can better reply to this but I see the output is a binned and could be used to plot histograms using R/python basically anywhere you get a good plotting lib. May there is already a plotting tool included which I am not aware of.

-Abhi

genbio64

unread,
Mar 5, 2010, 1:57:50 PM3/5/10
to bedtools-discuss
Sorry, I was cut off before finishing that. I meant to say, in the
BEDTools manual it tells me the program output is a histogram. I can
pipe the output to a text file but to visualize the histogram do I
need to output it to another program?

- Nick

Aaron Quinlan

unread,
Mar 5, 2010, 2:14:35 PM3/5/10
to bedtools...@googlegroups.com
Hey Nick,

Yeah, the idea is that you would load the histogram into R/MatLab/GnuPlot/Octave/MatPlotLib/Excel(ack!) or your whatever your favorite stats/plotting package is.

Here's an example:

1. You ran genomeCoverageBed and wrote the output to covhist.txt
2. Grep for the coverage for the entire genome (i.e., not each discrete chromosome):
$ grep ^genome covhist.txt > genomecovhist.txt
<SNIPPET>
genome 0 288714051 2725765481 0.10592
genome 1 171838807 2725765481 0.0630424
genome 2 164159404 2725765481 0.0602251
genome 3 152507904 2725765481 0.0559505
genome 4 140450942 2725765481 0.0515272
genome 5 129628433 2725765481 0.0475567
genome 6 119880747 2725765481 0.0439806
genome 7 110903668 2725765481 0.0406872
genome 8 102771021 2725765481 0.0377035
genome 9 95602315 2725765481 0.0350736
...
<SNIPPET>

3. Plot the cumulate coverage distribution for the genome (an example with R)

# Load our hist
cov = read.table('genomecovhist.txt');

# Create a cumulative distribution from the "raw" hist (truncate at depth >=50)
cov_cumul = 1 - cumsum(cov[,4]);

# Create a plot of the cumul
plot(cov[2:51,1], cov_cumul[1:50], col='darkgreen', type='b', lwd=2
xlab="Depth", ylab="Fraction of genome >= depth",
xaxt="n",
yaxt="n"
)

# Draw the axis labels for the x (1) and y (2) axes
axis(1,at=c(1,5,10,15,20,25,30,35,40,45,50))
axis(2,at=c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0))

# add dashed vertical lines.
abline(v=c(1,5,10,15,20,25,30,35,40,45,50),lty=3,col="grey"

# add dash horizontal lines
abline(h=c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0),lty=3,col="grey"


Your output should look something like this (bugs may exist above, just a rough example):

Pasted Graphic.pdf

Aaron Quinlan

unread,
Mar 5, 2010, 2:22:48 PM3/5/10
to bedtools-discuss
Apparently attachments are disallowed, but I'm sure you can imagine
what the output "should" look like.

> Aaron
>
> Aaron Quinlan, Ph.D.
> NRSA Postdoctoral Fellow
> Hall Laboratory
> University of Virginia
> Biochem. & Mol. Genetics
> aaronquin...@gmail.com

Aaron Quinlan

unread,
Mar 5, 2010, 2:31:49 PM3/5/10
to bedtools-discuss
I placed the expected output here (covhist.pdf):
http://groups.google.com/group/bedtools-discuss/files

genbio64

unread,
Mar 5, 2010, 3:26:44 PM3/5/10
to bedtools-discuss
Aaron,
Thanks. In your example, what do the numbers represent after
genome?

- Nick

> Aaron
>
> Aaron Quinlan, Ph.D.
> NRSA Postdoctoral Fellow
> Hall Laboratory
> University of Virginia
> Biochem. & Mol. Genetics
> aaronquin...@gmail.com
>
> On Mar 5, 2010, at 1:57 PM, genbio64 wrote:
>

Aaron Quinlan

unread,
Mar 5, 2010, 6:24:07 PM3/5/10
to bedtools-discuss
This is described in Section 5.10.2 of the manual (http://
bedtools.googlecode.com/files/BEDTools-User-Manual.pdf).

Aaron

Aaron Quinlan

unread,
Jun 7, 2013, 10:04:16 AM6/7/13
to bedtools...@googlegroups.com
Hi,

This is a tough question to answer without you showing exactly what you did.  Could you provide the exact R commands you used, as well as the exact output for the "genome" lines in the genomeCoverageBed output (i.e., show us the contents of genomecovhist.txt after running 'grep ^genome covhist.txt > genomecovhist.txt').

- Aaron





On Jun 4, 2013, at 4:29 AM, Meher <meha...@gmail.com> wrote:

Hi,

I have used the below Rscript to plot the output of genomeCoverageBed and ended up with a different plot as shown in the attachment. Could you explain why this plot differs from Pasted Graphic.pdf?

--
You received this message because you are subscribed to the Google Groups "bedtools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bedtools-discu...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Tessa

unread,
Feb 25, 2014, 2:54:25 PM2/25/14
to bedtools...@googlegroups.com
Hi Nick (and Aaron),

Ran across this looking for coverage plot info (thanks for the genomecov, Aaron!), but thought I'd clear this up while I was here, in case it is of help to anyone else. Nick's columns are likely just shifted over one -- I changed the columns in the code below to reflect code that should work for him. The rest of the code remains as Aaron wrote it.

best, 

Tessa


# Create a cumulative distribution from the "raw" hist (truncate at depth >=50)
cov_cumul = 1 - cumsum(cov[,5]); 

# Create a plot of the cumul
plot(cov[2:51,2], cov_cumul[1:50], col='darkgreen'type='b'lwd=2
 xlab="Depth"ylab="Fraction of genome >= depth",
 xaxt="n",
 yaxt="n"
 )

Aaron Quinlan

unread,
Feb 25, 2014, 4:19:55 PM2/25/14
to bedtools...@googlegroups.com
Thanks much for following up on this, Tessa. I suspect you are right.
Reply all
Reply to author
Forward
0 new messages