visualize coverage plot in Genome Browser


David

Jun 28, 2010, 12:08:04 PM
to bedtools-discuss
Hi list,

I am having trouble loading data to Genome Browser. I am after a
coverage plot where I can see all the reads aligning to the genome. I
have created a BED file from my ELAND alignment that is sorted. I then
used the genomeCoverageBed utility as follows

genomeCoverageBed s_1_sorted.bed -g mm9.genome > s_1_cov.bedg

This is a snippet of my resulting s_1_cov.bedg

chr1 3001004 3001033 1
chr1 3001033 3001039 3
chr1 3001039 3001042 4
chr1 3001042 3001071 3
chr1 3001071 3001077 1
chr1 3001093 3001131 1
chr1 3002112 3002150 1
chr1 3002314 3002352 1
chr1 3002382 3002420 2
chr1 3002772 3002810 1


I then moved the file to an http location and used a track line like
the following

track type=bedGraph bigDataurl=http://myorganisation.org/download/s_1_cov.bedg

But I get nothing at all when I click on submit. Can anyone tell me
what I am doing wrong?

Thanks for your help,

Dave

Aaron Quinlan

Jun 28, 2010, 12:47:51 PM
to bedtools...@googlegroups.com
Hi David,

I've not used the new "bigData" features on the browser, but a quick glance at the examples suggests that bedGraph may not be supported: the examples cover BED and Wiggle only. That said, it seems unlikely that bedGraph would not be supported as well. I assume you've tested that a snippet of your example works as a "normal" custom track?

_Aaron

Assaf Gordon

Jun 28, 2010, 1:13:48 PM
to bedtools...@googlegroups.com, david_...@hotmail.com
Hi David,

You have two options with BedGraph data:

1. Upload the file as a textual file into the Genome Browser.
In that case, you'll need to add the following line as the first line of
the file:
====
track type=bedGraph name="Hello World"
====

See more details here:
http://genome.ucsc.edu/goldenPath/help/bedgraph.html

The textual format does NOT use the "bigDataUrl", and so you'll have to
upload the entire file as a custom track (quite slow).
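
For what it's worth, option 1 can be sketched in a couple of shell lines. This is only an illustration: the file names are placeholders, and the toy data is just the first three lines of the snippet posted earlier in the thread.

```shell
# Work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)"

# Toy bedGraph: the first three lines of the snippet from the original post.
printf 'chr1\t3001004\t3001033\t1\n'  > s_1_cov.bedg
printf 'chr1\t3001033\t3001039\t3\n' >> s_1_cov.bedg
printf 'chr1\t3001039\t3001042\t4\n' >> s_1_cov.bedg

# Write the track line first, then append the data unchanged.
{
  printf 'track type=bedGraph name="Hello World"\n'
  cat s_1_cov.bedg
} > s_1_cov.track.bedg

# Show the header and the first data line.
head -n 2 s_1_cov.track.bedg
```

The resulting s_1_cov.track.bedg is the file you would upload as a custom track.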


2. Convert your bedgraph file to a BigWig file -
which is a binary representation of a Wiggle/BedGraph file.
This format allows you to store your file on an HTTP/FTP server and use
the bigDataUrl feature.

BigWig is explained here:
http://genome.ucsc.edu/goldenPath/help/bigWig.html

You'll need to download a program called bedGraphToBigWig from the UCSC
website. The program is available here:
http://hgdownload.cse.ucsc.edu/admin/exe/


Assuming your intervals are in INPUT.BED,
Your commands would look something like:

===
sort -k1,1 < INPUT.BED > sorted.bed
genomeCoverageBed -bg -g mm9.genome -i sorted.bed > output.bedgraph
bedGraphToBigWig output.bedgraph mm9.chromsize output.bw
===

Then, the file "output.bw" is your binary BigWig file, and can be used as:
===
track type=bigWig bigDataUrl=http://your.server.edu/output.bw
===

Note:
the "mm9.chromsize" file used for "bedGraphToBigWig" is very similar to
the BEDTools "genome" file, but it contains three columns (the third won't
be used in your case).
You can download it from:
http://hgdownload.cse.ucsc.edu/goldenPath/mm9/database/chromInfo.txt.gz
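
If you would rather keep a single two-column file (chromosome name and size) around, a rough sketch is to cut the first two columns out of chromInfo.txt. The input below is a stand-in with made-up sizes and paths, not real mm9 values, and the output file name is only a suggestion.

```shell
# Work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)"

# Stand-in for the unzipped chromInfo.txt: chromosome, size, file path.
# These sizes and paths are toy values for illustration only.
printf 'chr1\t1000\t/path/to/chr1\n'  > chromInfo.txt
printf 'chr2\t800\t/path/to/chr2\n'  >> chromInfo.txt

# Keep only the chromosome name and its size (tab-separated columns 1 and 2).
cut -f1,2 chromInfo.txt > mm9.genome

cat mm9.genome
```

With the real download, the equivalent would be along the lines of `gunzip -c chromInfo.txt.gz | cut -f1,2 > mm9.genome`.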


Hope it helps,
-gordon

David Arteta

Jun 29, 2010, 6:34:28 AM
to gor...@cshl.edu, bedtools...@googlegroups.com
Dear Gordon

thanks very much for your help, but the results are not quite what I am looking for. See two examples

- in Barski et al (2007) High-resolution profiling of histone methylations in the human genome. Cell 129:823-837. PMID: 17512414. Please have a look at Figure 7, you can see that the shape of the enrichment areas is shown. This article is available online

- in Rozowsky et al (2009) PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat Biotechnol 27:66-75. PMID 19122651. Please have a look at Figure 4.

I am just getting a bit lost with so many formats!

Cheers,

Dave



Assaf Gordon

Jun 29, 2010, 10:51:49 AM
to David Arteta, bedtools...@googlegroups.com
Hello David,

Both of the figures you mention depict "Coverage" or "Wiggle-Gram" data
(i.e. some value per nucleotide position visualized as a line graph).

BEDTools' genomeCoverageBed could be the first step in producing that
kind of plot.

First, there's the technical aspect:
Wiggle (wig), BedGraph or BigWig are all file formats containing
essentially the same information, in slightly different representation.
Which format you choose depends on your needs:
1. wig/bedgraph are textual, bigWig is binary
2. wig/bedgraph requires uploading the entire track, bigWig requires
your own FTP/HTTP server.
3. wig/bedgraph allows multiple tracks in one file, bigWig doesn't
4. wig has more options (variable step, etc.) but is slightly more
complicated to handle.
5. wig/bedgraph can be generated with all common text-processing
languages (perl/python/etc.), bigWig requires Jim Kent's program.


Second,
there's the content of what you put in those files, and I think this
might be what's puzzling you.
genomeCoverageBed takes the reads you have in a BED file (probably
after mapping some FASTA/FASTQ file) and calculates the
coverage-per-nucleotide. That's all it does - nothing more, nothing less.
It is quite possible that you need to post-process the BedGraph file to
keep only the interesting regions, or pre-process the BED file (before
running genomeCoverageBed) to remove regions that do not pass some threshold.
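
As one example of such post-processing, here is a minimal sketch that keeps only bedGraph intervals at or above a depth cutoff. The cutoff of 3 and the output file name are arbitrary choices for illustration; the input is the snippet from the original post.

```shell
# Work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)"

# The bedGraph snippet from the original post (chrom, start, end, depth).
cat > s_1_cov.bedg <<'EOF'
chr1 3001004 3001033 1
chr1 3001033 3001039 3
chr1 3001039 3001042 4
chr1 3001042 3001071 3
chr1 3001071 3001077 1
chr1 3001093 3001131 1
chr1 3002112 3002150 1
chr1 3002314 3002352 1
chr1 3002382 3002420 2
chr1 3002772 3002810 1
EOF

# Keep intervals whose depth (column 4) is at least "min".
# min=3 is an arbitrary threshold chosen for this example.
awk -v min=3 '$4 >= min' s_1_cov.bedg > s_1_cov.filtered.bedg

cat s_1_cov.filtered.bedg
```

On this snippet only the three intervals between 3001033 and 3001071 survive. Whether a plain depth cutoff is the right filter depends entirely on your experiment.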

An entirely different method would be to generate the coverage
(Wig/BedGraph) file with some other program - by running a "window
calculation" (e.g. something "windowBed"-like) or some peak-detection tool.
This would mean that the coverage information is not directly derived
from your BED intervals, but from some higher-level processing.
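
To make the "window calculation" idea a bit more concrete, here is one simple reading of it: average the per-base depth over fixed-size windows. This is only a sketch under my own assumptions (1000 bp windows, plain awk, the snippet from the original post as input); it is not what windowBed does, nor the method used in either paper.

```shell
# Work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)"

# The bedGraph snippet from the original post (chrom, start, end, depth).
cat > s_1_cov.bedg <<'EOF'
chr1 3001004 3001033 1
chr1 3001033 3001039 3
chr1 3001039 3001042 4
chr1 3001042 3001071 3
chr1 3001071 3001077 1
chr1 3001093 3001131 1
chr1 3002112 3002150 1
chr1 3002314 3002352 1
chr1 3002382 3002420 2
chr1 3002772 3002810 1
EOF

# Mean depth per fixed window of w bases (w=1000 is an arbitrary choice).
# Each interval is split at window boundaries so its bases are counted
# in the window they actually fall into.
awk -v w=1000 'BEGIN { OFS = "\t" }
{
  s = $2; e = $3; d = $4
  while (s < e) {
    b = int(s / w) * w                    # start of the window containing s
    stop = (e < b + w) ? e : b + w        # end of this piece of the interval
    sum[$1 OFS b] += (stop - s) * d       # bases covered times depth
    s = stop
  }
}
END {
  for (k in sum) {
    split(k, a, OFS)
    print a[1], a[2], a[2] + w, sum[k] / w
  }
}' s_1_cov.bedg | LC_ALL=C sort -k1,1 -k2,2n > s_1_cov.windows.bedg

cat s_1_cov.windows.bedg
```

The output is itself a four-column bedGraph (tab-separated), so it can be loaded or converted to BigWig the same way as the per-base file. On this snippet both 1 kb windows happen to average 0.19.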

Then again,
I could be barking up the wrong tree here - you didn't explain what is
missing from your tracks that you see in those two published papers.

-gordon

