How to extract signal values for a limited genomic region from wiggle files?

Darius Khan

May 23, 2013, 4:04:18 AM
to gen...@soe.ucsc.edu
Hi! I'm interested in correlating data from the same track but from different genomic regions (DNase hypersensitivity data, for example). As I understand it, the only way to do this is to extract the signal values for these regions and run an analysis on them. Is this possible with hgWiggle? Can I specify the genomic regions from which I want the data?

Brooke Rhead

May 24, 2013, 7:41:53 PM
to Darius Khan, gen...@soe.ucsc.edu
Hi Darius,

Yes, you can use hgWiggle to extract portions of wiggle data from a
wiggle file (run the command with no arguments to see a full usage
statement). However, if you are using data from UCSC, it is very likely
that files are going to be in bigWig format:

http://genome.ucsc.edu/goldenPath/help/bigWig.html

See the bottom of the page for tools for extracting bigWig data. You
will likely want to use bigWigToBedGraph or bigWigToWig.

Note that you don't have to download an entire bigWig file to extract
portions of data from it; you can just enter the URL of the file on
the command line with the bigWig* tools. For example, suppose you want
to extract data for a specific region from a signal file located on the
ENCODE downloads page (http://genome.ucsc.edu/ENCODE/downloads.html),
say, from the Duke DNaseI HS signal file for 8988T cells, located here:

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeOpenChromDnase/wgEncodeOpenChromDnase8988tSig.bigWig

for the region chr21:33,031,000-33,038,700, you can use a command like
this one:

bigWigToBedGraph -chrom=chr21 -start=33031000 -end=33038700 \
  http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeOpenChromDnase/wgEncodeOpenChromDnase8988tSig.bigWig \
  out.bedGraph

The result will be a bedGraph file limited to
chr21:33,031,000-33,038,700.
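
If you need to do this for several regions, the step above can be scripted. The following is only a rough Python sketch, not part of the UCSC tools: the helper names (parse_region, build_command, per_base_signal) are made up for illustration, and it assumes bigWigToBedGraph is on your PATH. It turns a browser-style position into the -chrom/-start/-end arguments used above, and expands the resulting bedGraph lines into one value per base so that regions can be compared.

```python
import re


def parse_region(region):
    """Split a position such as 'chr21:33,031,000-33,038,700' into parts."""
    match = re.fullmatch(r"([^:]+):([\d,]+)-([\d,]+)", region.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {region!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


def build_command(url, region, out_path):
    """Build the bigWigToBedGraph argument list for one region.

    The numbers are passed through unchanged, as in the command above.
    """
    chrom, start, end = parse_region(region)
    return [
        "bigWigToBedGraph",
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        url,
        out_path,
    ]


def per_base_signal(bedgraph_lines, start, end):
    """Expand bedGraph intervals into one value per base over [start, end).

    bedGraph intervals are 0-based and half-open; bases with no data
    are reported as 0.0 (an assumption of this sketch).
    """
    values = [0.0] * (end - start)
    for line in bedgraph_lines:
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if len(fields) < 4 or line.startswith("#") or fields[0] in ("track", "browser"):
            continue
        lo, hi, value = int(fields[1]), int(fields[2]), float(fields[3])
        for pos in range(max(lo, start), min(hi, end)):
            values[pos - start] = value
    return values


if __name__ == "__main__":
    url = ("http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/"
           "wgEncodeOpenChromDnase/wgEncodeOpenChromDnase8988tSig.bigWig")
    # Print the command; pass the list to subprocess.run() to execute it.
    print(" ".join(build_command(url, "chr21:33,031,000-33,038,700", "out.bedGraph")))
```

The per-base vectors from two regions of equal length can then be handed to whatever correlation routine you prefer.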

If you use the signal files, you might be interested in the DNase peak
callers on the ENCODE Software Tools page:
http://genome.ucsc.edu/ENCODE/encodeTools.html, e.g., the HotSpot
software from the Stam lab.

Alternatively, you may want to use ENCODE peaks files rather than the
signal files in your analysis. In particular, the "DNase Clusters"
track might be useful to you, as the data is already clustered and
the peaks already called; read about the track data on the track
description page, here:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRegDnaseClusteredV2

I hope this information is helpful. If you have further questions,
please contact us again at gen...@soe.ucsc.edu.

--
Brooke Rhead
UCSC Genome Bioinformatics Group