Hi Mohan,
Thank you for your question about using the Data Integrator to
extract information from more than 5 data sources at a time.
Unfortunately, this is not currently possible with the Data Integrator,
but if you are up for some scripting, it may be possible another way.
This will require a Unix-like command line and the UCSC utility
bigWigToBedGraph. You can download the appropriate bigWigToBedGraph
binary for your OS here:
http://hgdownload.soe.ucsc.edu/admin/exe/
The first step is to get a handle on the data we will be examining. Save the Roadmap trackDb.txt as a local file:
curl -o trackDb_dli_edacc9_3.txt http://vizhub.wustl.edu/VizHub/hg19/trackDb_dli_edacc9_3.txt
Now examine the relevant lines of the file:
grep -E 'shortLabel |bigDataUrl ' trackDb_dli_edacc9_3.txt | less
Once inside less, search for DNA Methylation (the name of the track of interest). You should see lines like the following:
shortLabel DNA Methylation
shortLabel MeDIP_Coverage
bigDataUrl GSM941726.bigWig
shortLabel PFK MeDIP 02 92
bigDataUrl GSM941727.bigWig
shortLabel PFM MeDIP 02 93
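The bigDataUrl values listed above are file names relative to the hub directory, so each one needs the hub's base URL in front of it before a tool can fetch it. A minimal sketch of that step follows. The function name make_urls is just an illustration, and the sketch assumes each bigDataUrl value is a single whitespace-free token:

```shell
#!/bin/sh
# Base URL of the hub directory (the directory the trackDb file was downloaded from).
BASE_URL="http://vizhub.wustl.edu/VizHub/hg19"

# make_urls (hypothetical helper): read trackDb text on stdin and
# print one full URL per bigDataUrl line.
make_urls() {
    awk -v base="$BASE_URL" '
        $1 == "bigDataUrl" {
            url = $2
            # Relative values are resolved against the hub directory;
            # values that are already absolute are printed unchanged.
            if (url !~ /^(https?|ftp):\/\//) url = base "/" url
            print url
        }'
}
```

Running `make_urls < trackDb_dli_edacc9_3.txt` would list a URL for every bigDataUrl in the file, so you would still need to narrow the list down to the files that belong to the track you care about.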
You can then extract data from one of these bigWig files with the utility bigWigToBedGraph:
bigWigToBedGraph -chrom=chr6 -start=152124218 -end=152168456 http://vizhub.wustl.edu/VizHub/hg19/GSM941726.bigWig GSM941726_ESR1.bedGraph
This line creates a bedGraph file with the following output:
chr6 152124218 152124220 4
chr6 152124220 152124240 1
chr6 152124240 152124440 0
chr6 152124440 152124480 2
chr6 152124480 152124520 3
... ... ...
You can then write a script that does three things:
- makes the correct URLs from the relevant bigDataUrl fields
- for each URL and each desired region, runs bigWigToBedGraph to create a corresponding bedGraph file
- for each bedGraph file created in the previous step, aggregates the score field
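As a rough sketch of the last two steps, something like the following could work. The function names, the output file naming scheme, and the choice of a length-weighted mean as the aggregate are all assumptions made for illustration, so substitute whatever summary you actually need:

```shell
#!/bin/sh
# extract_region URL CHROM START END (hypothetical helper):
# write a bedGraph file for one bigWig URL and print the file name.
extract_region() {
    url=$1; chrom=$2; start=$3; end=$4
    out="$(basename "$url" .bigWig)_${chrom}_${start}_${end}.bedGraph"
    bigWigToBedGraph -chrom="$chrom" -start="$start" -end="$end" "$url" "$out" &&
        printf '%s\n' "$out"
}

# summarize_bedgraph (hypothetical helper): read bedGraph lines on stdin and
# print "bases<TAB>mean", where mean is the score averaged over covered bases.
summarize_bedgraph() {
    awk '{ len = $3 - $2; bases += len; total += len * $4 }
         END {
             if (bases > 0) { printf "%d\t%.4f\n", bases, total / bases }
             else { print "0\tNA" }
         }'
}

# run_all CHROM START END (hypothetical helper): read bigWig URLs on stdin
# and print "file<TAB>bases<TAB>mean" for each one.
run_all() {
    while IFS= read -r url; do
        file=$(extract_region "$url" "$1" "$2" "$3") || continue
        printf '%s\t%s\n' "$file" "$(summarize_bedgraph < "$file")"
    done
}
```

With the full bigWig URLs saved one per line in a file (say urls.txt, a made-up name), `run_all chr6 152124218 152168456 < urls.txt` would produce one summary line per file. Note that the mean only covers bases that appear in the bedGraph output, so regions with no data are not counted as zeros.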
Please keep in mind that coordinates in UCSC tools and almost all UCSC file formats are 0-based, half-open: the start is 0-based, and the end is also 0-based but points to the base just past the end of the region. So the 1-based, fully-closed position chr6:152,124,219-152,168,456 becomes -chrom=chr6 -start=152124218 -end=152168456 for UCSC command line tools.
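If you would rather paste positions in the 1-based form, a small helper can do the conversion for you. This is only a sketch, and the function name region_to_flags is made up for this example:

```shell
#!/bin/sh
# region_to_flags (hypothetical helper): convert a 1-based, fully-closed
# position such as chr6:152,124,219-152,168,456 into the 0-based, half-open
# flags that bigWigToBedGraph expects.
region_to_flags() {
    region=$(printf '%s' "$1" | tr -d ',')   # strip thousands separators
    chrom=${region%%:*}                      # text before the colon
    range=${region#*:}                       # text after the colon
    start=${range%-*}                        # 1-based start
    end=${range#*-}                          # 1-based, inclusive end
    # Subtract 1 from the start; the end stays the same because the
    # half-open end already points one past the last included base.
    printf '%s\n' "-chrom=$chrom -start=$((start - 1)) -end=$end"
}

region_to_flags 'chr6:152,124,219-152,168,456'
# prints: -chrom=chr6 -start=152124218 -end=152168456
```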
The advantage of creating this script is that you can run it for any region of interest and examine data from anywhere in the Roadmap hub.
Please let us know if you have any further questions!
Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Christopher Lee
UCSC Genomics Institute