Question about downloading ChromHMM .bed data

Gong, Yixiao

May 20, 2014, 4:09:18 PM
to gen...@soe.ucsc.edu, Tsirigos, Aristotelis
Hi there,

Thank you for your tremendous contribution in developing and maintaining the UCSC Genome Browser.

We are currently trying to download the ChromHMM data shown in the UCSC Genome Browser, but we have been unable to find a way to do so.

We would like to have the .bed files for all cell types. Could you please direct us on this?

Your help is greatly appreciated. 

Sincerely,
Yixiao Gong
Bioinformatician
Aifantis Laboratory
New York University Langone Medical Center / NYU School of Medicine
Howard Hughes Medical Institute
550 First Avenue, Smilow research building, room 1303 227 E 30th street, floor 7, cube 756c, New York, NY 10016


Brian Lee

May 20, 2014, 4:53:02 PM
to Gong, Yixiao, gen...@soe.ucsc.edu, Tsirigos, Aristotelis

Dear Yixiao Gong,

Thank you for using the UCSC Genome Browser and for your question about accessing data for the Roadmap ChromHMM Core Marks track of the Roadmap Epigenomics Data hub.

This data is from a Public Hub that is hosted by WashU, http://vizhub.wustl.edu/, and the listed contact for the hub is tw...@genetics.wustl.edu. It is likely best to contact them directly to find the best way to access their data.

To access the files you see displayed in the browser, you can navigate the public hub's directories and find the trackDb file, which points to the location of each displayed file via "bigDataUrl" statements. Below are two short commands you can run on the command line that pull out those statements and build links from the relative locations of the defined files. The first obtains the URLs of all the files; the second keeps only the .bigBed and .bb files. Neither is filtered for only the ChromHMM track.

curl --silent http://vizhub.wustl.edu/VizHub/hg19/trackDb_dli_edacc8_new4.txt | grep bigDataUrl | sed -e 's#^.*bigDataUrl #http://vizhub.wustl.edu/VizHub/hg19/#'

curl --silent http://vizhub.wustl.edu/VizHub/hg19/trackDb_dli_edacc8_new4.txt | grep bigDataUrl | sed -e 's#^.*bigDataUrl #http://vizhub.wustl.edu/VizHub/hg19/#' | grep 'bb\|bigBed'

These are the binary bigBed files that can be turned into bed files with the bigBedToBed utility located in the appropriate directory here: http://hgdownload.soe.ucsc.edu/admin/exe/
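
As a rough sketch, the download and conversion steps could be looped as shown below. This assumes that bigBedToBed is already on your PATH and that the URLs produced by the second command above have been saved one per line in a file (urls.txt is a hypothetical name). The function names bed_name_for and fetch_and_convert are made up for illustration and are not part of any UCSC tool.

```shell
# Map a bigBed URL to a local .bed file name
# (drops the directory part and a trailing .bb or .bigBed suffix).
bed_name_for() {
  name=${1##*/}
  name=${name%.bigBed}
  name=${name%.bb}
  printf '%s.bed\n' "$name"
}

# Read URLs on stdin, one per line; download each file with curl
# and convert it to BED with bigBedToBed (assumed to be installed).
fetch_and_convert() {
  while IFS= read -r url; do
    [ -n "$url" ] || continue          # skip empty lines
    bb=${url##*/}                      # local name for the downloaded bigBed
    if curl --silent --fail --location --output "$bb" "$url"; then
      bigBedToBed "$bb" "$(bed_name_for "$url")"
    else
      echo "download failed: $url" >&2
    fi
  done
}

# Usage (urls.txt is a hypothetical file with one URL per line):
#   fetch_and_convert < urls.txt
```

Each downloaded file is kept next to its converted .bed file, so a failed conversion can be retried without downloading again.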

Another approach is to use our Table Browser. With the hub data displayed in the browser, click the top "Tools" button and navigate to the Table Browser and then set the group to "Roadmap Epigenomics Data Complete Collection at Wash U VizHub" and then select the "Roadmap ChromHMM" track at the very bottom of tracks. You could then choose each table and get the output from each track individually as bed files.

I suggest investigating the resources for accessing this externally hosted data (Public Track Hubs are not created or maintained by UCSC). You may also be interested in these tutorials about accessing Roadmap Epigenomics Data: http://www.genome.gov/27555330

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group



Daofeng Li

May 21, 2014, 12:05:40 AM
to Brian Lee, Gong, Yixiao, gen...@soe.ucsc.edu, Tsirigos, Aristotelis, Ting Wang
Hi all,

Thanks, Brian, for the detailed answers regarding the ChromHMM data from Roadmap.
I just want to mention that the Roadmap data are also accessible from the WashU EpiGenome Browser at: http://epigenomegateway.wustl.edu/browser/
Soon after the Roadmap papers are published, there should also be a centralized website for downloading the data :)

Best regards,

Daofeng



Gong, Yixiao

May 21, 2014, 1:06:43 PM
to Daofeng Li, Brian Lee, gen...@soe.ucsc.edu, Tsirigos, Aristotelis, Ting Wang
Hi Dr. Lee and Dr. Li,

Thank you both so much.

I was able to retrieve the data using the command line that Dr. Lee provided. In fact, the files that Dr. Lee's command retrieves are the complete set of ChromHMM data for 90 cell types and do not contain anything else. For now, we are happy with that.

I was also able to use the UCSC Table Browser to get the specific set that we need. That is very convenient as well.

In the future, we will explore the WashU EpiGenome Browser further. It is also a very good tool.

The only issue, which is not very important at this point, is that the ChromHMM Core Marks data set has no bigDataUrl label, so I can't get the URL for it. The matching stanza in the .txt file (http://vizhub.wustl.edu/VizHub/hg19/trackDb_dli_edacc8_new4.txt) is as follows:

track RoadmapHMM15
compositeTrack on
shortLabel Roadmap ChromHMM Core Marks
longLabel Chromatin Segmentation by HMM from Roadmap Project using Core Marks
subGroup1 cellType Cell_Type …….
subGroup2 method Method ChromHMM=ChromHMM
subGroup3 donor Donor ……..
dimensions dimX=method dimY=cellType
sortOrder cellType=+
dragAndDrop on
visibility dense
priority 21
type bigBed 9

I wonder if there is a way to get it too.

Again, thank you very much for your quick response and your useful, detailed suggestions. We really appreciate your efforts, which contribute so much to the scientific community.

I wish you all the best in the future and have a great day. 

Best,

Yixiao Gong

Steve Heitner

May 21, 2014, 1:57:56 PM
to Gong, Yixiao, Daofeng Li, Brian Lee, gen...@soe.ucsc.edu, Tsirigos, Aristotelis, Ting Wang

Hello, Yixiao.

The “compositeTrack” label in the second line of the stanza indicates that “RoadmapHMM15” is actually a container for subtracks.  The subtracks contain data, but the container does not, so there would be no bigDataUrl associated with the container.
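
As a minimal sketch of how the subtrack URLs for one container could be pulled out, the function below reads trackDb text and prints the bigDataUrl of every stanza that names the container as its parent. It assumes that stanzas are separated by blank lines and that each subtrack points to its container with a "parent RoadmapHMM15" line (or the older "subTrack" keyword); I have not checked how this particular hub file is laid out. The function name subtrack_urls is made up for illustration.

```shell
# Print the bigDataUrl of every subtrack that belongs to a given composite.
# Reads trackDb text on stdin.
#   $1 = name of the composite (container) track
#   $2 = base URL to prepend to relative bigDataUrl paths
subtrack_urls() {
  awk -v parent="$1" -v base="$2" '
    BEGIN { RS = ""; FS = "\n" }     # paragraph mode: one stanza per record, one line per field
    {
      is_child = 0; url = ""
      for (i = 1; i <= NF; i++) {
        split($i, f, " ")            # whitespace split; also drops leading indentation
        if ((f[1] == "parent" || f[1] == "subTrack") && f[2] == parent) is_child = 1
        if (f[1] == "bigDataUrl") url = f[2]
      }
      if (is_child && url != "") {
        # Leave absolute URLs alone; prefix relative paths with the base URL.
        if (url ~ /^(https?|ftp):\/\//) print url
        else print base url
      }
    }'
}

# Usage:
#   curl --silent http://vizhub.wustl.edu/VizHub/hg19/trackDb_dli_edacc8_new4.txt \
#     | subtrack_urls RoadmapHMM15 http://vizhub.wustl.edu/VizHub/hg19/
```

Because the container stanza itself has no bigDataUrl, it produces no output; only its subtracks do.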

Please contact us again at gen...@soe.ucsc.edu if you have any further questions. 
All messages sent to that address are archived on a publicly-accessible Google Groups forum.  If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

---
Steve Heitner
UCSC Genome Bioinformatics Group
