PhastCons downloads


Gaurav Arora

Mar 25, 2014, 3:09:54 PM
to gen...@soe.ucsc.edu
Hi there,

I am trying to download the phastCons7way scores for approximately 4000 ORFs in the yeast genome. I also plan to download the scores for the intergenic regions.

I understand that there is a limit to the data download and I know that the amount of data for the regions I am interested in exceeds the limit.

Is there a way that I can download the phastCons scores for the 4000 ORFs I am interested in? The FTP downloads site has scores for whole chromosomes and not necessarily for the ORFs.

If this question has already been asked on your mailing list, please send me the link. I did search the list and did not find a previous question similar to mine, but I may have overlooked it, and I apologize in advance if so.

Best,
Gaurav

--
Gaurav Arora Ph.D.

Post Doctoral Fellow
Department of Biology
Georgetown University
Regents Hall, 514
37th and O Streets NW
Washington DC 20057

Office: 202-687-1867

Brian Lee

Mar 26, 2014, 2:20:46 PM
to Gaurav Arora, gen...@soe.ucsc.edu
Dear Gaurav,

Thank you for using the UCSC Genome Browser and for your question about downloading yeast phastCons scores.

If you navigate to the yeast conservation track, http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=sacCer3&g=multiz7way, you can review the track description. There you will find a link to the phastCons data in fixed-step wiggle format for download: http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/phastCons7way/ Please see the README displayed there for a description of the phastCons data file format.
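To illustrate what the fixed-step wiggle layout looks like to a program, here is a minimal Python sketch. It assumes the standard fixedStep declaration line (chrom=, start=, step=) followed by one score per line, with 1-based positions. The function name parse_fixed_step and the toy data are made up for this example and are not part of any UCSC tool.

```python
import io


def parse_fixed_step(handle):
    """Yield (chrom, position, score) tuples from fixedStep wiggle text.

    Positions are 1-based, as in the wiggle format. This is a hypothetical
    helper for illustration, not a UCSC utility.
    """
    chrom, pos, step = None, None, 1
    for line in handle:
        line = line.strip()
        # Skip blank lines, comments, and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith("fixedStep"):
            # Declaration line, e.g. "fixedStep chrom=chrI start=10 step=1"
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            step = int(fields.get("step", 1))
            continue
        if chrom is None:
            raise ValueError("score line found before any fixedStep declaration")
        yield chrom, pos, float(line)
        pos += step


# Toy input standing in for a downloaded (and uncompressed) wiggle file.
example = io.StringIO(
    "fixedStep chrom=chrI start=10 step=1\n"
    "0.5\n"
    "0.25\n"
    "fixedStep chrom=chrI start=20 step=1\n"
    "1.0\n"
)
records = list(parse_fixed_step(example))
print(records)
```

For a real file you would pass an open file handle (for example from gzip.open(path, "rt")) instead of the toy StringIO object.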

You can also access the sacCer3 phastCons7way.txt.gz file from the downloads directory: http://hgdownload.soe.ucsc.edu/goldenPath/sacCer3/database/

First create a BED file of the ORFs, which you can do with the Table Browser. Likewise, you can use the Table Browser to intersect that BED file with the phastCons7way data.

Here are some example steps focusing on just chrI. Navigate to the Table Browser with sacCer3 selected, and use the SGD Genes track and sgdGene table from the Gene Prediction group. Set position to "chrI", set output to "custom track" and click "get output". If you click "get custom track in genome browser" you will see that this track contains all the genes on chrI. Then go back to the Table Browser, enter sdgGeneChrI.BED in "output file", change the output format to "BED", and click "get output" and then "get BED". Now you have a file you can use with hgWiggle later.

Now go to the Table Browser and set the Group to "Comparative Genomics" and track "Conservation" and table "phastCons7way" and click the "create" button next to "intersection:". Now select the group "Custom Tracks" and the "tb_sgdGene" track from the original custom track creation and click submit. Be sure to remove the output file option before clicking "get output" and you will have the data points for all these regions.
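For readers who would rather script this intersection offline, the following Python sketch shows the idea on toy data. It assumes the per-base scores have already been loaded into a dictionary keyed by (chromosome, 1-based position), and that the BED lines use the usual 0-based, half-open coordinates. The function name mean_scores_per_region and the ORF names are hypothetical.

```python
def mean_scores_per_region(bed_lines, scores):
    """Return {region name: mean score} for each BED interval.

    bed_lines: iterable of BED text lines (chrom, start, end, optional name).
    scores:    dict mapping (chrom, 1-based position) -> score.
    Regions with no scored bases map to None.
    """
    results = {}
    for line in bed_lines:
        # Skip blank lines and browser/track header lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        # BED [start, end) is 0-based half-open, so it covers the
        # 1-based positions start+1 through end inclusive.
        values = [
            scores[(chrom, p)]
            for p in range(start + 1, end + 1)
            if (chrom, p) in scores
        ]
        results[name] = sum(values) / len(values) if values else None
    return results


# Toy data: four scored bases on chrI and two made-up ORFs.
toy_scores = {("chrI", 1): 0.0, ("chrI", 2): 0.5, ("chrI", 3): 1.0, ("chrI", 4): 0.5}
toy_bed = ["chrI\t1\t3\torfA", "chrI\t10\t12\torfB"]
result = mean_scores_per_region(toy_bed, toy_scores)
print(result)
```

A dictionary of per-base scores is simple but memory-hungry; for a whole genome, a per-chromosome array would likely be a better fit.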

If you wish to use hgWiggle please read and follow the steps in this wiki:

After obtaining the utility from http://hgdownload.soe.ucsc.edu/admin/exe/, downloading the phastCons7way file with rsync -aP rsync://hgdownload.soe.ucsc.edu/goldenPath/sacCer3/database/phastCons7way.txt.gz . and changing the name from .txt to .wig, you could run the following command on a BED file of the regions of interest downloaded from the Table Browser (here just chrI):

hgWiggle -bedFile=sdgGeneChrI.BED phastCons7way

I suggest seeing the mailing list for additional examples and information about hgWiggle: https://groups.google.com/a/soe.ucsc.edu/forum/?hl=en&fromgroups#!search/hgWiggle

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group

