reference file


Neeba Dijo

May 9, 2014, 1:08:23 AM
to gen...@soe.ucsc.edu
Hi,

I am planning to use the new GRCh38 assembly, but I have not found a chromosome folder or any phyloP and phastCons data.

Are you planning to provide these soon?

--
Thanks & Regards,
Neeba Sebastian

Jonathan Casper

May 9, 2014, 7:22:58 PM
to Neeba Dijo, gen...@soe.ucsc.edu

Hello Neeba,

Thank you for your question about data for the recently released GRCh38/hg38 human genome assembly. You are right, there is no chromosome folder at that location. Due to the large number of alternate chromosomes included in the hg38 assembly, we are not providing download links for individual chromosomes in FASTA format. Instead, you can find a single compressed archive of all of the chromosome FASTA files in the following directory: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/. The file names are hg38.chromFa.tar.gz and hg38.chromFaMasked.tar.gz, and their contents are described in the README.txt file in that same directory.

As for phyloP and phastCons data, they are not available yet because we have not completed a significant multiple alignment that includes the hg38 assembly. Unfortunately, I do not have a timetable for when one will be ready - we only recently finished processing the 100-way alignment for the hg19 genome assembly.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group





Neeba Dijo

Jul 17, 2014, 10:47:09 AM
to Jonathan Casper, gen...@soe.ucsc.edu
Hi Jonathan,

I have a few questions about this.

1. I tried to view the file with the command 'zless hg38.chromFa.tar.gz', but it shows an error saying that the file is binary. How can I view the file?
2. You said that "you can find a single compressed archive of all of the chromosome FASTA files in the following directory: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/. The file names are hg38.chromFa.tar.gz and hg38.chromFaMasked.tar.gz, and their contents are described in the README.txt file in that same directory."

Is there a separate file for each chromosome?
Where can I find the alternate loci sequences?





Matthew Speir

Jul 17, 2014, 2:24:35 PM
to Neeba Dijo, Jonathan Casper, gen...@soe.ucsc.edu
Hi Neeba,

Thank you for your question about downloading the reference sequence for the GRCh38 assembly. As Jonathan mentioned, you can download the FASTA files for all of the chromosomes as a single, compressed archive, such as http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chromFa.tar.gz. After you have downloaded the file, run the command:

    tar xvzf <file>.tar.gz

This command will decompress the archive, and extract all of the files in the archive to a folder called ‘chroms’. This ‘chroms’ folder contains the individual FASTA files for all of the chromosomes, including the alternate sequences.
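If you only want to look inside the archive rather than unpack all of it, something like the following may help. This is a minimal sketch, not an official procedure: it builds a tiny stand-in archive so that it runs anywhere, and the file names and sequences in it (including the `_alt` suffix on the second file) are made up for illustration. Only the 'chroms' folder name comes from the description above. For the real data, point `archive` at the downloaded hg38.chromFa.tar.gz and check the README.txt for the actual file names.

```shell
#!/bin/sh
set -eu

# Build a small stand-in archive that mimics the layout described above:
# a 'chroms' folder holding one FASTA file per sequence.
# The file names and sequences below are hypothetical.
work=$(mktemp -d)
mkdir "$work/chroms"
printf '>chr1\nACGTACGT\n' > "$work/chroms/chr1.fa"
printf '>chr1_demo_alt\nTTGGCCAA\n' > "$work/chroms/chr1_demo_alt.fa"
tar czf "$work/demo.chromFa.tar.gz" -C "$work" chroms

# For the real data, set this to the path of hg38.chromFa.tar.gz instead.
archive="$work/demo.chromFa.tar.gz"

# zless only undoes the gzip layer, which leaves a binary tar stream.
# Listing the members with tar shows the file names without extracting.
listing=$(tar tzf "$archive")
printf '%s\n' "$listing"

# Write a single member to standard output (-O) instead of to disk,
# and keep just its FASTA header line.
first_header=$(tar xzOf "$archive" chroms/chr1.fa | head -n 1)
printf '%s\n' "$first_header"
```

To page through a whole sequence, the same `tar xzOf` command can be piped into `less` instead of `head`.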

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group
--

