Hello, Varun.
Thank you for your question about obtaining genePred files for hg38 from the UCSC Genome Browser.
There are a few different ways to obtain genePred files from the UCSC Genome Browser, each with its own advantages and disadvantages.
A) Table Browser
You can obtain this information from the Table Browser using the following set of steps:
1. Navigate to the Table Browser, https://genome.ucsc.edu/cgi-bin/hgTables.
2. Make the following selections:
clade: Mammal
genome: Human
assembly: Dec. 2013 (GRCh38/hg38)
group: Genes and Gene Predictions
track: NCBI RefSeq
table: RefSeq Curated (ncbiRefSeqCurated)
region: genome
output format: selected fields from primary and related tables
output file: enter a file name or leave blank to view in web browser
3. Click 'get output'.
4. Under the 'Select Fields from hg38.ncbiRefSeqCurated' section, click 'check all', then uncheck the box next to 'bin'.
5. Click 'get output'.
Note that in step 2 above, you can replace the track and table, NCBI RefSeq and ncbiRefSeqCurated, with the track and table you are interested in.
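As a quick sanity check on the file you save from the Table Browser, here is a minimal sketch that confirms the 'bin' column is really gone. It relies on the genePred column order (name, chrom, strand, ...), so the third field of every data line should be '+' or '-'. The function name looks_like_genepred is just an illustrative helper, not a UCSC tool, and the check assumes the header line in the output starts with '#'.

```shell
# Hypothetical helper (not a UCSC utility): succeeds if every data line
# has a strand ('+' or '-') in the third tab-separated field, which is
# what genePred without a leading 'bin' column looks like.
# Lines starting with '#' (assumed to be the header) are skipped.
looks_like_genepred() {
    awk -F'\t' '
        /^#/ { next }
        $3 != "+" && $3 != "-" { bad = 1; exit }
        END { exit bad }
    ' "$1"
}

# Example use (the file name is whatever you entered under 'output file'):
#   looks_like_genepred myGenes.txt && echo "no bin column found"
```

If the check fails, the most likely cause is that the 'bin' box was left checked in step 4.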
B) Downloads server
Our downloads server contains dumps of almost every table on our site including the genePred tables that support the NCBI RefSeq and GENCODE tracks. The table downloads for hg38 are here:
http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/. Search that page for the table name of the gene track you are interested in; for example, the ncbiRefSeqCurated table is associated with the NCBI RefSeq Curated track.
Note that these file downloads include a 'bin' column, which is useful for display in the Genome Browser but is not needed for analysis. You can strip out this bin column using the UNIX 'cut' command:
cut -f2- ncbiRefSeqCurated.txt > ncbiRefSeqCurated.noBin.txt
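If you would rather go straight from the downloaded file to a bin-free file, the two steps can be combined. This is only a sketch: it assumes the table dump on the downloads server is gzip-compressed and named ncbiRefSeqCurated.txt.gz, and the function names strip_bin and dump_to_genepred are made up for illustration.

```shell
# Hypothetical helpers (not UCSC utilities).

# Drop the first tab-separated column ('bin') from a table dump on stdin.
strip_bin() {
    cut -f2-
}

# Decompress a gzipped table dump and write it without the 'bin' column.
#   $1: path to the downloaded .txt.gz file
#   $2: path for the output file
dump_to_genepred() {
    gzip -dc "$1" | strip_bin > "$2"
}

# Example use (file name on the server is an assumption):
#   wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ncbiRefSeqCurated.txt.gz
#   dump_to_genepred ncbiRefSeqCurated.txt.gz ncbiRefSeqCurated.noBin.txt
```

The same functions should work for any other genePred table in that directory; just substitute the table name.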
I hope this is helpful. If you have any further questions, please reply to
gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to
genom...@soe.ucsc.edu.
Matthew Speir
UCSC Genomics Institute