Hello
I need some assistance. I am using the UCSC genome statistics for the total number of genes in the human genome, and I have to classify the genes as coding, pseudo, and other. How can I download the complete gene list (approx. 59,000 genes: coding, non-coding, and other) for the whole human genome? I would appreciate your support.
Thanks
Salma Majid
NIAAA, NIH
Hello Salma,
Thank you for using the UCSC Genome Browser and sending your inquiry. I apologize for the
delay in my response.
Using the Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables), you can filter for the
different gene types and download each of them separately. Following these instructions will
give you all of the non-coding genes for the hg38 assembly.
clade: mammal
genome: Human
assembly: Dec. 2013 (GRCh38/hg38)
group: Genes and Gene Predictions
track: GENCODE v32
table: knownGene
region: genome
output format: BED - browser extensible data
output file: non-coding_genes
Once these settings are chosen, click the "create" button on the "filter" line. In the "Filter on
Fields from hg38.knownGene" section, there is a "Free-form query" line. On this line, enter the
following:
cdsStart=cdsEnd
Once you have entered the filtering option, click the "submit" button. After you are back on the
main Table Browser page, click the "get output" button to download the non-coding genes in BED
format. To get the protein-coding genes, change the "Free-form query" to:
cdsStart!=cdsEnd
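If you download all knownGene transcripts into a single BED file instead of running the query twice, the same split can be done locally. In BED12 output, thickStart and thickEnd (columns 7 and 8) carry cdsStart and cdsEnd, so a transcript whose thickStart equals its thickEnd has no coding region. Below is a minimal Python sketch, assuming a tab-separated BED file with at least eight columns; the function name `split_by_coding` is hypothetical.

```python
import csv


def split_by_coding(bed_path):
    """Split BED rows into coding and non-coding transcript names by
    comparing thickStart (column 7) with thickEnd (column 8)."""
    coding, noncoding = [], []
    with open(bed_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines, comments, and browser/track header lines.
            if not row or row[0].startswith(("#", "track", "browser")):
                continue
            name = row[3]
            thick_start, thick_end = int(row[6]), int(row[7])
            # cdsStart == cdsEnd in knownGene shows up as thickStart == thickEnd in BED.
            if thick_start == thick_end:
                noncoding.append(name)
            else:
                coding.append(name)
    return coding, noncoding
```

Calling `split_by_coding("your_file.bed")` returns two lists of transcript IDs whose lengths give the coding and non-coding counts. Note that knownGene is transcript-level, so these are transcript counts, not gene counts.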
To create a filter for pseudogenes instead, go to the hg38.kgXref section of the filter page and enter
the term *pseudo* in the description field:
description does match *pseudo*
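The pattern *pseudo* is a wildcard: each asterisk matches any run of characters, so the filter keeps every description that contains "pseudo" anywhere. The short Python sketch below shows that matching logic with the standard `fnmatch` module. Treating the match as case-insensitive is an assumption about the Table Browser's behavior, and `matches_pattern` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase


def matches_pattern(description, pattern="*pseudo*"):
    """Return True if description matches a shell-style wildcard pattern.

    Both strings are lower-cased first; case-insensitive matching is an
    assumption here, not documented Table Browser behavior.
    """
    return fnmatchcase(description.lower(), pattern.lower())
```

For example, a description ending in "pseudogene" matches, while one that never mentions "pseudo" does not.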
To download all annotations together with the classification of each gene, use the GENCODE
track and select fields from the related tables:
track: GENCODE v32
table: knownGene
output format: selected fields from primary and related tables
Fields:
hg38.knownGene.name
hg38.kgXref.geneSymbol
hg38.knownAttrs.transcriptType
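Choosing these fields gives one tab-separated row per transcript, containing the transcript ID, gene symbol, and transcript type (for example, protein_coding or processed_pseudogene).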
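To turn that export into coding / pseudo / other counts, the rows can be tallied locally. The Python sketch below assumes the columns appear in the order listed above and that the header line starts with "#". The grouping rule is a simplification chosen for illustration: "protein_coding" is counted as coding, any type containing "pseudogene" as pseudo, and everything else as other. Because knownGene has one row per transcript, the sketch also counts distinct gene symbols as a rough gene-level figure; a gene with transcripts of several types is counted in each class. The names `classify` and `summarize` are hypothetical.

```python
import csv
from collections import Counter, defaultdict


def classify(transcript_type):
    """Map a GENCODE transcript type onto three broad classes (a simplification)."""
    if transcript_type == "protein_coding":
        return "coding"
    if "pseudogene" in transcript_type:
        return "pseudo"
    return "other"


def summarize(path):
    """Count transcripts and distinct gene symbols per class in a tab-separated
    export with columns: name, geneSymbol, transcriptType."""
    transcripts = Counter()
    symbols = defaultdict(set)
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and the '#'-prefixed header line.
            if not row or row[0].startswith("#"):
                continue
            _name, symbol, transcript_type = row[:3]
            label = classify(transcript_type)
            transcripts[label] += 1
            symbols[label].add(symbol)
    gene_counts = {label: len(names) for label, names in symbols.items()}
    return transcripts, gene_counts
```

`summarize("your_export.tsv")` returns a `Counter` of transcripts per class and a dict of distinct gene symbols per class.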
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Jairo Navarro
UCSC Genome Browser
Want to share the Browser with colleagues?
Host a workshop: https://bit.ly/ucscTraining
--
---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.