Human genome Genes download


Majid, Salma (NIH/NIAAA) [E]

Oct 30, 2020, 12:18:11 PM
to gen...@soe.ucsc.edu

Hello

I need some assistance. I am using the UCSC genome statistics for the total number of genes in the human genome, and I have to classify the genes as coding, pseudo, and other. How can I download the full gene list (approx. 59,000 coding, non-coding, and other genes) for the whole human genome? I would appreciate your support.

Thanks

Salma Majid

NIAAA, NIH

Jairo Navarro Gonzalez

Nov 6, 2020, 2:50:34 PM
to Majid, Salma (NIH/NIAAA) [E], gen...@soe.ucsc.edu

Hello Salma,

Thank you for using the UCSC Genome Browser and sending your inquiry. I apologize for the
delay in my response.

Using the Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables), you can filter for the
different gene types and download each of them separately. Following these instructions will
give you all of the non-coding genes for the hg38 assembly.

clade: mammal
genome: Human
assembly: Dec. 2013 (GRCh38/hg38)
group: Genes and Gene Predictions
track: GENCODE v32
table: knownGene
region: genome
output format: BED - browser extensible data
output file: non-coding_genes

Once these settings are chosen, on the "filter" line, click the "create" button. In the “Filter on
Fields from hg38.knownGene” section, there is a “Free-form query” line. On this line, enter the
following:

cdsStart=cdsEnd

Once you have entered the filtering option, click the "submit" button. After you are back on the
main Table Browser page, click the "get output" button to download the non-coding genes in BED
format. To get the protein-coding genes, change the "Free-form query" to:

cdsStart!=cdsEnd
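
If you would rather download everything once and split it yourself, the same test can be applied to the BED file. The sketch below is only an illustration: it assumes the BED output carries cdsStart and cdsEnd in the thickStart and thickEnd columns (columns 7 and 8), and the two input lines are made-up examples, not real annotations. The function name split_bed_by_coding is hypothetical.

```python
def split_bed_by_coding(lines):
    """Split BED lines into (coding, noncoding) lists of item names.

    Assumption: thickStart/thickEnd (columns 7 and 8) hold cdsStart/cdsEnd,
    so thickStart == thickEnd marks an item with no coding region.
    """
    coding, noncoding = [], []
    for line in lines:
        # Skip blank lines and any header lines the download may include.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        thick_start, thick_end = int(fields[6]), int(fields[7])
        if thick_start == thick_end:
            noncoding.append(fields[3])
        else:
            coding.append(fields[3])
    return coding, noncoding


# Made-up BED12 lines for illustration only.
example = [
    "chr1\t100\t500\ttxA\t0\t+\t100\t100\t0\t1\t400,\t0,",
    "chr1\t1000\t2000\ttxB\t0\t+\t1100\t1900\t0\t1\t1000,\t0,",
]
coding, noncoding = split_bed_by_coding(example)
print("coding:", coding)
print("non-coding:", noncoding)
```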

If you would like to create a filter for the pseudogenes, go to the hg38.kgXref filter section and
enter the term *pseudo* in the description field.

description does match *pseudo*

If you would like to download all annotations with the classification of the genes, you can use the
GENCODE track and selected fields from the related tables.

Track: GENCODE v32
Table: knownGene
Output format: selected fields
Fields:

hg38.knownGene.name
hg38.kgXref.geneSymbol
hg38.knownAttrs.transcriptType

Choosing these fields, you will get output such as the following:

#hg38.knownGene.name    hg38.kgXref.geneSymbol    hg38.knownAttrs.transcriptType
ENST00000456328.2    DDX11L1    lncRNA
ENST00000450305.2    DDX11L1    transcribed_unprocessed_pseudogene
ENST00000488147.1    WASH7P    unprocessed_pseudogene
ENST00000619216.1    MIR6859-1    miRNA
ENST00000473358.1    MIR1302-2HG    lncRNA
ENST00000469289.1    MIR1302-2HG    lncRNA
ENST00000607096.1    MIR1302-2    miRNA
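
As a rough sketch of how this output could be sorted into the three groups you mentioned, the Python below tallies rows by transcriptType. The grouping rule is my assumption, not an official UCSC or GENCODE classification: "protein_coding" counts as coding, any type containing "pseudogene" counts as pseudo, and everything else counts as other. The function names classify and count_classes are hypothetical, and the sample rows are the ones shown above, assumed to be tab-separated.

```python
from collections import Counter


def classify(transcript_type):
    """Map a transcriptType value to coding / pseudo / other (assumed rule)."""
    if transcript_type == "protein_coding":
        return "coding"
    if "pseudogene" in transcript_type:
        return "pseudo"
    return "other"


def count_classes(lines):
    """Count rows per class in tab-separated 'selected fields' output."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#'-prefixed header row.
        if not line or line.startswith("#"):
            continue
        name, symbol, transcript_type = line.split("\t")
        counts[classify(transcript_type)] += 1
    return counts


sample = [
    "#hg38.knownGene.name\thg38.kgXref.geneSymbol\thg38.knownAttrs.transcriptType",
    "ENST00000456328.2\tDDX11L1\tlncRNA",
    "ENST00000450305.2\tDDX11L1\ttranscribed_unprocessed_pseudogene",
    "ENST00000488147.1\tWASH7P\tunprocessed_pseudogene",
    "ENST00000619216.1\tMIR6859-1\tmiRNA",
    "ENST00000473358.1\tMIR1302-2HG\tlncRNA",
    "ENST00000469289.1\tMIR1302-2HG\tlncRNA",
    "ENST00000607096.1\tMIR1302-2\tmiRNA",
]
counts = count_classes(sample)
print(dict(counts))
```

Note that each row is a transcript (ENST identifier), so these are per-transcript counts. A gene symbol such as DDX11L1 can appear under more than one type, so collapsing to one class per gene needs a rule of your own choosing.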

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro
UCSC Genome Browser

Want to share the Browser with colleagues?
Host a workshop: https://bit.ly/ucscTraining

