Gene name of Promoter regions


Antje Biering

Dec 9, 2013, 5:26:32 AM
to gen...@soe.ucsc.edu
Dear UCSC Genome Bioinformatics Group,

I'm a PhD student, and for one of my projects I need to identify the promoter region of each gene.

To do this, I used the Table Browser of the UCSC Genome Browser to get the promoter regions for the whole genome, with these settings: track = UCSC Genes, table = knownGene, output format = BED, and, in the output options, upstream set to 2000 bases.

Unfortunately, the resulting file was a little confusing.

The same promoter region appeared under several different names, for example:
chr1    9873    11873    uc001aaa.3_up_2000_chr1_9874_f    0    +
chr1    9873    11873    uc010nxr.1_up_2000_chr1_9874_f    0    +
chr1    9873    11873    uc010nxq.1_up_2000_chr1_9874_f    0    +
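
To see how many transcript IDs share each region, the records can be grouped by coordinates. The following is only a minimal Python sketch: `group_promoters` is a hypothetical helper, and it assumes that every name field has the form `<transcript ID>_up_2000_...` as in the three lines above.

```python
from collections import defaultdict


def group_promoters(bed_lines):
    """Group upstream BED records that share chrom, start, end, and strand."""
    groups = defaultdict(list)
    for line in bed_lines:
        fields = line.split()
        # Skip blank lines, comments, and records with fewer than 6 columns.
        if len(fields) < 6 or line.startswith("#"):
            continue
        chrom, start, end, name, _score, strand = fields[:6]
        # Assumption: the transcript ID is everything before "_up_".
        transcript_id = name.split("_up_")[0]
        groups[(chrom, int(start), int(end), strand)].append(transcript_id)
    return dict(groups)


bed_lines = [
    "chr1\t9873\t11873\tuc001aaa.3_up_2000_chr1_9874_f\t0\t+",
    "chr1\t9873\t11873\tuc010nxr.1_up_2000_chr1_9874_f\t0\t+",
    "chr1\t9873\t11873\tuc010nxq.1_up_2000_chr1_9874_f\t0\t+",
]
promoters = group_promoters(bed_lines)
for region, transcripts in promoters.items():
    print(region, transcripts)
```

Here all three records collapse into one region that lists three transcript IDs.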

Anyway, I continued by "translating" the names into the real gene names, using the file I got from the browser with track = UCSC Genes and table = kgAlias.

It was confusing to see that in this file several of the IDs map to the same gene name, e.g. WASH7P.

#kgID    alias
uc001aaa.3    DDX11L1
uc001aaa.3    NR_046018
uc001aaa.3    uc001aaa.3
uc001aac.4    BC063459
uc001aac.4    WASH7P
uc001aac.4    uc001aac.4
uc001aae.4    BC053987
uc001aae.4    WASH7P
uc001aae.4    uc001aae.4
uc001aah.4    NR_024540
uc001aah.4    WASH7P
uc001aah.4    uc001aah.4
uc001aai.1    AY217347
uc001aai.1    WASH7P
uc001aai.1    uc001aai.1


Could you please tell me what I did wrong, how I can get the correct gene names for the promoter regions, and why there are different IDs for the same regions?

(I already tried the FAQ "Linking gene name with accession number". It describes how to get the names for RefSeq Genes and Known Genes, but these didn't fit my data either.)

Thanks a lot!

Best wishes,
Antje Biering

Brian Lee

Dec 9, 2013, 6:24:49 PM
to Antje Biering, gen...@soe.ucsc.edu
Dear Antje Biering,

Thank you for using the UCSC Genome Browser and for your question about UCSC Genes and gene symbols.

You are on the right track to look for a table that connects the UCSC gene identifiers with the more common gene symbol. Rather than the kgAlias table, you can use the knownGene cross reference table, kgXref.

You can select this output from the table browser, http://genome.ucsc.edu/cgi-bin/hgTables, in the following fashion.

Clade: Mammal
Genome: Human
Assembly: Feb. 2009 (GRCh37/hg19)
Group: Genes and Gene Prediction Tracks
Track: UCSC Genes
Table: knownGene

1. Change the "output format:" to "selected fields..." and then click "get output".
2. Click the box next to "name" for the hg19.knownGene region.
3. Click the box next to "geneSymbol" in the hg19.kgXref region.
4. Click "get output" and you will see a list such as the following:

#hg19.knownGene.name hg19.kgXref.geneSymbol
uc001aaa.3 DDX11L1
uc010nxr.1 DDX11L1
uc010nxq.1 DDX11L1
uc009vis.3 WASH7P
uc009vjc.1 WASH7P
uc009vjd.2 WASH7P
...
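
Once you have this two-column output, you can join it to your upstream BED file by transcript ID. The following is only a rough Python sketch under some assumptions: the helper names `load_symbols` and `annotate_promoters` are made up for illustration, the BED name field is assumed to start with the transcript ID followed by `_up_`, and transcripts missing from the list are marked `NA`.

```python
def load_symbols(xref_lines):
    """Map UCSC transcript IDs to gene symbols from two-column text."""
    symbols = {}
    for line in xref_lines:
        fields = line.split()
        # Skip the header, blank lines, and anything without two columns.
        if len(fields) < 2 or line.startswith("#"):
            continue
        symbols[fields[0]] = fields[1]
    return symbols


def annotate_promoters(bed_lines, symbols):
    """Yield (chrom, start, end, strand, transcript_id, gene_symbol)."""
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 6:
            continue
        chrom, start, end, name, _score, strand = fields[:6]
        # Assumption: names look like "uc001aaa.3_up_2000_chr1_9874_f".
        transcript_id = name.split("_up_")[0]
        symbol = symbols.get(transcript_id, "NA")
        yield (chrom, int(start), int(end), strand, transcript_id, symbol)


xref_lines = [
    "#hg19.knownGene.name\thg19.kgXref.geneSymbol",
    "uc001aaa.3\tDDX11L1",
    "uc010nxr.1\tDDX11L1",
    "uc010nxq.1\tDDX11L1",
]
bed_lines = [
    "chr1\t9873\t11873\tuc001aaa.3_up_2000_chr1_9874_f\t0\t+",
    "chr1\t9873\t11873\tuc010nxr.1_up_2000_chr1_9874_f\t0\t+",
    "chr1\t9873\t11873\tuc010nxq.1_up_2000_chr1_9874_f\t0\t+",
]
annotated = list(annotate_promoters(bed_lines, load_symbols(xref_lines)))
for row in annotated:
    print(*row, sep="\t")
```

With the example lines, each of the three upstream records is labeled DDX11L1.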

If you wish, you can include more information, such as the knownGene coordinates (chrom, txStart, txEnd) and the kgXref description field.

Since genes like WASH7P have several transcript variants of different lengths, they have multiple entries in knownGene. In your earlier step of getting BED output 2000 bp upstream, you could instead use our knownCanonical table, which usually has one entry per gene (the longest splice variant), to define the promoter regions.
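
If you already have the upstream BED file from knownGene, another option is to keep only the records whose transcript ID appears in knownCanonical. The following is only a sketch under some assumptions: `keep_canonical` is a hypothetical helper, the set of canonical IDs is assumed to come from the transcript field of a knownCanonical export, and the ID in the example set is a placeholder, not actual knownCanonical content.

```python
def keep_canonical(bed_lines, canonical_ids):
    """Return only the BED lines whose transcript ID is in canonical_ids."""
    kept = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        # Assumption: the name field starts with the transcript ID,
        # followed by "_up_", as in the upstream BED output.
        transcript_id = fields[3].split("_up_")[0]
        if transcript_id in canonical_ids:
            kept.append(line)
    return kept


# Placeholder set; in practice, fill it from a knownCanonical export.
canonical_ids = {"uc010nxq.1"}
bed_lines = [
    "chr1\t9873\t11873\tuc001aaa.3_up_2000_chr1_9874_f\t0\t+",
    "chr1\t9873\t11873\tuc010nxr.1_up_2000_chr1_9874_f\t0\t+",
    "chr1\t9873\t11873\tuc010nxq.1_up_2000_chr1_9874_f\t0\t+",
]
canonical_promoters = keep_canonical(bed_lines, canonical_ids)
for line in canonical_promoters:
    print(line)
```

This leaves at most one upstream record per canonical transcript, so each gene usually ends up with a single promoter region.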

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group

