Dear Antje Biering,
Thank you for using the UCSC Genome Browser and for your question about UCSC Genes and gene symbols.
You are on the right track in looking for a table that connects the UCSC gene identifiers with the more common gene symbols. Rather than the kgAlias table, you can use the knownGene cross-reference table, kgXref. In the Table Browser, make the following selections:
Clade: Mammal
Genome: Human
Assembly: Feb. 2009 (GRCh37/hg19)
Group: Genes and Gene Prediction Tracks
Track: UCSC Genes
Table: knownGene
1. Change the "output format:" to "selected fields..." and then click "get output".
2. Click the box next to "name" for the hg19.knownGene region.
3. Click the box next to "geneSymbol" in the hg19.kgXref region.
4. Click "get output" and you will see a list such as the following:
uc001aaa.3 DDX11L1
uc010nxr.1 DDX11L1
uc010nxq.1 DDX11L1
uc009vis.3 WASH7P
uc009vjc.1 WASH7P
uc009vjd.2 WASH7P
...
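If it helps to see what the Table Browser is doing here, the selection above amounts to joining the two tables on the transcript identifier. Below is a minimal Python sketch, not our actual implementation: it loads only the six rows shown above into an in-memory SQLite database as stand-ins for knownGene and kgXref, and it assumes the kgXref key column is named kgID. The helper name join_symbols is made up for illustration.

```python
import sqlite3

# Rows copied from the example output above. The real tables have many more
# columns and rows; these toy tables keep only what the join needs.
PAIRS = [
    ("uc001aaa.3", "DDX11L1"),
    ("uc010nxr.1", "DDX11L1"),
    ("uc010nxq.1", "DDX11L1"),
    ("uc009vis.3", "WASH7P"),
    ("uc009vjc.1", "WASH7P"),
    ("uc009vjd.2", "WASH7P"),
]


def join_symbols(pairs):
    """Build toy knownGene/kgXref tables and join them on name = kgID."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE knownGene (name TEXT)")
    # Assumption: kgXref is keyed by a column called kgID.
    con.execute("CREATE TABLE kgXref (kgID TEXT, geneSymbol TEXT)")
    con.executemany("INSERT INTO knownGene VALUES (?)", [(n,) for n, _ in pairs])
    con.executemany("INSERT INTO kgXref VALUES (?, ?)", pairs)
    rows = con.execute(
        "SELECT knownGene.name, kgXref.geneSymbol "
        "FROM knownGene JOIN kgXref ON knownGene.name = kgXref.kgID "
        "ORDER BY knownGene.rowid"
    ).fetchall()
    con.close()
    return rows


rows = join_symbols(PAIRS)
for name, symbol in rows:
    print(f"{name}\t{symbol}")
```

The same SELECT should carry over to a real copy of the tables, with extra columns added to the field list as needed.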
If you wish, you can include more information, such as the knownGene coordinates (chrom, txStart, txEnd) and the kgXref description field.
Since genes like WASH7P have several transcript variants of different lengths, they have multiple entries in knownGene. In your earlier step of getting BED output 2000 bp upstream to define the original promoter regions, you could instead use our knownCanonical table, which usually has one entry per gene (the longest splice variant).
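For the 2000 bp upstream step mentioned above, the region depends on the strand: it lies before txStart for "+" transcripts and after txEnd for "-" transcripts. The following Python sketch shows that arithmetic under the usual BED convention (0-based start, end not included). The function name upstream_bed and the coordinates in the example are hypothetical, chosen only for illustration, and the sketch does not clamp at the chromosome end.

```python
def upstream_bed(chrom, tx_start, tx_end, strand, name, size=2000):
    """Return a BED6-style tuple for the region `size` bp upstream of a transcript.

    Coordinates follow the BED convention: 0-based start, exclusive end.
    """
    if strand == "+":
        # Upstream of a forward-strand transcript is to the left of txStart.
        start, end = max(0, tx_start - size), tx_start
    elif strand == "-":
        # Upstream of a reverse-strand transcript is to the right of txEnd.
        start, end = tx_end, tx_end + size
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    return (chrom, start, end, name, 0, strand)


# Hypothetical coordinates, for illustration only (not real annotations).
plus_example = upstream_bed("chr1", 10000, 20000, "+", "exampleTxPlus")
minus_example = upstream_bed("chr1", 10000, 20000, "-", "exampleTxMinus")
print("\t".join(str(field) for field in plus_example))
print("\t".join(str(field) for field in minus_example))
```

Running this once per knownCanonical entry, rather than once per knownGene transcript, would give roughly one promoter region per gene.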
Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
All the best,
Brian Lee
UCSC Genome Bioinformatics Group