extracting genomic coordinates for some genes

varun gupta

Mar 12, 2013, 4:13:15 PM
to gen...@soe.ucsc.edu
Hi Everyone

I have a list of gene names. The names are ref genes.

All I want to generate is a file such as this:
chromosome   start_of_gene     end_of_gene  gene_name   strand

For example, I have a gene
PDCD11
Its coordinates are
chr10   105156412   105206019  PDCD11  +

How can I get this for 20-30 genes all at once?

These are ref genes.

Regards
VARUN

Brian Lee

Mar 12, 2013, 6:42:56 PM
to varun gupta, gen...@soe.ucsc.edu
Dear Varun,

Thank you for using the UCSC Genome Browser and for your question
about obtaining a list of coordinates for your gene names.

You can use the table browser to obtain this output by taking
advantage of the "identifiers (names/accessions)" section.

1. Navigate to the table browser by clicking "Tables" in the top bar
at www.genome.ucsc.edu and make the following selections:

clade: mammal
genome: human
assembly: hg19
group: Genes and Gene Prediction Tracks
track: RefSeq Genes
table: refGene
region: genome

2. Click "paste list" under identifiers and enter your gene names, for
example:

SIRT1
PDCD11

3. Set output to "selected fields from primary and related tables",
click "get output".

4. Select your desired fields by checking the boxes next to "chrom",
"strand", "txStart", "txEnd" and "name2", and click "get output".

5. You will get output for every RefSeq transcript whose gene name
matches one on your list. Some genes will have multiple entries, for
example because of splice variants:
#chrom strand txStart txEnd name2
chr10 + 69644938 69678147 SIRT1
chr10 + 69644426 69678147 SIRT1
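
If you want to keep the RefSeq rows but end up with one line per gene
in the column order requested (chromosome, start, end, gene name,
strand), a small script can merge the transcripts. The following is
only a sketch: the function name collapse_transcripts is made up for
illustration, it assumes the five fields come in the order shown above
and contain no spaces, and it takes the smallest txStart and largest
txEnd across all transcripts of a gene, which may or may not be the
span you want.

```python
def collapse_transcripts(table_text):
    """Merge rows of 'chrom strand txStart txEnd name2' into one span per gene.

    Rows are grouped by (gene, chrom, strand) so that copies of a gene on
    different chromosomes or strands are reported separately.
    """
    spans = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header line
        chrom, strand, tx_start, tx_end, gene = line.split()
        key = (gene, chrom, strand)
        start, end = int(tx_start), int(tx_end)
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    # Reorder into: chromosome, start, end, gene name, strand
    return [
        (chrom, start, end, gene, strand)
        for (gene, chrom, strand), (start, end) in spans.items()
    ]


sample = """#chrom strand txStart txEnd name2
chr10 + 69644938 69678147 SIRT1
chr10 + 69644426 69678147 SIRT1
"""

for row in collapse_transcripts(sample):
    print("\t".join(str(field) for field in row))
```

To use it on real output, save the table browser result to a file and
pass the file contents to the function in place of the sample string.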

If you would like only one set of coordinates per identifier, and do
not require RefSeq coordinates, you can redo the above steps setting
the track to "UCSC Genes" and the table to "knownCanonical" and
selecting the fields "chrom", "chromStart" and "chromEnd", along with
"geneSymbol" from the hg19.kgXref table (be sure to paste your gene
list in the identifier section again). To add the strand information,
scroll down to the Linked Tables section and select the "kgProtAlias"
table and "allow selection from checked tables" at the bottom of the
page, and then select "strand" from the "hg19.kgProtMap2 fields"
section. You will get output such as:

#hg19.knownCanonical.chrom hg19.knownCanonical.chromStart hg19.knownCanonical.chromEnd hg19.kgProtMap2.strand hg19.kgXref.geneSymbol
chr10 69644426 69678147 + SIRT1

The entry in the knownCanonical table is often the longest isoform of
the related UCSC gene group.
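
The knownCanonical output puts strand before the gene symbol and uses
fully qualified column names, so it needs a small rearrangement to
match the requested layout. Here is a rough sketch of one way to do
that by looking the columns up in the header line. The function name
reorder_canonical is hypothetical, and the code assumes the header is
the first non-blank line and that no field contains spaces.

```python
def reorder_canonical(table_text):
    """Return tab-separated lines as: chrom, start, end, gene symbol, strand."""
    lines = [line for line in table_text.splitlines() if line.strip()]
    header = lines[0].lstrip("#").split()
    # Keep only the last dotted part of each column name,
    # e.g. hg19.knownCanonical.chromStart -> chromStart
    short_names = [name.split(".")[-1] for name in header]
    index = {name: i for i, name in enumerate(short_names)}
    wanted = ["chrom", "chromStart", "chromEnd", "geneSymbol", "strand"]
    rows = []
    for line in lines[1:]:
        fields = line.split()
        rows.append("\t".join(fields[index[name]] for name in wanted))
    return rows


sample = (
    "#hg19.knownCanonical.chrom hg19.knownCanonical.chromStart "
    "hg19.knownCanonical.chromEnd hg19.kgProtMap2.strand "
    "hg19.kgXref.geneSymbol\n"
    "chr10 69644426 69678147 + SIRT1\n"
)

for row in reorder_canonical(sample):
    print(row)
```

Looking columns up by name rather than by position means the script
keeps working if the fields are selected in a different order.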

Thank you again for your inquiry and for using the UCSC Genome
Browser. If you have further questions, please feel free to contact
the mailing list again at gen...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group