Reg. RefSeq gene list (hg19 coordinates)


Swaminathan, Shanker

Feb 26, 2013, 5:43:50 PM
to gen...@soe.ucsc.edu

Dear Madam/Sir

 

I would like to perform gene-based association analyses in PLINK for all RefSeq genes (hg19 coordinates), for which I would like a file which contains chromosome, start and end positions, and gene symbol of each gene. I have attached an example file with a few genes in hg18 coordinates (please see attached). If the gene contains overlapping isoforms, I would like to merge all isoforms to create a single entry for the gene and if the isoforms are not overlapping, I would like to keep them separate as duplicates of the gene. Could you please tell me if it would be possible to use the Table Browser or if there was another way to get this file?

 

Thank you very much for your help. I look forward to your early reply.

 

Thank you

Yours sincerely

Shanker Swaminathan

 

Postdoctoral Fellow, Imaging Genomics Laboratory

Department of Radiology and Imaging Sciences

Indiana University School of Medicine

 

glist-hg18_sample.txt

Brian Lee

Feb 27, 2013, 7:09:24 PM
to Swaminathan, Shanker, gen...@soe.ucsc.edu
Dear Shanker Swaminathan,

Thank you for using the UCSC Genome Browser and for your question about
using the Table Browser to obtain hg19 gene coordinates.

You can use the Table Browser to pull the start and end positions and gene
symbol from RefSeq, but the Table Browser cannot perform the additional
merging steps you describe. You may be pleased to learn, however, that you
can use Galaxy's tools, https://main.g2.bx.psu.edu, to perform the more
complex steps needed to merge your data.

If you are unfamiliar with Galaxy, their tutorial video may be a good
place to start: https://main.g2.bx.psu.edu/u/aun1/p/galaxy101. If you
have any questions regarding Galaxy tools, please direct them to their
help desk: http://wiki.galaxyproject.org/Support.

1. Access the Table Browser from Galaxy's website, https://main.g2.bx.psu.edu:
click on the "Get Data" link in the top left and then on the "UCSC Main
table browser" link.
Use the following settings:

Clade: Mammal
Genome: Human
Assembly: Feb. 2009 (GRCh37/hg19)
Group: Genes and Gene Prediction Tracks
Track: RefSeq Genes
Table: refGene

2. Be sure the region is set to genome.

3. Select for Output Format: "Selected Fields from Primary and Related Tables".

4. Make sure the box next to "Send output to Galaxy" is checked, then
click "get output".

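5. Now select the fields "chrom", "cdsStart", "cdsEnd" and "name2" from
the refGene table, click "done with selection", and then click "send
query to galaxy". (The cds fields bound the coding region only; select
"txStart" and "txEnd" instead if you want the full transcript span.)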

6. On the right side of your Galaxy session you can now view your data
by clicking the eye icon. You will see something like:
chr1 48999844 50489468 AGBL4
chr1 16767256 16785385 NECAP2
chr1 16767256 16785385 NECAP2
chr1 16767256 16785491 NECAP2
Note that isoforms here have the same name, because we selected the
"name2" field which is not a unique identifier.

7. On the left-hand side of your Galaxy session, select "Operate on
Genomic Intervals" and then click "Merge". Unfortunately, this operation
may merge two or more different genes together if they overlap.

8. Set your input data to the target and select "Execute".

9. You now will have a new data set in Galaxy that will look like the following:
chr21 30248706 30257667
chr21 30302769 30365264
You may want to ask the Galaxy team whether there is a way to retain the
name2 column information, or whether there is a better way to use their
tools to get your desired results:
http://wiki.galaxyproject.org/Support

10. One way to bring back the gene names is to join this table to the
original data. Under "Operate on Genomic Intervals" click "Join" and set
the merge output first and the original data second. You will get
output like:
chrY 9195451 9368097 chrY 9304609 9307170 TSPY1
chrY 9195451 9368097 chrY 9236075 9368097 TSPY4
chrY 9195451 9368097 chrY 9195451 9218292 TSPY4
chrY 9195451 9368097 chrY 9195451 9198014 TSPY8
chrY 9195451 9368097 chrY 9236075 9238638 TSPY3
chrY 9195451 9368097 chrY 9215730 9238638 TSPY4
This shows the merged interval on the left and the various isoforms
on the right. In this example you can see that overlapping regions
with different gene names were merged into a single interval.

11. You could then use the "Cut" tool under Galaxy's "Text Manipulation"
tools to pull out the first three columns and the last (name) column,
which gives the merged interval for each gene name, such as this:

chrY 9195451 9368097 TSPY1
chrY 9195451 9368097 TSPY4
chrY 9195451 9368097 TSPY4
chrY 9195451 9368097 TSPY8
chrY 9195451 9368097 TSPY3
chrY 9195451 9368097 TSPY4
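
If you are comfortable with a little scripting, the per-gene merge described in the original question (combine overlapping isoforms of the same gene, keep non-overlapping ones as separate entries) could also be done outside Galaxy. The following is only a minimal Python sketch, not a UCSC or Galaxy tool. The function name merge_isoforms_by_gene is hypothetical, and the sketch assumes rows of (chrom, start, end, name2) like the output in step 6, with half-open coordinates so that intervals which merely touch are kept separate.

```python
from collections import defaultdict


def merge_isoforms_by_gene(rows):
    """Merge overlapping intervals that share a chromosome and gene symbol.

    rows: iterable of (chrom, start, end, gene) tuples.
    Returns a sorted list of (chrom, start, end, gene) tuples.
    """
    # Group intervals by (chromosome, gene symbol) so that different
    # genes are never merged into one another.
    grouped = defaultdict(list)
    for chrom, start, end, gene in rows:
        grouped[(chrom, gene)].append((int(start), int(end)))

    merged = []
    for (chrom, gene), intervals in grouped.items():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start < cur_end:
                # Overlaps the current block: extend it.
                cur_end = max(cur_end, end)
            else:
                # No overlap: close the current block and start a new one.
                merged.append((chrom, cur_start, cur_end, gene))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end, gene))

    # Note: chromosome names sort as text here (chr10 before chr2).
    return sorted(merged)


# Rows taken from the example output in step 6.
rows = [
    ("chr1", 48999844, 50489468, "AGBL4"),
    ("chr1", 16767256, 16785385, "NECAP2"),
    ("chr1", 16767256, 16785385, "NECAP2"),
    ("chr1", 16767256, 16785491, "NECAP2"),
]

for record in merge_isoforms_by_gene(rows):
    print(*record, sep="\t")
```

Because the grouping key includes the gene symbol, overlapping intervals from different genes (such as the TSPY genes in step 10) stay as separate entries, while the three NECAP2 rows collapse into one.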

One last note: you may want to use the UCSC Genes knownCanonical
table instead. Its genes are clustered according to whether they have
overlapping exons, and within each cluster the gene with the highest
number of coding bases is chosen as the representative canonical gene.

1. Navigate to the table browser, and make the following selections

Clade: Mammal
Genome: Human
Assembly: Feb. 2009 (GRCh37/hg19)
Group: Genes and Gene Prediction Tracks
Track: UCSC Genes
Table: knownCanonical

2. Be sure to have the region set to "genome".

3. Select "Selected fields from primary and related tables" in the
Output Format, and click "get output".

4. Select "chrom" "chromStart" "chromEnd" from the top knownCanonical
table, and also "geneSymbol" from the next "hg19.kgXref" table.

5. Click the "get output" button. You will get output like:

chr1 367658 368597 OR4F16
chr1 420205 421839 BC036251
chr1 566092 566115 JA429830
chr1 566134 566155 JA429831
chr1 566239 566263 JA429505
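
To turn output like this into a gene list similar to the attached hg18 sample, a small script can reformat each line. This is a rough Python sketch under assumptions not confirmed in this thread: that the target format is space-separated "chromosome start end symbol" with no "chr" prefix, that header lines begin with "#", and that entries on non-standard sequences (haplotypes, unplaced contigs) should be dropped. The function name to_gene_list is hypothetical, and coordinates are passed through unchanged.

```python
import re

# Assumption: keep only the standard chromosomes chr1..chr22, chrX, chrY, chrM.
_STANDARD = re.compile(r"^chr(\d+|X|Y|M)$")


def to_gene_list(lines):
    """Reformat 'chrN start end symbol' lines as 'N start end symbol'."""
    out = []
    for line in lines:
        fields = line.split()
        # Skip blank or malformed lines and "#"-prefixed header lines.
        if len(fields) != 4 or fields[0].startswith("#"):
            continue
        chrom, start, end, symbol = fields
        match = _STANDARD.match(chrom)
        if match is None:
            # Non-standard sequence name: dropped (an assumption).
            continue
        out.append(f"{match.group(1)} {start} {end} {symbol}")
    return out


sample = [
    "#chrom chromStart chromEnd geneSymbol",  # illustrative header line
    "chr1 367658 368597 OR4F16",
    "chr1 420205 421839 BC036251",
]

for entry in to_gene_list(sample):
    print(entry)
```

Check the documentation of your analysis software for the coordinate convention and column layout it expects before relying on a file produced this way.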

Thank you again for your inquiry and for using the UCSC Genome Browser.
If you have further questions, please feel free to contact the mailing list
again at gen...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group

Swaminathan, Shanker

Feb 27, 2013, 9:09:56 PM
to Brian Lee, gen...@soe.ucsc.edu
Dear Dr. Lee

Thank you very much for the detailed clarification. I will take a look at the options you suggested and will let you know if I have any questions.

Thank you
Yours sincerely
Shanker Swaminathan
