SNPs to Gene names

Andrey Gimenez Rivera

Mar 3, 2020, 12:05:23 PM

to gen...@soe.ucsc.edu

Hi,

I have a long list of SNPs (rsXXXXXX) and I would like to get the gene names associated with each of these SNPs. Is there a way to do this using the UCSC database? Something like pasting the list into a window and getting back the associated genes in an organized format?

Thanks a lot,

Andrey.

Luis Nassar

Mar 4, 2020, 7:00:01 PM

to Andrey Gimenez Rivera, gen...@soe.ucsc.edu

Hello Andrey,

Thank you for your interest in the Genome Browser.

The easiest way to get gene names from a list of rsIDs is to use the Variant Annotation Integrator (http://genome.ucsc.edu/cgi-bin/hgVai).

I'll use the following SNPs in my example:

You will want to go to hgVai (http://genome.ucsc.edu/cgi-bin/hgVai) and select Variant Identifiers from the variants menu. You can then paste up to 10,000 variants in the box, or you can increase that to 100,000 in the drop-down menu underneath the box.
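
Before pasting, it can help to tidy the list so that stray whitespace, Windows line endings, or duplicate IDs do not count against the 10,000 (or 100,000) limit. The following is a minimal sketch, not part of hgVai itself: `clean_rsids` is a hypothetical helper name, and it assumes one identifier per line in the input.

```shell
# Hypothetical helper (not a UCSC tool): read a raw rsID list on stdin and
# write a cleaned list to stdout.
# Assumption: the input has one identifier per line.
clean_rsids() {
    tr -d '\r' |                                        # drop Windows carriage returns
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |    # trim leading/trailing whitespace
        grep -E '^rs[0-9]+$' |                          # keep only lines that look like rsIDs
        awk '!seen[$0]++'                               # drop duplicates, keep first-seen order
}

# Example usage (my_snps.txt is a placeholder file name):
#   clean_rsids < my_snps.txt > my_snps.clean.txt
#   wc -l < my_snps.clean.txt    # compare with the limit selected in hgVai
```

The line count of the cleaned file tells you whether the default limit is enough or whether the larger setting in the drop-down menu is needed.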

You can then make various selections to add additional annotations, but if you leave everything in the default settings and click Get results you will see an output like so:

# ENSEMBL VARIANT EFFECT PREDICTOR format (UCSC Variant Annotation Integrator)
...
Uploaded Variation    Location    Allele    Gene    Feature    Feature type    Consequence    Position in cDNA    Position in CDS    Position in protein    Amino acid change    Codon change    Co-located Variation    Extra
rs2802212    chr1:11095207    A    EXOSC10    ENST00000304457.11    Transcript    intron_variant    -    -    -    -    -    rs2802212    INTRON=3/23
rs2802212    chr1:11095207    A    EXOSC10    ENST00000376936.9    Transcript    intron_variant    -    -    -    -    -    rs2802212    INTRON=3/24
rs2802212    chr1:11095207    A    EXOSC10    ENST00000460196.1    Transcript    NMD_transcript_variant    -    -    -    -    -    rs2802212    INTRON=4/4

You can change the gene track used to annotate the variants; by default it is GENCODE v32 for hg38. Also, if you enter a file name in the output file field, the results will be downloaded instead of displayed. You can then manipulate the file to fit your needs. Here is an example that removes the header lines, extracts only the rsID, position, and gene symbol, and then removes the duplicates, which are present because there are multiple transcripts per gene:

$ grep -v "^#" rsIDannotated.txt | cut -f 1,2,4 | uniq
Uploaded Variation    Location    Gene
rs2802212    chr1:11095207    EXOSC10
rs2802212    chr1:11095207    EXOSC10-AS1
rs12140947    chr1:11119113    MTOR
rs57289760    chr1:165979715    -
rs11590988    chr1:165981040    -
rs2882087    chr1:165983890    -
rs6426950    chr1:165985150    -
rs4656470    chr1:165992153    -
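
Note that `uniq` only collapses duplicate rows that are adjacent. If transcripts from different genes happen to be interleaved for the same variant, some duplicates could remain. A variant of the command that does not depend on row order might look like the sketch below; `unique_genes` is a hypothetical helper name, and it assumes the downloaded file is tab-separated with the gene symbol in column 4, as in the output above.

```shell
# Hypothetical helper: reduce hgVai VEP-format output to unique
# rsID / location / gene rows, regardless of the order of the rows.
# Assumption: tab-separated columns, gene symbol in column 4.
unique_genes() {
    grep -v '^#' "$1" |       # drop the "#" header lines
        cut -f 1,2,4 |        # keep rsID, location, gene symbol
        awk '!seen[$0]++'     # print each distinct row once, in first-seen order
}

# Example usage:
#   unique_genes rsIDannotated.txt > rsID_genes.txt
```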

If the list includes SNPs outside of gene regions, as mine does, these variants will lack gene annotations.
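
If you would rather have one line per rsID with all of its genes together, something along these lines could work. This is only a sketch under a few assumptions: the file is tab-separated, the column header row begins with "Uploaded Variation" as in the output above, and variants without a gene carry "-" in column 4. `genes_per_rsid` is a hypothetical helper name.

```shell
# Hypothetical helper: print one line per rsID followed by a
# comma-separated list of its gene symbols ("-" if it has none).
# Assumptions: tab-separated VEP-format file, rsID in column 1,
# gene symbol in column 4, header row starting with "Uploaded Variation".
genes_per_rsid() {
    grep -v '^#' "$1" | awk -F '\t' '
        $1 == "Uploaded Variation" { next }     # skip the column header row
        !($1 in genes) {                        # first time this rsID is seen
            order[++n] = $1
            genes[$1] = ""
        }
        $4 != "-" && !seen[$1, $4]++ {          # new gene for this rsID
            genes[$1] = (genes[$1] == "" ? $4 : genes[$1] "," $4)
        }
        END {
            for (i = 1; i <= n; i++) {
                id = order[i]
                print id "\t" (genes[id] == "" ? "-" : genes[id])
            }
        }'
}

# Example usage:
#   genes_per_rsid rsIDannotated.txt > genes_by_rsID.txt
```

Variants outside gene regions stay in the output with "-" so that every submitted rsID is still accounted for.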

I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute
