Inversion of SNP alleles (ref vs alt) across genome assemblies


Sofia Pinto

May 25, 2016, 11:14:04 AM
to gen...@soe.ucsc.edu
Hi,

I am using the executable version of the liftOver tool.
I performed the liftOver of some SNPs from GRCh38 to GRCh37 and everything went ok with the liftover of the positions.
However, in some of the SNPs the alleles were inverted, meaning that the reference/alternative allele in 37 was the alternative/reference allele in 38.

As an example:
(37) rs1985842, 22, 42523409, G, T
(38) rs1985842, 22, 42127407, T, G

I would like to know if there is a strategy to deal with such cases?
Are there any scripts/tools that can be used to handle the inversion of the alleles after liftOver?

Thank you for your attention.

Best regards,

Sofia Pinto, PhD
Bioinformatics Specialist
Coimbra Genomics S.A.

skype: fiapinto

Matthew Speir

May 31, 2016, 12:17:46 PM
to Sofia Pinto, gen...@soe.ucsc.edu
Hello Sofia,

Thank you for your question about using the command-line LiftOver utility to convert SNP coordinates.

If you have the rsIDs for all of your SNPs, I would recommend just pulling the hg38 coordinates from the dbSNP "snp" tables in hg38. The most recent dbSNP release that we have available for hg38 in the UCSC Genome Browser is snp146: http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=snp146. You can download the data from our downloads server, http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/snp146.txt.gz, and then use a UNIX command like so:

    zgrep -Fwf myRsIds.txt hg38/snp146.txt.gz > hg38/mySnps.txt

to extract all of the information for each SNP listed in a file such as "myRsIds.txt" (one rsID per line).
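If only the coordinates and alleles are needed, the extraction can be narrowed to a few columns. The following is a minimal sketch, not an official UCSC script: `snp_alleles` is a hypothetical helper name, and the column numbers assume the usual layout of the UCSC "snp" tables (2 = chrom, 3 = chromStart, 5 = name, 7 = strand, 9 = refUCSC, 10 = observed), which is worth checking against the table schema before use. It matches rsIDs exactly on the name column rather than anywhere on the line.

```shell
# Hypothetical helper: for every rsID listed in $1 (one per line), print
# rsID, chrom, 1-based position, strand, reference allele and observed alleles
# from a gzipped UCSC "snp" table dump given as $2.
# Assumed columns: 2=chrom, 3=chromStart (0-based), 5=name, 7=strand,
# 9=refUCSC, 10=observed.
snp_alleles() {
    ids_file=$1
    snp_table_gz=$2
    gzip -dc "$snp_table_gz" |
        awk -F'\t' -v OFS='\t' '
            NR == FNR { wanted[$1] = 1; next }   # first input: the rsID list
            ($5 in wanted) {
                # chromStart + 1 is the 1-based position for single-base SNPs
                print $5, $2, $3 + 1, $7, $9, $10
            }
        ' "$ids_file" -
}
```

It could be called as `snp_alleles myRsIds.txt hg38/snp146.txt.gz > hg38/mySnpAlleles.txt`. As far as the table description goes, the observed alleles are reported relative to the strand column, so minus-strand records may need to be reverse-complemented before comparing them with the reference allele.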

Also, please note that the command-line "LiftOver" utility does require a license for use by a for-profit company or corporation. You can read the full license here: https://genome-store.ucsc.edu/media/eula/2014/09/16/LiftOver-EULA.pdf. You can purchase a license for LiftOver from our store: https://genome-store.ucsc.edu/.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group

Sofia Pinto

Jun 3, 2016, 11:49:55 AM
to Matthew Speir, gen...@soe.ucsc.edu
Hello Matthew,

So you’ve confirmed what I had in mind: use a SNP file from the same assembly version as the liftOver target assembly to extract the allele information for the SNPs of interest.
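A rough sketch of how that check might look once the target-assembly reference alleles are in hand. Everything here is an assumption for illustration: `swap_ref_alt` is a hypothetical helper, the first file is assumed to hold "rsID, reference allele in the target assembly" and the second "rsID, chrom, pos, ref, alt" as carried over by liftOver, both tab-separated, with all alleles on the forward strand.

```shell
# Hypothetical helper. Inputs (tab-separated; layouts are assumptions):
#   $1: rsID <TAB> reference allele in the target assembly
#   $2: rsID <TAB> chrom <TAB> pos <TAB> ref <TAB> alt, after liftOver
# Output: the SNP record with ref/alt swapped where needed, plus a status column.
swap_ref_alt() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR { target_ref[$1] = $2; next }     # load the lookup table first
        !($1 in target_ref)  { print $0, "not_found"; next }
        $4 == target_ref[$1] { print $0, "ok"; next }
        $5 == target_ref[$1] { print $1, $2, $3, $5, $4, "swapped"; next }
        { print $0, "mismatch" }                     # neither allele matches; check by hand
    ' "$1" "$2"
}
```

For the example above, the lifted record `rs1985842, 22, 42523409, T, G` with a GRCh37 reference allele of G would come out as `rs1985842, 22, 42523409, G, T` with status `swapped`. Records flagged `mismatch` may be strand flips or multi-allelic sites and are left unchanged for manual review.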

Thanks for the help!

Sofia Pinto