Hello Alexandra,
What operating system are you using? We recommend the liftOver utility
suggested by my colleague Matt. If you need help converting your SNP
data to BED format for liftOver, here is a suggestion from one of our
developers:
awk '$1 ~ /^rs/ {print "chr" $2, $4-1, $4, $1;}' myFile.txt > myFile.hg18.bed
Then myFile.hg18.bed will look like this:
chr1 711152 711153 rs12565286
chr1 713681 713682 rs11804171
chr1 713753 713754 rs2977670
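To make the column assumptions in that awk command explicit, here is a small self-contained sketch. The file name sample.txt, the header line, and the values in column 3 are made up for illustration. The command itself only assumes that the rsID is in column 1, the chromosome (without "chr") in column 2, the 1-based position in column 4, and the alleles in columns 5 and 6.

```shell
# Hypothetical input: rsID, chromosome, an unused column, 1-based position, two alleles.
cat > sample.txt <<'EOF'
SNP CHR X POS A1 A2
rs12565286 1 0 711153 C G
rs11804171 1 0 713682 A T
rs2977670 1 0 713754 C G
EOF

# Keep only rows whose first column starts with "rs" (this drops the header),
# then write BED coordinates: 0-based start ($4-1) and end ($4).
awk '$1 ~ /^rs/ {print "chr" $2, $4-1, $4, $1;}' sample.txt > sample.hg18.bed

cat sample.hg18.bed
```

If your file has the columns in a different order, the field numbers ($1, $2, $4) need to be adjusted to match.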
If you want to keep the alleles, they can be concatenated to the name
field like this:
awk '$1 ~ /^rs/ {print "chr" $2, $4-1, $4, $1 "_" $5 "_" $6;}' myFile.txt > myFile.hg18.bed
Then myFile.hg18.bed will look like this:
chr1 711152 711153 rs12565286_C_G
chr1 713681 713682 rs11804171_A_T
chr1 713753 713754 rs2977670_C_G
Then you can run liftOver like this:
./liftOver myFile.hg18.bed hg18ToHg19.over.chain.gz myFile.hg19.bed myFile.hg18.cantMap
Or, if you don't have too many rsIDs, you can upload
myFile.hg18.bed to
http://genome-preview.ucsc.edu/cgi-bin/hgLiftOver
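Whichever route you use, it can help to check how many SNPs mapped and how many did not. The sketch below uses made-up stand-ins for the two liftOver output files (file names and contents are for illustration only). It relies on liftOver writing a comment line starting with "#" before each record in the unmapped file to give the reason.

```shell
# Hypothetical stand-ins for liftOver's mapped and unmapped output files.
cat > demo.hg19.bed <<'EOF'
chr1 721289 721290 rs12565286
chr1 723818 723819 rs11804171
EOF
cat > demo.hg18.cantMap <<'EOF'
#Deleted in new
chr1 713753 713754 rs2977670
EOF

# Count every line in the mapped file.
mapped=$(grep -c '' demo.hg19.bed)
# Count only the records in the unmapped file, skipping the "#" reason lines.
unmapped=$(grep -vc '^#' demo.hg18.cantMap)
echo "mapped: $mapped, unmapped: $unmapped"
```

The two counts should add up to the number of lines in the input BED file.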
Finally, if you would like to try the suggestion of adding bases
upstream and downstream to require a larger region to map cleanly, you
can try this sequence:
awk '$1 ~ /^rs/ {print "chr" $2, $4-1-50, $4+50, $1 "_" $5 "_" $6;}' myFile.txt > myFile.flank50.hg18.bed
./liftOver myFile.flank50.hg18.bed hg18ToHg19.over.chain.gz myFile.flank50.hg19.bed myFile.flank50.hg18.cantMap
This command turns the mapped 101-bp regions back into 1-bp regions
and discards mappings to regions smaller than 101 bp:
awk '{s = $2+50; e = $3-50; if (e > s) {print $1, s, e, $4;} }' myFile.flank50.hg19.bed > myFile.mapped50.hg19.bed
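As a quick check of what the trimming step does, here is a sketch on invented mapped regions (the demo file names and coordinates are hypothetical). A 101-bp region is trimmed back to 1 bp and a shorter region is dropped, but a region that came out longer than 101 bp is kept as an interval wider than 1 bp. If you would rather keep only exact 1-bp results, the stricter test at the end is one option; it is an optional variation, not part of the developer's suggestion.

```shell
# Hypothetical lifted regions: 101 bp, 90 bp, and 103 bp long.
cat > demo.flank50.hg19.bed <<'EOF'
chr1 721239 721340 rs12565286_C_G
chr1 723768 723858 rs11804171_A_T
chr1 723840 723943 rs2977670_C_G
EOF

# Trimming as suggested: keeps the 101-bp and 103-bp regions, drops the 90-bp one.
awk '{s = $2+50; e = $3-50; if (e > s) {print $1, s, e, $4;} }' demo.flank50.hg19.bed > demo.mapped50.hg19.bed

# Stricter variation: keep a region only if it trims to exactly 1 bp.
awk '{s = $2+50; e = $3-50; if (e == s + 1) {print $1, s, e, $4;} }' demo.flank50.hg19.bed > demo.strict50.hg19.bed

cat demo.mapped50.hg19.bed
cat demo.strict50.hg19.bed
```

Note that for a region longer than 101 bp, the trimmed interval no longer pinpoints a single base, so those rows may need a manual look either way.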
Hopefully this gets you the data you are looking for. If you have any
further questions, please reply to
gen...@soe.ucsc.edu. All messages
sent to that address are archived on a publicly-accessible Google
Groups forum. If your question includes sensitive data, you may send
it instead to
genom...@soe.ucsc.edu.
Best regards,
Pauline Fujita
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu