| from hg18 to GRCh38 | Alexandra Vatsiou | 26/02/14 08:34 | Hello, I would like to ask one question about conversion from hg18 to GRCh38. Is it possible to convert SNP coordinates from hg18 to GRCh38 with the chain file http://hgdownload-test.sdsc.edu/goldenPath/hg18/liftOver/hg18ToHg38.over.chain.gz. And if yes, what exactly is the format of the chain file?
Regards, Alexandra |
| Re: [genome] from hg18 to GRCh38 | Matthew Speir | 26/02/14 14:36 |
Hi Alexandra,
Thank you for your question about mapping SNP coordinates from hg18 to hg38. You may find the LiftOver utility useful for this task. This utility is available for UNIX-based OSes, and can be downloaded from the following location: http://hgdownload.soe.ucsc.edu/admin/exe/. Once you have downloaded the utility, you can type "./liftOver" with no arguments at the command line to see the usage statement. This usage statement also contains information on acceptable input formats for liftOver. More information on the chain format can be found on the following help page: http://genome.ucsc.edu/goldenPath/help/chain.html. One of our engineers had the following suggestions to consider when carrying out these Lift Overs: "Mapping polymorphisms correctly is challenging even when directly mapping sequence to a reference assembly. No guarantees, but something like this might help to increase the confidence in mappings: instead of mapping single-nucleotide positions, widen out the hg18 positions (for example by 50 or 100 bases upstream and downstream of the SNP) and require that the whole surrounding region maps to a single location in hg38. As with any bioinformatics process, comparing the results of different methods might provide some insight. If your SNP coordinates are from dbSNP, here are a couple additional suggestions: 1) Use hg19 SNP 138 (hg18 SNP 130 is very old now). 2) Filter out any dbSNP items that map to multiple locations in hg18/hg19. 3) Take a look at the red flags described in the snp138ExceptionsDesc table and consider excluding SNPs that have some of those conditions as well." If you choose to use the hg18 SNP coordinates, then the file referenced in your email, http://hgdownload-test.sdsc.edu/goldenPath/hg18/liftOver/hg18ToHg38.over.chain.gz, will work just fine. If you decide to use the hg19 snp138 coordinates, you will need an hg19 to hg38 liftOver file which can be found at http://hgdownload-test.sdsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz. Please note that these files are on our test server and have not undergone our standard QA process yet. I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu. Matthew Speir UCSC Genome Bioinformatics Group -- |
| Fwd: [genome] from hg18 to GRCh38 | Alexandra Vatsiou | 27/02/14 07:52 | ---------- Forwarded message ---------- From: Alexandra Vatsiou <alex....@gmail.com> Date: Thu, Feb 27, 2014 at 4:37 PM Subject: Re: [genome] from hg18 to GRCh38 To: Matthew Speir <msp...@soe.ucsc.edu> Hi Matthew, Thank you for your very informative email. I have though two more question, if it is possible to answer me. You mentioned that the chain file is sufficient to convert from one assembly to another the SNP coordinates. 1. I suppose the reference sequence is the hg18 and the query is the hg38 in the hg19ToHg38.over.chain file. Right? 2. However, I can't see how exactly I will do the conversion of SNP coordinates between the two assemblies. The chain file contains a) the size of the ungapped alignment and b) the difference between the end of this block and the beginning of the next block in the reference and the query sequence.
What I have is something like the following: SNP chr genetic dist physical dist rs12565286 1 0.000881697 711153 C G
rs11804171 1 0.0058782819 713682 A T
rs2977670 1 0.0060252705 713754 C G
rs2977656 1 0.018904623 719811 C T
rs12138618 1 0.0633989027 740098 A G
rs3094315 1 0.0665172688 742429 A G
rs3131968 1 0.0679866253 744055 A G
where the fourth column is the SNP coordinates of each SNP. For example according to the chain file what will be the coordinate of the first SNP? Look forward to your reply.
Regards, Alexandra --
Alexandra Vatsiou
PhD Student University Joseph Fourier
BP 53 Grenoble Cedex 9
France 38041
Alexandra Vatsiou
PhD Student University Joseph Fourier BP 53 Grenoble Cedex 9
France 38041
|
| Re: [genome] from hg18 to GRCh38 | Pauline Fujita | 06/03/14 15:05 | Hello Alexandra,
What operating system are you using? We recommend the liftOver utility suggested by my colleague Matt. If you need help converting your SNP data to BED format for liftOver, here is a suggestion from one of our developers: awk '$1 ~ /^rs/ {print "chr" $2, $4-1, $4, $1;}' myFile.txt > myFile.hg18.bed Then myFile.hg18.bed will look like this: chr1 711152 711153 rs12565286 chr1 713681 713682 rs11804171 chr1 713753 713754 rs2977670 If you want to keep the alleles, they can be concatenated to the name field like this: awk '$1 ~ /^rs/ {print "chr" $2, $4-1, $4, $1 "_" $5 "_" $6;}' myFile.txt > myFile.hg18.bed Then myFile.hg18.bed will look like this: chr1 711152 711153 rs12565286_C_G chr1 713681 713682 rs11804171_A_T chr1 713753 713754 rs2977670_C_G Then you can run liftOver like this: ./liftOver myFile.hg18.bed hg18ToHg38.over.chain.gz myFile.hg19.bed myFile.hg18.cantMap Or... if you don't have too many rsIDs, then you can upload myFile.hg19.bed to http://genome-preview.ucsc.edu/cgi-bin/hgLiftOver Finally, if you would like to try the suggestion of adding bases upstream and downstream to require a larger region to map cleanly, you can try this sequence: awk '$1 ~ /^rs/ {print "chr" $2, $4-1-50, $4+50, $1 "_" $5 "_" $6;}' myFile.txt > myFile.flank50.hg18.bed ./liftOver myFile.flank50.hg18.bed hg18ToHg38.over.chain.gz myFile.flank50.hg19.bed myFile.flank50.hg18.cantMap This command turns the mapped 101-bp regions back into 1-bp regions, and discards mappings to regions smaller than 101 bp: awk '{s = $2+50; e = $3-50; if (e > s) {print $1, s, e, $4;} }' myFile.flank50.hg19.bed > myFile.mapped50.hg19.bed Hopefully this gets you the data you are looking for. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messagesBest regards, Pauline Fujita UCSC Genome Bioinformatics Group http://genome.ucsc.edu > -- > |