Hello, Raul.
Thank you for sharing this comparison with us. It’s good to see concrete examples that show overwhelming agreement between our tool and NCBI’s tool.
It appears that the disagreements on chromosome 15 are the result of a couple of high-identity segmental duplications (chr15:20,935,076-21,034,034 <--> chr15:21,941,706-22,040,711 and chr15:21,033,446-21,199,563 <--> chr15:22,044,595-22,210,800). In these cases, NCBI’s mappings look better than ours in terms of similarity between hg18 and hg19.
One good way to double-check the results of both tools is to take a region extending at least a couple hundred base pairs on either side of the SNP in hg18, get the DNA for that region, and then blat that sequence against hg19 to compare the hits. In most of the disagreements, the UCSC coordinate is the top blat hit, but the NCBI coordinate is a very close second.
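As a rough illustration of the first step, here is a minimal Python sketch that builds the hg18 region string to paste into the browser before fetching the DNA. The function name, the 200 bp default flank, and the example position are assumptions chosen for illustration, not values taken from your data.

```python
def flank_region(chrom, pos, flank=200, chrom_size=None):
    """Return a 1-based, inclusive region string around a SNP position.

    chrom_size is optional; when given, the end is clamped to it.
    """
    start = max(1, pos - flank)      # do not run off the start of the chromosome
    end = pos + flank
    if chrom_size is not None:
        end = min(end, chrom_size)   # do not run off the end of the chromosome
    return f"{chrom}:{start}-{end}"


# Hypothetical SNP position on chr15 (hg18), for illustration only
print(flank_region("chr15", 21000000))
```

The resulting string can be entered as the hg18 position, and the DNA retrieved for that region can then be submitted to blat against hg19.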
Most of the chromosome 4 and all of the chromosome 17 disagreements were the result of NCBI listing haplotype chromosomes. For all of the haplotype instances that I checked, the haplotype chromosome was actually the top blat hit, but liftOver preferentially selects the main chromosomes over the haplotype chromosomes, which is why UCSC’s liftOver and NCBI’s remap tool disagreed. In general, there are far more annotations on the main chromosomes than on the alternate chromosomes, although in the case of SNPs this preference may not always produce the intended results.
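If it helps to separate these haplotype cases from the rest of your comparison, a small Python sketch along the following lines could do it. It assumes the NCBI results have already been converted to UCSC-style hg19 names (for example, chr17_ctg5_hap1), and the row layout of (SNP id, UCSC chromosome, NCBI chromosome) is a hypothetical format, not the actual output of either tool.

```python
def is_haplotype(chrom):
    """True for UCSC-style hg19 haplotype names such as chr17_ctg5_hap1."""
    return "_hap" in chrom


def split_disagreements(rows):
    """Split (snp_id, ucsc_chrom, ncbi_chrom) rows into two lists:
    those where NCBI reported a haplotype chromosome, and all others."""
    haplotype, other = [], []
    for snp_id, ucsc_chrom, ncbi_chrom in rows:
        if is_haplotype(ncbi_chrom):
            haplotype.append(snp_id)
        else:
            other.append(snp_id)
    return haplotype, other


# Made-up SNP ids, for illustration only
rows = [
    ("snpA", "chr17", "chr17_ctg5_hap1"),
    ("snpB", "chr4", "chr4_ctg9_hap1"),
    ("snpC", "chr15", "chr15"),
]
print(split_disagreements(rows))
```

The SNPs in the second list are the ones worth checking by hand with blat.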
Please contact us again at gen...@soe.ucsc.edu if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
---
Steve Heitner
UCSC Genome Bioinformatics Group