Dear UCSC team,

I intend to identify SNPs in my data using GRCh38/hg38 as the reference. I searched the site for the corresponding dbSNP files for this build, so that I can distinguish already-known variants. The latest dbSNP files available are for hg19. However, there is a liftOver file at this link (http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/). Should I use this liftOver file together with the dbSNP files from the hg19 assembly? Or would it be better for me to use hg19 as the reference?

Thank you for your input and recommendations.

Regards,
Rushiraj
Hello Rushiraj,
Thank you for your question about identifying common SNPs in your data. Ultimately it is up to you to decide which approach will best serve your needs, but I can give you some additional information.

We are currently in the process of releasing a dbSNP track for hg38. Sometime next week we expect to have a version of it available on our test server at http://genome-test.soe.ucsc.edu. Please note that this track will not have undergone any of our quality assurance checks, and that additional changes may be made prior to the public release. Whether that is a better tool for your analysis than the track lifted over from hg19 is up to you.

Regarding using hg19 instead of hg38, it is certainly true that hg19 has more data mapped to it at this time. Depending on the type of analysis you are doing, that might be helpful. A good start might be to check whether your SNPs occur in regions that have substantially changed between the hg19 and hg38 assemblies. If so, that would weigh in favor of working with hg38.
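One rough way to run that check is to see whether each SNP position falls inside an aligned block of a liftOver chain file; positions that land in a gap, or that no chain covers, sit in sequence that differs between the two assemblies. The Python sketch below is only an illustration under stated assumptions: the function names `parse_chain_blocks` and `lift` are made up for this example, the chain data is a toy string rather than a real UCSC file, positions are 0-based, and the first matching block is returned instead of the best-scoring chain. It is not a replacement for the liftOver tool itself.

```python
# Toy chain in UCSC chain format (hypothetical data, not from a real file).
# Header: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
# Data lines: "size dt dq" (ungapped block, then gap in target and in query);
# the last line of a chain holds only "size".
TOY_CHAIN = """\
chain 1000 chr1 1000 + 100 300 chr1 1200 + 150 360 1
50 10 20
140
"""


def parse_chain_blocks(lines):
    """Return a list of ungapped aligned blocks from chain-format lines.

    Each block is (t_name, t_start, t_end, q_name, q_size, q_strand, q_start),
    with 0-based, half-open target coordinates.
    """
    blocks = []
    t_name = q_name = q_strand = None
    t_pos = q_pos = q_size = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line between chains
        if fields[0] == "chain":
            t_name, t_pos = fields[2], int(fields[5])
            q_name, q_size = fields[7], int(fields[8])
            q_strand, q_pos = fields[9], int(fields[10])
        else:
            size = int(fields[0])
            blocks.append((t_name, t_pos, t_pos + size,
                           q_name, q_size, q_strand, q_pos))
            if len(fields) == 3:
                # Skip over the gaps that follow this block.
                t_pos += size + int(fields[1])
                q_pos += size + int(fields[2])
    return blocks


def lift(blocks, chrom, pos):
    """Map a 0-based target position to the query assembly.

    Returns (q_name, q_pos), or None if the position is not inside any
    aligned block (a gap or an uncovered region).
    """
    for t_name, t_start, t_end, q_name, q_size, q_strand, q_start in blocks:
        if t_name == chrom and t_start <= pos < t_end:
            q = q_start + (pos - t_start)
            if q_strand == "-":
                # Query coordinates on "-" are given on the reverse strand.
                q = q_size - 1 - q
            return q_name, q
    return None


if __name__ == "__main__":
    blocks = parse_chain_blocks(TOY_CHAIN.splitlines())
    # Hypothetical SNP positions on the target assembly.
    for chrom, pos in [("chr1", 120), ("chr1", 155), ("chr1", 200)]:
        hit = lift(blocks, chrom, pos)
        print(chrom, pos, "->", hit if hit else "unmapped (changed region?)")
```

In the toy data the second position falls in the gap between the two aligned blocks, so it is reported as unmapped. With a real chain file the same idea applies, but the file is large enough that an interval index would be preferable to the linear scan used here.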
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group
--