Big difference in SNP150 in HG19 and HG38


Karel de Groot

Jul 6, 2018, 11:16:29 AM
to gen...@soe.ucsc.edu

 

Dear Brian,

 

Thanks for your reply and help. I now see where the difference comes from: I was looking at the hg19 file from the FTP server, while the Genome Browser session was in hg38. That explains the discrepancy. What I do not understand is why hg19 and hg38 differ in the SNPs found for this region.

 

For example, SNP rs994533351 can be found in hg38 but not in hg19. Hg19 has 234M rows, while hg38 has 335M rows. What causes this large difference?

 

Regards,

Karel

 

 

 

From: Brian Lee [mailto:bria...@soe.ucsc.edu]
Sent: Tuesday, July 3, 2018 1:33
To: Karel de Groot
Cc: gen...@soe.ucsc.edu
Subject: Re: [genome] SNP visible in the genomebroswer, but missing from the download files

 

Dear Karel,

Thank you for using the UCSC Genome Browser and for your question about finding the SNPs rs970938238, rs959271763, and rs994533351 in snp150.txt.gz.

I have confirmed that these SNPs are in the file. Perhaps you do not have the entire 7 GB file? Here is the md5sum of the copy I pulled: 2c5fdc04f0cf4aff29c0de9dee988d2f snp150.txt.gz
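A minimal sketch of how the checksum comparison could be scripted. It assumes GNU md5sum is installed (macOS ships `md5` instead), and check_download is a hypothetical helper name, not a UCSC tool. Note that the checksum only applies to the same copy of the file that was pulled here; a snp150.txt.gz from a different assembly directory would have a different sum.

```shell
#!/bin/sh
# Sketch: compare a file's MD5 against an expected value, then test the gzip stream.
# check_download is a hypothetical helper name used only for illustration.
check_download() {
    file=$1
    expected=$2
    # md5sum prints "<hash>  <filename>"; keep only the hash.
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" != "$expected" ]; then
        echo "MD5 mismatch: got $actual, expected $expected" >&2
        return 1
    fi
    # gzip -t reads the whole archive and fails if it is truncated or corrupt.
    gzip -t "$file" || return 1
    echo "OK: $file"
}

# Example call, using the checksum quoted in this message:
# check_download snp150.txt.gz 2c5fdc04f0cf4aff29c0de9dee988d2f
```

A mismatch usually means an incomplete download or a different version of the file, so it is worth checking which directory the file came from before downloading it again.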

Here is also a tip on how to look for items in the file. You can create a second file with a list of your SNPs:

$ cat list.txt 
rs970938238
rs959271763
rs994533351

Then use the zcat and grep commands to pull the matching lines out of the file (this may take some time, since the file is so large):

zcat snp150.txt.gz | grep -Fwf list.txt > mySnps.txt

In this case, the output for mySnps.txt will be these lines:

749    chr1    21508379    21508380    rs970938238    0    +    A    A    A/G    genomic    single    unknown    0    0    near-gene-5    exact    1        1    HUMAN_LONGEVITY,    0
749    chr1    21508744    21508745    rs959271763    0    +    C    C    C/T    genomic    single    unknown    0    0    near-gene-5    exact    1        1    HUMAN_LONGEVITY,    0
749    chr1    21508747    21508748    rs994533351    0    +    C    C    C/G    genomic    single    unknown    0    0    near-gene-5    exact    1        1    HUMAN_LONGEVITY,    0
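If some IDs are not found, it can help to list exactly which ones are missing. The following is a sketch only: missing_ids is a hypothetical helper name, and it assumes the grep output is tab-separated with the rsID in column 5, as in the lines above.

```shell
#!/bin/sh
# Sketch: print every ID from the list file that does not occur in
# column 5 (the rsID column) of the grep output.
# missing_ids is a hypothetical helper name used only for illustration.
missing_ids() {
    list=$1
    hits=$2
    # First file: remember each rsID that was found.
    # Second file: print list entries that were never seen.
    # Testing FILENAME (rather than NR == FNR) also works when the hits file is empty.
    awk -F'\t' 'FILENAME == ARGV[1] { found[$5] = 1; next }
                !($1 in found)' "$hits" "$list"
}

# Example call with the file names used above:
# missing_ids list.txt mySnps.txt
```

An empty result means every ID in list.txt was found in the file.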


Also, in case you do MySQL queries we do have a European MySQL server:

http://genome.ucsc.edu/goldenPath/help/mysql.html
mysql --user=genome --host=genome-euro-mysql.soe.ucsc.edu -A -P 3306
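As a rough sketch, the same lookup could be done against the MySQL server instead of the download file. The helper name snp_query is hypothetical, and the table name (snp150), database name (hg38), and column names are assumptions based on the file discussed in this thread, so please check them against the table schema before relying on this.

```shell
#!/bin/sh
# Sketch: build an SQL query that looks up a set of rsIDs by name.
# snp_query is a hypothetical helper; the table and column names are assumptions.
snp_query() {
    # Turn the arguments rs1 rs2 into 'rs1','rs2' for the IN (...) clause.
    ids=$(printf "'%s'," "$@")
    ids=${ids%,}
    printf "SELECT chrom, chromStart, chromEnd, name FROM snp150 WHERE name IN (%s);\n" "$ids"
}

# Example call; the last argument is the database (assembly) to query:
# mysql --user=genome --host=genome-euro-mysql.soe.ucsc.edu -A -P 3306 \
#     -e "$(snp_query rs970938238 rs959271763 rs994533351)" hg38
```

Changing the database argument to hg19 runs the same query against the other assembly, which makes it easy to see whether an rsID is present in one and absent from the other.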

Also, based on your link, you may be interested to learn you can create custom track output of your BLAT searches. 
http://genome.ucsc.edu/goldenPath/newsarch.html#050417

If you save a session, these custom tracks will persist so that you can return to them later. Here are some steps for creating sessions (sessions on our European server will not be identical to sessions on our US or Asian servers): 
https://genome.ucsc.edu/goldenPath/help/hgSessionHelp.html#Create

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further public questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UC Santa Cruz Genomics Institute

 

 

On Mon, Jul 2, 2018 at 2:24 AM, Karel de Groot <k.de...@mlpa.com> wrote:

Hi,

 

I downloaded snp150.txt.gz from your FTP server. When looking at this region in the Genome Browser (https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr1%3A21508709-21508768&hgsid=679402131_nddK8azSQtvTMFQaap1LNLtaUQHp), I can see three SNPs (rs970938238, rs959271763, rs994533351) that are not present in the downloaded file. Why are these SNPs missing from the file?

 

Thank you for your time.

 

Karel


 

Matthew Speir

Jul 10, 2018, 4:46:16 PM
to Karel de Groot, gen...@soe.ucsc.edu
Hi, Karel.

Thank you for your question about the differences between the dbSNP 150 tracks for hg19 and hg38 in the UCSC Genome Browser. 

For the SNP tracks in the Genome Browser, we download and import the data from files provided by dbSNP. Unfortunately, we don't really know why there is such a big difference between the number of SNPs annotated in hg19 versus hg38 in dbSNP 150. You can also contact dbSNP with questions about these differences: snp-...@ncbi.nlm.nih.gov
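While we cannot explain the cause, you can at least list which rsIDs differ between the two downloads. The following is only a sketch: ids_only_in_second is a hypothetical helper name, the file paths are placeholders, and it assumes both files are gzip-compressed, tab-separated, and have the rsID in column 5. With hundreds of millions of rows per file, the sort steps will need considerable time and temporary disk space.

```shell
#!/bin/sh
# Sketch: list rsIDs that occur in the second snp150 file but not in the first.
# ids_only_in_second is a hypothetical helper; file paths are placeholders.
ids_only_in_second() {
    first=$1
    second=$2
    tmpdir=$(mktemp -d) || return 1
    # Column 5 holds the rsID; sort -u collapses IDs that map to several positions.
    gzip -dc "$first"  | cut -f5 | LC_ALL=C sort -u > "$tmpdir/first.ids"
    gzip -dc "$second" | cut -f5 | LC_ALL=C sort -u > "$tmpdir/second.ids"
    # comm -13 hides lines unique to the first file and lines common to both,
    # leaving only the IDs unique to the second file.
    LC_ALL=C comm -13 "$tmpdir/first.ids" "$tmpdir/second.ids"
    rm -rf "$tmpdir"
}

# Example call with placeholder paths for the two downloads:
# ids_only_in_second hg19/snp150.txt.gz hg38/snp150.txt.gz > onlyInHg38.txt
```

Swapping the two arguments gives the IDs that are present only in the hg19 file.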

We do have dbSNP 151 available on our preview server for both hg38 and hg19: 

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group

Training videos & resources: http://genome.ucsc.edu/training/index.html
Want to share the Browser with colleagues?
Host a workshop: http://bit.ly/ucscTraining





--
Matthew Speir
Outreach, User Experience, Quality Assurance and User Support
HCA, CIRM, and UCSC Genome Browser
UCSC Genomics Institute