SNPs visible in the Genome Browser, but missing from the download files


Karel de Groot

Jul 2, 2018, 11:58:32 AM
to gen...@soe.ucsc.edu

Hi,

 

I downloaded snp150.txt.gz from your FTP site. When I look at this region in the Genome Browser (https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr1%3A21508709-21508768&hgsid=679402131_nddK8azSQtvTMFQaap1LNLtaUQHp), I can see these SNPs (rs970938238, rs959271763, rs994533351), but they are not present in the downloaded file. Why are these SNPs missing from the file?

 

Thank you for your time.

 

Karel

Brian Lee

Jul 2, 2018, 7:33:19 PM
to Karel de Groot, gen...@soe.ucsc.edu

Dear Karel,

Thank you for using the UCSC Genome Browser and for your question about snp150.txt.gz and the SNPs rs970938238, rs959271763, and rs994533351.

I have confirmed that these SNPs are in the file. Perhaps you do not have the entire 7 GB file? For comparison, here is the md5sum of the file I pulled: 2c5fdc04f0cf4aff29c0de9dee988d2f snp150.txt.gz
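To compare a local copy against that checksum, something like the following may help. It is a minimal sketch that assumes GNU md5sum is installed (on macOS the equivalent tool is `md5`, with a different output format) and that the download sits in the current directory. The helper name `check_md5` is made up for this example.

```shell
# Hypothetical helper: compare a file's MD5 checksum with an expected value.
# Returns 0 on a match and 1 otherwise.
check_md5() {
  actual=$(md5sum "$1" | cut -d' ' -f1)
  if [ "$actual" = "$2" ]; then
    echo "OK: $1 matches $2"
  else
    echo "MISMATCH: $1 has $actual, expected $2"
    return 1
  fi
}

# Only runs when the download is present in the current directory.
if [ -f snp150.txt.gz ]; then
  check_md5 snp150.txt.gz 2c5fdc04f0cf4aff29c0de9dee988d2f
  # gzip -t reads the whole archive and fails if it is truncated or corrupt.
  gzip -t snp150.txt.gz && echo "gzip stream is complete"
fi
```

A mismatch, or a failure from `gzip -t`, would suggest that the download was interrupted and should be repeated.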

Here is also a tip on how to look for items in the file. You can create a second file with a list of your SNPs:

$ cat list.txt 
rs970938238
rs959271763
rs994533351

Then use the zcat and grep commands to pull the matching lines out of the file (this may take some time since the file is so large):

zcat snp150.txt.gz | grep -Fwf list.txt > mySnps.txt

In this case, the output for mySnps.txt will be these lines:

749    chr1    21508379    21508380    rs970938238    0    +    A    A    A/G    genomic    single    unknown    0    0    near-gene-5    exact    1        1    HUMAN_LONGEVITY,    0
749    chr1    21508744    21508745    rs959271763    0    +    C    C    C/T    genomic    single    unknown    0    0    near-gene-5    exact    1        1    HUMAN_LONGEVITY,    0
749    chr1    21508747    21508748    rs994533351    0    +    C    C    C/G    genomic    single    unknown    0    0    near-gene-5    exact    1        1    HUMAN_LONGEVITY,    0
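The grep command above matches an ID as a whole word anywhere on the line. If you want to match only the name column, a variant along these lines might be used. It is a sketch that assumes the file is tab-separated and that the rsID is the fifth field, as the output above suggests. `filter_by_name` is a hypothetical helper name, and `gzip -dc` stands in for zcat because it behaves the same on Linux and macOS.

```shell
# Hypothetical helper: print rows whose 5th (name) column is listed in an ID file.
# $1 = list of rsIDs, one per line; $2 = gzip-compressed, tab-separated table.
filter_by_name() {
  gzip -dc "$2" | awk -F'\t' '
    NR == FNR { want[$1]; next }   # first input (the list): remember each ID
    ($5 in want)                   # second input (stdin): keep matching rows
  ' "$1" -
}

# Only runs when both inputs are present in the current directory.
if [ -f list.txt ] && [ -f snp150.txt.gz ]; then
  filter_by_name list.txt snp150.txt.gz > mySnps.txt
fi
```

The list file must not be empty, since the `NR == FNR` idiom relies on the first input containing at least one line.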

Also, in case you run MySQL queries, we have a European MySQL server:

http://genome.ucsc.edu/goldenPath/help/mysql.html
mysql --user=genome --host=genome-euro-mysql.soe.ucsc.edu -A -P 3306
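As an illustration, the same three SNPs could be looked up through that server instead of the downloaded file. The sketch below assumes the hg38 database holds a table named snp150 with the columns chrom, chromStart, chromEnd, name, and observed, matching the fields in the file output above. `build_snp_query` and `query_snps` are hypothetical helper names, and nothing contacts the server until `query_snps` is called.

```shell
# Hypothetical helper: build a SQL statement for every rsID in a list file
# (one ID per line). Lines that do not look like rsIDs are skipped.
build_snp_query() {
  ids=$(grep -E '^rs[0-9]+$' "$1" | sed "s/.*/'&'/" | paste -sd, -)
  printf 'SELECT chrom, chromStart, chromEnd, name, observed FROM snp150 WHERE name IN (%s);\n' "$ids"
}

# Hypothetical helper: run that statement against hg38 on the European server.
query_snps() {
  mysql --user=genome --host=genome-euro-mysql.soe.ucsc.edu -A -P 3306 \
    -D hg38 -e "$(build_snp_query "$1")"
}
```

Calling `query_snps list.txt` with the list file shown earlier would send a single query for all three IDs.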

Also, based on your link, you may be interested to learn that you can create custom track output from your BLAT searches:
http://genome.ucsc.edu/goldenPath/newsarch.html#050417

If you create sessions, these custom tracks will persist rather than disappear, so you can return to them later. Here are the steps for creating sessions (note that sessions on our European server will not be identical to sessions on our US or Asian servers):
https://genome.ucsc.edu/goldenPath/help/hgSessionHelp.html#Create

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further public questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UC Santa Cruz Genomics Institute


