mm10 all-SNPS (in bed format) downloading problem


Frances Vega

Apr 23, 2013, 12:06:24 PM
to gen...@soe.ucsc.edu, Jens Lagergren, Marie Öhman, Gilad Silberberg
Dear Drs. Ann, Katrina, Pauline, Jqxie,

Good day!

May I ask for your help with a problem I am having in obtaining the mm10 snp137 file (in BED format) through your Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables? The settings I chose are:
clade: Mammal
genome: Mouse
assembly: Dec. 2011 (GRCm38/mm10) (comment: the most recent version, I believe.)
group: Variation and Repeats
track: All SNPs(137)
region: genome
output format: BED - browser extensible data
output file: mm10_snp137.bed.gz (gzip compressed)

In addition, I set these filters (edited options):
class does match single
molType doesn't match cDNA

Based on a discussion thread I read, I seem to have the same problem as others who tried to obtain very large files through your browser, albeit for a different genome.

More specifically, the BED file I tried to obtain through your Table Browser ends with this message: 'procedures have exceeded timeout: 1200 seconds, function has ended.'

I understand from your responses to previous questions that the file I am trying to download is too large for the Table Browser. You recommended that such files be downloaded from your download server (the previous discussions concerned the hg19 human genome). However, I cannot find a similar link for the mm10 All SNPs files (mouse genome) in BED format.

I would deeply appreciate it if you could provide me with the correct solution/recommendation and/or link to download the appropriate files without taxing the memory capabilities of your table browser.

Hoping for your kind consideration regarding the matter.

Respectfully yours,
Frances

Brian Lee

Apr 23, 2013, 2:31:37 PM
to Frances Vega, gen...@soe.ucsc.edu, Jens Lagergren, Marie Öhman, Gilad Silberberg
Dear Frances,

Thank you for using the UCSC Genome Browser and your question about downloading the mouse mm10 assembly snp137 file.

You are right that the Table Browser will time out on a long query, such as filtering all the mouse SNPs for output in BED format. You can access the entire snp137 file, snp137.txt.gz, from the mouse downloads directory: http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/

However, the downloaded file will be neither filtered nor in BED format. If you have MySQL installed (or can install it), you could instead run a query that downloads only the fields chrom, chromStart, chromEnd, name, score, and strand, with your filter requirements applied. Here is an example command that would write the results of that query to a file from the command line:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -Ne "select chrom, chromStart, chromEnd, name, score, strand from snp137 where class='single' and molType!='cDNA'" mm10 > mm10SnpsBedFiltered 

Please note this download will take some time. See more information about MySQL access here: http://genome.ucsc.edu/goldenPath/help/mysql.html
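
For anyone who would rather drive the query from a script and keep the result gzip-compressed (as in the original Table Browser request), the following is a minimal sketch rather than an official recipe. It assumes the `mysql` command-line client is installed and on the PATH; the function names `build_command` and `dump_to_gzip` and the output file name are hypothetical, while the host, user, flags, and query are copied from the command above.

```python
import gzip
import shutil
import subprocess

# Host, user, and query are taken from the command quoted above.
HOST = "genome-mysql.cse.ucsc.edu"
QUERY = (
    "select chrom, chromStart, chromEnd, name, score, strand "
    "from snp137 where class='single' and molType!='cDNA'"
)


def build_command(query=QUERY, database="mm10", host=HOST):
    """Return the argument list for the mysql client (hypothetical helper)."""
    # -A skips auto-rehash, -N omits the column-name header, -e runs the query.
    return ["mysql", "--user=genome", f"--host={host}", "-A", "-N", "-e", query, database]


def dump_to_gzip(out_path, command=None):
    """Run the query and stream its tab-separated output into a gzip file."""
    command = command or build_command()
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc, \
            gzip.open(out_path, "wb") as out:
        # Copy in chunks so the full result never has to fit in memory.
        shutil.copyfileobj(proc.stdout, out)
    if proc.returncode != 0:
        raise RuntimeError(f"mysql exited with status {proc.returncode}")


# Example call (needs network access and the mysql client; file name is illustrative):
# dump_to_gzip("mm10SnpsBedFiltered.bed.gz")
```

Streaming the output through `gzip.open` avoids writing the large uncompressed file to disk first; the query itself will still take some time, as noted above.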

Another approach would be to use the tools available at the Galaxy website. Galaxy's tutorial video covers getting data from the UCSC Table Browser: https://main.g2.bx.psu.edu/u/aun1/p/galaxy101 If you plan to use Galaxy tools, please direct your analysis questions to Galaxy support: http://wiki.galaxyproject.org/Support

At the Galaxy website you could upload the unzipped snp137.txt.gz file, then use a tool like Filter to keep only the rows matching an expression such as c12=='single' and c11!='cDNA'. Then you could use their Text Manipulation tools to cut just the columns c2 through c7, forming a BED file that would look like the following:
chr12 56695053 56695054 rs231043706 0 +
chr12 56695099 56695100 rs261304642 0 +
chr12 56695134 56695135 rs219886964 0 +
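
The same filter-and-cut steps can also be run locally on the downloaded snp137.txt.gz. The sketch below is only an illustration under stated assumptions: it assumes the file is tab-separated with molType in column 11 and class in column 12 (the layout implied by the Galaxy expression above), and the function names `snp_line_to_bed` and `filter_snp_file` are hypothetical.

```python
import gzip


def snp_line_to_bed(line):
    """Return a 6-column BED line, or None if the row fails the filter."""
    fields = line.rstrip("\n").split("\t")
    # Assumed layout: c11 = molType, c12 = class (1-based column numbers).
    mol_type, snp_class = fields[10], fields[11]
    if snp_class != "single" or mol_type == "cDNA":
        return None
    # c2..c7 = chrom, chromStart, chromEnd, name, score, strand.
    return "\t".join(fields[1:7])


def filter_snp_file(in_path, out_path):
    """Stream a gzipped snp137 dump into a filtered BED file; return rows kept."""
    kept = 0
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            bed = snp_line_to_bed(line)
            if bed is not None:
                dst.write(bed + "\n")
                kept += 1
    return kept


# Example call (file names are illustrative):
# filter_snp_file("snp137.txt.gz", "mm10_snp137.bed")
```

Because the file is read line by line, memory use stays small regardless of how many SNPs the table contains.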

Thank you again for your inquiry and using the UCSC Genome Browser. If you have further questions, please feel free to contact the mailing list again at gen...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group







Frances Vega

Apr 23, 2013, 4:40:41 PM
to Brian Lee, gen...@soe.ucsc.edu, Jens Lagergren, Marie Öhman, Gilad Silberberg
Dear Dr. Brian Lee,

Thank you very much for your very thorough, prompt, kind, and helpful response.  I am very grateful for the time and effort you gave me and my colleagues.  All the best to you and the UCSC Genome Bioinformatics Group for the great work and service you are providing the international scientific community.

Best Regards,
Frances
--

Frances C. Vega
SciLifeLab
KISP


Postal address:
PO Box 1031
171 21 Solna
Sweden

Visiting address:
Tomtebodavägen 23 A

Delivery address:
Tomtebodavägen 23 B
171 65 Solna

"It always seems impossible until it's done."  - Nelson Mandela


