Batch Downloading of Sequence Surrounding a Coordinate/SNP of Interest

Mutangadura, Tendai

Mar 30, 2016, 11:45:52 AM
to Brooke Rhead, gen...@soe.ucsc.edu, Mutangadura, Tendai
UCSC Help Desk Team

Could you please explain to me or show me a quick way of downloading sequence surrounding a coordinate or SNP of interest (dog reference) for a list of SNPs without doing this one SNP/sequence at a time?

Thank you,
-Tendai

Matthew Speir

Apr 1, 2016, 12:50:27 PM
to Mutangadura, Tendai, gen...@soe.ucsc.edu
Hi Tendai,

Thank you for your question about obtaining sequence surrounding SNPs in
the dog genome.

You can find more information about obtaining the genomic sequence for
multiple regions simultaneously, using the Table Browser and a BED file,
in the answer to this previous mailing list question:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/ztZP4LmptYs/Zl93NpwUPPwJ.

In addition to the methods described there, you can download the entire
dog genome sequence and then use command line utilities to extract the
sequence of a list of regions. First, you would need to download the dog
genomic sequence in the form of a 2bit file, which is available for the
assembly of your choice here:
http://hgdownload.soe.ucsc.edu/downloads.html#dog, under the "Full data
set" link. You can read more about the 2bit format here:
http://genome.ucsc.edu/FAQ/FAQformat.html#format7. Next, you will need
to download the command-line utility "twoBitToFa" for your system type
from here: http://hgdownload.soe.ucsc.edu/admin/exe/. Then, you
will need to create a BED file,
http://genome.ucsc.edu/FAQ/FAQformat.html#format1, of the regions you're
interested in. Lastly, use the twoBitToFa utility, the dog 2bit file,
and your BED file to extract the sequences. The command to do so would
look something like:

twoBitToFa -bed=mySnps.bed canFam3.2bit mySnpsSequence.fa
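
If your SNPs are listed as 1-based positions, the BED file for the step above could be generated with a short script. The following is only a rough sketch: the function names, the example SNPs, and the 100-base flank are hypothetical placeholders, and it assumes you want the same number of bases on each side of every SNP. BED start coordinates are 0-based and end coordinates are exclusive, so the position is shifted accordingly. A name column is written so each region carries its SNP label.

```python
def snp_to_bed_line(chrom, pos, flank, name):
    """Return one 4-column BED line spanning `flank` bases on each side
    of a SNP given as a 1-based position."""
    # BED starts are 0-based, so the SNP itself sits at index pos - 1.
    # Clamp at 0 so regions near the start of a chromosome stay valid.
    start = max(0, pos - 1 - flank)
    # BED ends are exclusive, so pos + flank includes the last flank base.
    # Note: the end is NOT clamped to the chromosome length here.
    end = pos + flank
    return "\t".join([chrom, str(start), str(end), name])


def write_bed(snps, flank, path):
    """Write (chrom, pos, name) tuples to a BED file, one region per line."""
    with open(path, "w") as out:
        for chrom, pos, name in snps:
            out.write(snp_to_bed_line(chrom, pos, flank, name) + "\n")


if __name__ == "__main__":
    # Hypothetical example SNPs; replace with your own list.
    example_snps = [("chr1", 1000, "snp1"), ("chr5", 50, "snp2")]
    write_bed(example_snps, 100, "mySnps.bed")
```

The resulting mySnps.bed can then be passed to the twoBitToFa command shown above. Regions that would run past the end of a chromosome are not handled in this sketch, so SNPs very close to a chromosome end may need their end coordinate adjusted by hand.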

I hope this is helpful. If you have any further questions, please reply
to gen...@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group