Extracting sequences around the required SNPs

Manasa Lanka

Mar 3, 2015, 7:57:33 PM
to gen...@soe.ucsc.edu
Hi,

I have a file (TCGA mutation-calling data) that contains the start and end positions of many SNPs (missense mutations), along with their chromosome number and Gene ID. I want to look at the reference sequences around these mutations (and subsequently synthesize peptides). Is there a way to load my TCGA data file into the Table Browser and obtain exactly 60 nucleotides of sequence upstream and downstream of each mutation, while maintaining the reading frame of the gene?

Thanks,
Manasa

Jonathan Casper

Mar 6, 2015, 8:05:28 PM
to Manasa Lanka, gen...@soe.ucsc.edu

Hello Manasa,

Thank you for your question about obtaining genomic sequence around the location of your SNPs. It is straightforward to obtain genomic sequence for a region 60 bp upstream and downstream of your SNPs, but tying that together with reading frame information may be difficult.

To obtain genomic sequence for your SNPs, you will need to load them into the UCSC Genome Browser as a custom track. If your data are in pgSNP or VCF format, you can load that file directly as a custom track. Otherwise, you may need to convert your SNP data into a simple coordinate format like BED (http://genome.ucsc.edu/FAQ/FAQformat.html#format1). After that, open the UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables and select your custom track. Select the region "genome" and output format "sequence", then click "get output". On the next page, you can fill in the boxes to add 60 bases up and downstream from your SNPs.
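
As a rough illustration of the BED conversion step, here is a minimal Python sketch. It assumes the mutation file is tab-delimited with a header row, that positions are 1-based and inclusive, and that the columns are named `Chromosome`, `Start_position`, `End_position`, and `Hugo_Symbol`; those column names and the function name `mutations_to_bed` are assumptions for illustration, so adjust them to match your file. BED uses 0-based, half-open coordinates and UCSC chromosome names carry a `chr` prefix, so the sketch converts both.

```python
import csv
import io


def mutations_to_bed(handle,
                     chrom_col="Chromosome",      # assumed column name
                     start_col="Start_position",  # assumed column name
                     end_col="End_position",      # assumed column name
                     name_col="Hugo_Symbol"):     # assumed column name
    """Return BED4 lines for a tab-delimited mutation table."""
    # Skip comment lines (starting with '#') before the header row.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    bed_lines = []
    for row in reader:
        chrom = row[chrom_col]
        if not chrom.startswith("chr"):
            chrom = "chr" + chrom  # UCSC-style chromosome name
        start = int(row[start_col]) - 1  # 1-based inclusive -> 0-based start
        end = int(row[end_col])          # half-open end equals 1-based end
        bed_lines.append(f"{chrom}\t{start}\t{end}\t{row[name_col]}")
    return bed_lines


# Illustrative input only; not real mutation data.
example = io.StringIO(
    "Hugo_Symbol\tChromosome\tStart_position\tEnd_position\n"
    "GENE1\t1\t1000\t1000\n"
)
for line in mutations_to_bed(example):
    print(line)
```

The resulting lines can be pasted or uploaded as a custom track. No padding is added here, because the 60 bases of flanking sequence are requested in the Table Browser itself.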

Reading frame information is difficult because it is tied to particular gene definitions, and for some species we have many gene tracks. The Variant Annotation Integrator tool at http://genome.ucsc.edu/cgi-bin/hgVai will allow you to submit variants in VCF or pgSNP format (or just as a list of rsIDs from dbSNP), and generate consequences with respect to a gene set of your choice. For missense variants, that includes a short display of any codon changes. It will not, however, extend to 60 bases and there is currently no way to change that setting. Perhaps you can combine this output with the sequence from the Table Browser to get what you need?
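
One possible way to combine the two outputs is to shift the 60-base window so that it starts and ends on codon boundaries. The sketch below is only an illustration under strong assumptions: the position of the SNP within its codon (`codon_offset`, 0 to 2, counted from the 5' end of the coding strand) is already known, for example from the codon change reported by an annotation tool, and the whole window lies within a single coding exon. It does not handle introns or splicing. The function name and parameters are hypothetical.

```python
def frame_aligned_window(pos, codon_offset, strand="+", flank=60):
    """Return (start, end), 1-based inclusive genomic coordinates of a
    window made of the SNP's codon plus `flank` bases on each side.

    Assumes the window does not cross an exon boundary.
    """
    if codon_offset not in (0, 1, 2):
        raise ValueError("codon_offset must be 0, 1, or 2")
    if flank % 3 != 0:
        raise ValueError("flank must be a multiple of 3 to stay in frame")
    if strand == "+":
        # First codon base is codon_offset bases to the left of the SNP.
        codon_low = pos - codon_offset
    elif strand == "-":
        # On the minus strand the codon's first base has the highest
        # genomic coordinate, so the lowest is (2 - codon_offset) to the left.
        codon_low = pos - (2 - codon_offset)
    else:
        raise ValueError("strand must be '+' or '-'")
    codon_high = codon_low + 2
    return codon_low - flank, codon_high + flank


start, end = frame_aligned_window(1000, 1, strand="+")
print(start, end, end - start + 1)
```

With the default flank of 60, the window is 123 bases long (41 codons). For minus-strand genes, the sequence would still need to be reverse-complemented before translation.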

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group