GC values surrounding each SNP

Emily Binversie

Jun 19, 2015, 3:37:44 PM
to gen...@soe.ucsc.edu

I am looking for a file that reports the GC content of the 1 Mb genomic region surrounding each SNP marker (500 kb on each side) on the Illumina Canine HD BeadChip, in canFam2 coordinates. Does one exist?


Each GC value should range from 0 to 100, giving the percentage of G or C bases in the region surrounding the SNP marker.
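To make the 0 to 100 scale concrete, here is a minimal Python sketch that computes the GC percentage of one sequence. Leaving ambiguous N bases out of the denominator is an assumption on our part; it is one common convention, not necessarily how Illumina or any particular UCSC track computes its values.

```python
def gc_percent(seq: str) -> float:
    """Return the GC content of `seq` as a percentage from 0 to 100.

    Assumption: only A/C/G/T count toward the denominator, so N bases
    (assembly gaps) are ignored rather than counted as non-GC.
    """
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    called = sum(1 for base in seq if base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / called


if __name__ == "__main__":
    # 8 called bases, 4 of which are G or C
    print(gc_percent("ATGCGCNNAT"))  # 50.0
```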

Matthew Speir

Jun 29, 2015, 6:41:20 PM
to Emily Binversie, gen...@soe.ucsc.edu
Hi Emily,

Thank you for your question about getting the GC percent content surrounding your SNPs. We don't provide any files containing that data, because we don't host any Illumina SNP data for canFam2 in the UCSC Genome Browser. You may want to contact Illumina to see whether they have the data you are interested in. However, if you have a file that contains the positions of your SNPs plus 500 kb on each side, you may be able to use some of our command line tools or the UCSC Table Browser to get the GC content of those regions.
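If your SNP positions are in a manifest or spreadsheet rather than a BED file, a short script can build the windows. This is a minimal Python sketch, assuming you already have a (name, chromosome, 1-based position) triple for each SNP in canFam2 coordinates; the SNP names and positions in the example are made up. It uses the 500 kb flank from the original question, adjustable through the `flank` argument.

```python
import csv
import sys

FLANK = 500_000  # bases on each side of the SNP (500 kb, as in the question)


def snp_windows(snps, flank=FLANK):
    """Yield BED records (chrom, start, end, name), one window per SNP.

    `snps` is an iterable of (name, chrom, pos) tuples with 1-based positions.
    BED is 0-based and half-open, so a SNP at position p is [p - 1, p); the
    window adds `flank` bases on each side. Starts are clipped at 0, but ends
    are NOT clipped at the chromosome end; UCSC tools may reject such lines,
    so clip them yourself using chromosome sizes if needed.
    """
    for name, chrom, pos in snps:
        start = max(0, pos - 1 - flank)
        end = pos + flank
        yield chrom, start, end, name


def write_bed(records, out=sys.stdout):
    """Write records as tab-separated BED lines."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for record in records:
        writer.writerow(record)


if __name__ == "__main__":
    # Hypothetical SNPs; replace with rows from your own array manifest.
    example = [("snp_A", "chr1", 1_000_000), ("snp_B", "chr2", 200_000)]
    write_bed(snp_windows(example))
    # prints (tab-separated):
    # chr1 499999 1500000 snp_A
    # chr2 0 700000 snp_B
```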

The first method for getting the GC percent information uses our command line tool hgWiggle. You can download hgWiggle for a variety of UNIX-based systems here: http://hgdownload.soe.ucsc.edu/admin/exe/. Note that you will need to set up access to our MySQL server with an "hg.conf" file, following the instructions under "Using the MySQL Server with our Utilities" here: http://genome.ucsc.edu/goldenPath/help/mysql.html. Running hgWiggle without any arguments prints a usage message. In particular, you will need the "-bedFile" option to restrict the output to your SNP regions, and that option requires your file to be in valid BED format: http://genome.ucsc.edu/FAQ/FAQformat.html#format1.
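Before passing a file to the "-bedFile" option or uploading it as a custom track, it can help to check the basics of the BED format. The helper below is a hypothetical minimal sketch, not a full BED validator: it only checks the three required fields (chrom, chromStart, chromEnd) and the "chr" chromosome naming used by UCSC assemblies. Requiring tab separators is our own conservative choice.

```python
def check_bed(lines):
    """Return a list of problems found in BED text (empty list if none).

    Only the three required fields are checked: chrom, chromStart, chromEnd.
    Header lines starting with "track", "browser" or "#" are skipped.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            problems.append(f"line {lineno}: need at least 3 tab-separated fields")
            continue
        chrom, start, end = fields[0], fields[1], fields[2]
        if not chrom.startswith("chr"):
            # UCSC assemblies such as canFam2 name chromosomes chr1, chrX, ...
            problems.append(f"line {lineno}: chromosome {chrom!r} lacks the 'chr' prefix")
        if not all(f.isascii() and f.isdigit() for f in (start, end)):
            problems.append(f"line {lineno}: start and end must be non-negative integers")
        elif int(end) <= int(start):
            problems.append(f"line {lineno}: end ({end}) must be greater than start ({start})")
    return problems


if __name__ == "__main__":
    sample = [
        "track name=mySnpRegions",
        "chr1\t499999\t1500000\tsnp_A",
        "1 100 200",          # space-separated: flagged
        "chr2\t500\t100",     # end before start: flagged
    ]
    for problem in check_bed(sample):
        print(problem)
```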

The other method uses the Table Browser. As before, you will need a BED file of your SNP regions. To get the GC percent information, use the following steps:
1. Navigate to the Table Browser, http://genome.ucsc.edu/cgi-bin/hgTables.
2. Click "add custom tracks".
3. Upload the BED formatted files of your SNP regions, or copy and paste the data from the file into the box.
4. Click "submit".
5. Click "go to table browser".
6. Make the following selections:
    clade: Mammal
    genome: Dog
    assembly: May 2005 (Broad/canFam2)
    group: Mapping and Sequencing
    track: GC Percent
    table: gc5Base
    output format: data points
    output file: enter a file name to save your results to a file, or leave blank to display results in your browser

7. Next to intersection, click "create".
8. Select the following:
        group: Custom Tracks
        track: My SNP Regions (substitute the name of your custom track here).
        table: default table
9. Select the option "All GC Percent records that have any overlap with My SNP Regions".
10. Click "submit".
11. Click "get output".
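The data points output lists GC values for many small windows (5-base windows, as the table name gc5Base suggests) rather than one number per SNP, so each 1 Mb region still has to be summarized. The sketch below assumes the output is in the wiggle variableStep text form: `#` comment lines, a `variableStep chrom=... span=...` header, then `position value` rows with 1-based positions. We have not confirmed that this is exactly what the Table Browser emits, so check your file first; fixedStep blocks are not handled. Averaging the points with equal weight is also our own choice: every point covers the same span, so the mean approximates the GC percent of the sequenced bases in the window, and gaps with no data are simply left out.

```python
import bisect
from collections import defaultdict


def parse_variable_step(lines):
    """Parse wiggle-style variableStep text (assumed format, see above).

    Returns ({chrom: sorted [(start, end, value)]}, largest span seen).
    Positions are converted to 0-based, half-open intervals to match BED.
    """
    points = defaultdict(list)
    chrom, span, max_span = None, 1, 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track")):
            continue
        if line.startswith("fixedStep"):
            raise NotImplementedError("this sketch only handles variableStep blocks")
        if line.startswith("variableStep"):
            params = dict(tok.split("=", 1) for tok in line.split()[1:])
            chrom = params["chrom"]
            span = int(params.get("span", "1"))
            max_span = max(max_span, span)
            continue
        if chrom is None:
            raise ValueError(f"data line before any variableStep header: {line!r}")
        pos, value = line.split()[:2]
        start = int(pos) - 1
        points[chrom].append((start, start + span, float(value)))
    for rows in points.values():
        rows.sort()
    return dict(points), max_span


def mean_gc(points, max_span, chrom, start, end):
    """Unweighted mean GC percent of the points overlapping [start, end).

    Points straddling a region edge count fully. Returns None if nothing
    overlaps (e.g. a region that is entirely assembly gap).
    """
    rows = points.get(chrom, [])
    # A point can overlap only if it starts at most max_span - 1 bases before
    # the region; a 1-tuple sorts before every 3-tuple with that first element.
    i = bisect.bisect_left(rows, (start - max_span + 1,))
    total, n = 0.0, 0
    while i < len(rows) and rows[i][0] < end:
        if rows[i][1] > start:
            total += rows[i][2]
            n += 1
        i += 1
    return total / n if n else None


if __name__ == "__main__":
    # Made-up data in the assumed format; not real canFam2 values.
    example = ["variableStep chrom=chr1 span=5", "1\t40", "6\t60", "11\t80"]
    pts, span = parse_variable_step(example)
    print(mean_gc(pts, span, "chr1", 0, 10))  # bases 1-10 -> 50.0
    print(mean_gc(pts, span, "chr1", 8, 12))  # overlaps 6-10 and 11-15 -> 70.0
```

To get one value per SNP, loop over the lines of your SNP-region BED file and call `mean_gc` with each line's chrom, start, and end.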

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group
