dbSNP entries with MAF less than 1%


Rani James, Alva

Jan 27, 2015, 11:07:03 AM
to gen...@soe.ucsc.edu
Dear All,


I am a PhD student from Germany. I would like to get the dbSNP entries with a MAF of less than 1%. Where can I find them?
I have snp135Common, but I don't understand which column is the MAF. The 13th column contains "by frequency", which I don't understand, so I cannot filter on it.

I am looking for dbSNP for the GRCh38 assembly and the hg19 build. I would be really grateful if someone could help me with this.

Thank you




Kind regards,
Alva Rani James
L201
Phd student

German Cancer Consortium (DKTK)
German Cancer Research Center (DKFZ)
Foundation under Public Law
c/o Charite Campus Benjamin Franklin (CBF)
12203 Berlin
Germany
phone: +49 1575 4386404

a.j...@dkfz-heidelberg.de
www.dkfz.de/en/dktk


Management Board: Prof. Dr. Dr. h.c. Otmar D. Wiestler, Prof. Dr. Josef Puchta
VAT-ID No.: DE143293537

Jonathan Casper

Jan 30, 2015, 7:08:28 PM
to Rani James, Alva, gen...@soe.ucsc.edu

Hello Alva,

Thank you for your question about finding dbSNP items with a minor allele frequency (MAF) under 1%. You will need to look in another table for those SNPs, as the snp*Common tables (snp135Common, snp141Common, etc.) only include SNPs with a minor allele frequency >= 1%. You can see a description of each of the SNP tracks by clicking on the track's name on our main browser page at http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19. Please note that the most recent version of dbSNP data currently available on our site is SNP141, but we will be releasing SNP142 for display soon.

The "by frequency" that you sometimes find in the 13th column refers to how the SNP was validated; it does not correspond to the actual frequency of the allele. Many of the columns are explained on the track description page (e.g., http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=snp141). You can also select a table in the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) and then click the "describe table schema" button for more information.
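The column numbers in this thread are 1-based and count the leading "bin" column. As a minimal Python sketch, assuming the snp141 column order listed below (please confirm it with "describe table schema" before relying on it), one tab-separated line of the table dump can be turned into a name-to-value mapping, and the 13th column ("valid") split into its validation labels. The helper names are hypothetical, not part of any UCSC tool.

```python
# Assumed snp141 column order (1-based positions 1..26); verify with the
# Table Browser's "describe table schema" button.
SNP141_COLUMNS = [
    "bin", "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "refNCBI", "refUCSC", "observed", "molType", "class", "valid",
    "avHet", "avHetSE", "func", "locType", "weight", "exceptions",
    "submitterCount", "submitters", "alleleFreqCount", "alleles",
    "alleleNs", "alleleFreqs", "bitfields",
]


def parse_row(line: str) -> dict[str, str]:
    """Map one tab-separated snp141 line to {column name: raw string}."""
    fields = line.rstrip("\n").split("\t")
    # zip stops at the shorter sequence, so a truncated line simply
    # yields fewer keys instead of raising an error.
    return dict(zip(SNP141_COLUMNS, fields))


def validation_labels(row: dict[str, str]) -> list[str]:
    """Split the "valid" column (13th, 1-based) into its labels, such as
    "by-frequency". It records how a SNP was validated, not a frequency."""
    return [label for label in row.get("valid", "").split(",") if label]
```

In 0-based Python indexing, the 13th, 24th and 25th columns are `SNP141_COLUMNS[12]`, `[23]` and `[24]`, i.e. "valid", "alleleNs" and "alleleFreqs".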

The frequency data you are looking for are in the 24th and 25th columns. The 24th column, "alleleNs", holds the reported count of each allele, and the 25th column, "alleleFreqs", holds the frequencies computed from those counts. Please note that some SNPs have very low reported counts, so their frequencies are likely to be inaccurate. For example, the following SNP is listed with a frequency of 50% for each allele:

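+-----+-------+------------+----------+-------------+-------+--------+---------+---------+----------+---------+--------+---------+-------+---------+-------------+---------+--------+------------+----------------+------------+-----------------+---------+--------------------+--------------------+
| bin | chrom | chromStart | chromEnd | name        | score | strand | refNCBI | refUCSC | observed | molType | class  | valid   | avHet | avHetSE | func        | locType | weight | exceptions | submitterCount | submitters | alleleFreqCount | alleles | alleleNs           | alleleFreqs        |
+-----+-------+------------+----------+-------------+-------+--------+---------+---------+----------+---------+--------+---------+-------+---------+-------------+---------+--------+------------+----------------+------------+-----------------+---------+--------------------+--------------------+
| 585 | chr1  | 10256      | 10257    | rs111200574 | 0     | +      | A       | A       | A/C      | genomic | single | unknown | 0.5   | 0       | near-gene-5 | exact   | 1      |            | 1              | BUSHMAN,   | 2               | A,C,    | 1.000000,1.000000, | 0.500000,0.500000, |
+-----+-------+------------+----------+-------------+-------+--------+---------+---------+----------+---------+--------+---------+-------+---------+-------------+---------+--------+------------+----------------+------------+-----------------+---------+--------------------+--------------------+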

Despite the reported frequency of 0.5000000 (which could imply a high level of accuracy), the reported counts are only one instance of each allele. That is not a lot of information. It is up to you how to handle data like this. Please also note that some alleles have no frequency data reported at all. The 24th and 25th columns will be empty if there are no frequency data available.
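To make the point about low counts concrete, here is a minimal Python sketch (the function names are hypothetical) that parses the alleleNs and alleleFreqs strings and returns None when the columns are empty or when the total allele count is below a threshold you choose. It assumes one common convention for the MAF, the frequency of the second most common allele, which for a two-allele SNP is simply the smaller frequency.

```python
from typing import Optional


def parse_list(field: str) -> list[float]:
    """Parse UCSC comma-separated lists such as "0.500000,0.500000,"
    (note the trailing comma) into floats; an empty field gives []."""
    return [float(x) for x in field.split(",") if x]


def minor_allele_frequency(allele_ns: str, allele_freqs: str,
                           min_total: float = 0.0) -> Optional[float]:
    """Return the MAF, or None when the SNP has no frequency data or
    fewer than min_total counted alleles (too little information)."""
    counts = parse_list(allele_ns)
    freqs = parse_list(allele_freqs)
    if not freqs:
        # Empty 24th/25th columns: no frequency data reported.
        return None
    if sum(counts) < min_total:
        # Frequencies built on very few observations are skipped.
        return None
    ranked = sorted(freqs, reverse=True)
    # Assumed convention: MAF = frequency of the second most common
    # allele; a single reported allele means no other allele was seen.
    return ranked[1] if len(ranked) > 1 else 0.0
```

For rs111200574 above, `minor_allele_frequency("1.000000,1.000000,", "0.500000,0.500000,")` gives 0.5, while passing `min_total=10` gives None because only two alleles were counted. The threshold itself is your judgment call.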

As I noted above, the snp135Common table only contains SNPs with a minor allele frequency >= 1%. Other tables, whose names do not include "Common", contain all SNPs, including those with a frequency < 1%. For example, you may be interested in the "snp135" table (or, for more recent data, "snp141"), to which you can apply your own filter to find the SNPs you are looking for. We recommend that you download the SNP data either directly from dbSNP or from the table dumps on our download server at http://hgdownload.soe.ucsc.edu. For example, you can download the snp141 table from http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/snp141.txt.gz (note that this compressed file is 1.7GB) and then filter it yourself based on the frequency data. Because that table describes more than 60 million SNPs, you will probably need a program to do the filtering. If you do not have your own program for this, you may find the online tools at Galaxy (https://usegalaxy.org) useful.
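The filtering step can be sketched in Python as follows. This is a minimal sketch, not a UCSC tool: it assumes the snp141.txt.gz dump is tab-separated with name, alleleNs and alleleFreqs in the 5th, 24th and 25th columns (as in the example row above), takes the MAF as the frequency of the second most common allele, and uses a hypothetical function name, `rare_snps`.

```python
import gzip
from typing import Iterator

# Assumed 0-based indices of name, alleleNs and alleleFreqs
# (the 5th, 24th and 25th columns of snp141).
NAME, ALLELE_NS, ALLELE_FREQS = 4, 23, 24


def rare_snps(path: str, max_maf: float = 0.01,
              min_total: float = 0.0) -> Iterator[tuple[str, float]]:
    """Yield (rsID, MAF) for SNPs whose MAF is below max_maf.

    The gzip-compressed dump is read one line at a time, so the file
    never has to fit in memory. SNPs with empty frequency columns or
    fewer than min_total counted alleles are skipped.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) <= ALLELE_FREQS:
                continue  # malformed or truncated line
            freqs = sorted(
                (float(x) for x in fields[ALLELE_FREQS].split(",") if x),
                reverse=True)
            counts = [float(x) for x in fields[ALLELE_NS].split(",") if x]
            if not freqs or sum(counts) < min_total:
                continue  # no frequency data, or too few observations
            maf = freqs[1] if len(freqs) > 1 else 0.0
            if maf < max_maf:
                yield fields[NAME], maf
```

After downloading the file, something like `for rs_id, maf in rare_snps("snp141.txt.gz", min_total=20): print(rs_id, maf)` streams through it in a single pass. The cutoff of 20 is an arbitrary illustration, not a recommendation.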

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.

--
Jonathan Casper
UCSC Genome Bioinformatics Group

Rani James, Alva

Feb 6, 2015, 10:53:30 AM
to gen...@soe.ucsc.edu
Hello Jonathan,


Thank you so much for your reply; I have now started to look into this. I have a question: what do you mean by

"Despite the reported frequency of 0.5000000 (which could imply a high level of accuracy), the reported counts are only one instance of each allele. That is not a lot of information. It is up to you how to handle data like this."

Also, as I understand from the links, Common SNPs are those with a MAF of at least 1%. I just wanted to know whether snp141Common uses the same genome annotation as the hg19 build, because I have used hg19 as the reference throughout my pipeline.


Kind regards,
Alva Rani James

Matthew Speir

Feb 10, 2015, 5:42:50 PM
to Rani James, Alva, gen...@soe.ucsc.edu
Hi Alva,

I'm assuming that you are downloading your "snp141Common" data from the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables?db=hg19) or from our download server (http://hgdownload.soe.ucsc.edu/downloads.html). To get SNP data for hg19 from the Table Browser, just make sure that you have selected hg19 from the "assembly" drop-down menu. If you are downloading SNP data from our download server, you should see "hg19" in the URL, like so: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/snp141Common.txt.gz.
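When scripting downloads, the assembly can be made an explicit parameter so that every table in a pipeline comes from the same build. A small Python sketch, with hypothetical helper names; only the `.../goldenPath/<assembly>/database/<table>.txt.gz` layout is taken from the links in this thread:

```python
import urllib.request

DOWNLOAD_BASE = "http://hgdownload.soe.ucsc.edu/goldenPath"


def table_dump_url(assembly: str, table: str) -> str:
    """Return the download-server URL of one table dump for one assembly,
    e.g. assembly="hg19", table="snp141Common"."""
    return f"{DOWNLOAD_BASE}/{assembly}/database/{table}.txt.gz"


def download_table(assembly: str, table: str, dest: str) -> str:
    """Fetch the compressed dump to dest and return the local file name.
    Dumps can be large (the hg19 snp141 dump is about 1.7GB)."""
    filename, _headers = urllib.request.urlretrieve(
        table_dump_url(assembly, table), dest)
    return filename
```

Because the pipeline in question uses hg19 throughout, passing `"hg19"` everywhere keeps all tables on that build. Whether a given table exists for another assembly (for example hg38) should be checked on the download server first.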

Please also note that some alleles, such as rs376643643, have no frequency data reported at all. The alleleNs and alleleFreqs columns (the 24th and 25th columns) will be empty if no frequency data are available. For example, the alleleNs and alleleFreqs columns of the rs376643643 entry in snp141 are empty:

+-------------+----------+-------------+
| name        | alleleNs | alleleFreqs |
+-------------+----------+-------------+
| rs376643643 |          |             |
+-------------+----------+-------------+


I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group