COMMON field in the human dbSNP 144 VCF file


Laura Smith
Oct 6, 2015, 2:08:51 PM
to gen...@soe.ucsc.edu
Hi, 

I have a question about the COMMON=1 field in the dbSNP build 144 VCF file. Is this field the same as "UCSC Common SNPs", or is there a difference?

In other words, can I assume that if a dbSNP variant has COMMON=1 in the VCF file, it would also be one of the UCSC Common SNPs?

How are the UCSC Common SNPs derived? Does the UCSC Genome Browser get them from dbSNP?

Thank you,
Laura


For example: 

A dbSNP variant:
1       909555  rs2340594       A       G,C     .       .       RS=2340594;RSPOS=909555;RV;dbSNPBuildID=100;SSR=0;SAO=0;VP=0x05010008000115011e000100;WGT=1;VC=SNV;SLO;INT;VLD;G5;GNO;KGPhase1;KGPilot123;KGPROD;OTHERKG;CAF=[0.08632,0.9137,.];COMMON=1
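To see what the example record actually says, here is a minimal sketch that splits its INFO column and reads the COMMON flag and the CAF frequencies. It assumes CAF lists allele frequencies in the order reference allele first, then the ALT alleles, with "." where no frequency is available, as the dbSNP VCF header describes. The helper names `parse_info` and `parse_caf` are hypothetical, not part of any dbSNP tool.

```python
# Minimal sketch: read the COMMON flag and CAF frequencies from the example
# record. parse_info/parse_caf are hypothetical helper names.

def parse_info(info):
    """Map each INFO key to its value; bare flags (no '=') map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_caf(caf):
    """Turn '[0.08632,0.9137,.]' into floats, using None for '.' (no data)."""
    return [None if x == "." else float(x) for x in caf.strip("[]").split(",")]


# The example record, rebuilt with the tab separators a VCF uses.
record = "\t".join([
    "1", "909555", "rs2340594", "A", "G,C", ".", ".",
    "RS=2340594;RSPOS=909555;RV;dbSNPBuildID=100;SSR=0;SAO=0;"
    "VP=0x05010008000115011e000100;WGT=1;VC=SNV;SLO;INT;VLD;G5;GNO;"
    "KGPhase1;KGPilot123;KGPROD;OTHERKG;CAF=[0.08632,0.9137,.];COMMON=1",
])

chrom, pos, rsid, ref, alt, qual, filt, info_col = record.split("\t")[:8]
info = parse_info(info_col)
alleles = [ref] + alt.split(",")      # assumed CAF order: reference, then ALT
caf = parse_caf(info["CAF"])
is_common = info.get("COMMON") == "1"

print(rsid, dict(zip(alleles, caf)), "COMMON=1:", is_common)
# rs2340594 {'A': 0.08632, 'G': 0.9137, 'C': None} COMMON=1: True
```

Note that the reference allele A is the minor allele here in the pooled CAF numbers. COMMON=1 cannot simply be recomputed from CAF, because dbSNP's definition looks within individual populations.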





DBSNP WEBSITE:

COMMON
Term used to categorize variants in the human VCF files provided in the path ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF. The "common" category is restricted to alleles observed in the germline with a minor allele frequency (MAF) of >=0.01 in at least one major population, with at least two individuals from different families having the minor allele. "Common" may also include alleles with evidence of medical interest.
Note: the definition of "common" may be based on only one of more than 50 major populations. These major populations may not include the population you are studying. 
Important: an allele shown to be "common" in one of the 50 major populations used for this directory may not be common in all populations.
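The dbSNP rule above can be read as a per-population test. The following is a minimal sketch under stated assumptions. The `PopulationFreq` structure, the population labels, and all frequencies are hypothetical, since the VCF itself does not carry per-population frequencies. It also assumes, loosely, that "evidence of medical interest" is enough on its own.

```python
# Minimal sketch of the dbSNP "common" rule quoted above. Structure, labels
# and numbers are hypothetical; inputs are assumed to be germline data.
from dataclasses import dataclass


@dataclass
class PopulationFreq:
    population: str  # hypothetical label for one major population
    maf: float       # minor allele frequency observed in that population


def common_in(pop_freqs, threshold=0.01):
    """Return the populations where the minor allele reaches MAF >= threshold."""
    return [p.population for p in pop_freqs if p.maf >= threshold]


def looks_common(pop_freqs, unrelated_carriers, medical_interest=False):
    """Approximate dbSNP's rule: MAF >= 0.01 in at least one major population
    and minor-allele carriers from at least two different families."""
    if medical_interest:  # assumption: medical interest qualifies by itself
        return True
    return unrelated_carriers >= 2 and len(common_in(pop_freqs)) > 0


pops = [PopulationFreq("POP_A", 0.03), PopulationFreq("POP_B", 0.002)]
print(looks_common(pops, unrelated_carriers=5))  # True
print(common_in(pops))                            # ['POP_A']
```

The allele counts as "common" overall even though it reaches the threshold only in POP_A. This is exactly the caveat in the note above.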



UCSC GENOME BROWSER WEBSITE:
23 October 2013 - dbSNP 138 Available for hg19
We are pleased to announce the release of four tracks derived from NCBI dbSNP Build 138 data, available on the human assembly (GRCh37/hg19). The new tracks contain additional annotation data not included in previous dbSNP tracks, with corresponding coloring and filtering options in the Genome Browser.
As was the case for the annotations based on the previous dbSNP build 137, there are four tracks in this release. One is a track containing all mappings of reference SNPs to the human assembly, labeled "All SNPs (138)". The other three tracks are subsets of this track and show interesting and easily defined subsets of dbSNP:
  • Common SNPs (138): uniquely mapped variants that appear in at least 1% of the population or are 100% non-reference

Matthew Speir
Oct 6, 2015, 3:44:34 PM
to Laura Smith, gen...@soe.ucsc.edu
Hi Laura,

Thank you for your question about the UCSC "Common SNPs" track. The COMMON field in the dbSNP VCF is not equivalent to the UCSC Common SNPs track. One of our engineers shares the following information about the criteria we use for inclusion in the Common SNPs track:

"UCSC uses aggregate allele counts instead of looking within populations as dbSNP does. Here are our criteria for 'Common':
  • Uniquely mapped to the reference genome (i.e. one genomic location; mapping to one location on each of multiple alternate haplotype sequences is OK)
  • Combined allele counts (from dbSNP database tables SNPAlleleFreq and SNPAlleleFreq_TGP) sum to at least 10 (to avoid redundant submissions from personal genomes)
  • Either the reference allele count is 0 (i.e. a very rare allele or a sequencing error in the reference), or the maximum allele frequency is <= 0.99 (for bi-allelic SNPs, the same as minor allele frequency >= 0.01)"
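The three UCSC criteria can be sketched as a single filter over pooled allele counts. This is a minimal illustration, not UCSC's actual code. The `SnpSummary` fields and every count below are hypothetical. The only thresholds taken from the criteria are the minimum total of 10 and the maximum allele frequency of 0.99.

```python
# Minimal sketch of the UCSC "Common SNPs" criteria quoted above. The
# SnpSummary fields and all counts are hypothetical examples.
from dataclasses import dataclass


@dataclass
class SnpSummary:
    name: str
    ref_allele: str
    primary_locations: int  # mappings to the reference, excluding alt haplotypes
    allele_counts: dict     # allele -> combined count from both frequency tables


def is_ucsc_common(snp, min_total=10, max_freq_cutoff=0.99):
    # 1. Uniquely mapped to one location in the reference genome.
    if snp.primary_locations != 1:
        return False
    # 2. Combined allele counts must sum to at least 10.
    total = sum(snp.allele_counts.values())
    if total < min_total:
        return False
    # 3a. Reference allele never observed: keep (rare ref allele or ref error).
    if snp.allele_counts.get(snp.ref_allele, 0) == 0:
        return True
    # 3b. Otherwise the most frequent allele must be at or below 0.99.
    return max(snp.allele_counts.values()) / total <= max_freq_cutoff


examples = [
    SnpSummary("snp_a", "A", 1, {"A": 9, "G": 91}),    # max freq 0.91 -> kept
    SnpSummary("snp_b", "A", 1, {"A": 995, "G": 5}),   # max freq 0.995 -> dropped
    SnpSummary("snp_c", "A", 1, {"A": 0, "G": 12}),    # ref count 0 -> kept
    SnpSummary("snp_d", "A", 1, {"A": 3, "G": 4}),     # only 7 counts -> dropped
    SnpSummary("snp_e", "A", 2, {"A": 50, "G": 50}),   # maps twice -> dropped
]
for snp in examples:
    print(snp.name, is_ucsc_common(snp))
```

UCSC pools counts across all submissions, whereas dbSNP checks each population separately. A variant could therefore plausibly have COMMON=1 in the dbSNP VCF (MAF >= 0.01 in one population) and still fail this pooled test, or vice versa.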
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group