Going from allele frequencies to MAF


Katie Marwick

Nov 23, 2020, 4:13:15 AM
to PRSice
Hello
I am trying to construct a PRS using the most recent release of the wave 2 schizophrenia GWAS summary stats. I am working through QC but am struggling with the step regarding filtering on minor allele frequency. This is because it turns out I am confused about how to define minor allele frequency, even though it seems like it should be simple! I hope it's OK to ask this here even though it's not strictly a PRSice question.

The wave 3 version of the wave 2 European data (reformatted) provides:

FRQ_A_67390: Frequency of the A1 allele in 67,390 cases, e.g. 0.990

FRQ_U_94015: Frequency of the A1 allele in 94,015 controls, e.g. 0.992

The A1 allele is the effect allele.


I have also used ANNOVAR to annotate the allele frequencies using the human genome project as a reference genome.

This gives values which are the complement (1 minus the frequency) of those given in the GWAS summary stats, e.g. 0.01 for the allele above.


Both the ANNOVAR allele frequencies and the GWAS allele frequencies range from < 0.01 to > 0.99 (whereas a MAF should be at most 0.5).
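For a biallelic SNP the two allele frequencies sum to 1, so a frequency reported for either allele can be folded into a MAF by taking the smaller of the frequency and its complement. A minimal sketch of that folding, assuming biallelic SNPs only; the function name `fold_to_maf` is made up for illustration and is not part of PRSice or ANNOVAR:

```python
import math


def fold_to_maf(freq):
    """Fold an allele frequency into a minor allele frequency (MAF).

    Assumes a biallelic SNP, so the other allele has frequency 1 - freq.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError(f"allele frequency must be in [0, 1], got {freq}")
    return min(freq, 1.0 - freq)


# Example values from the post: A1 frequency in the GWAS summary stats,
# and the frequency ANNOVAR reports for the other allele.
gwas_a1_freq = 0.990
annovar_freq = 0.01

# Both describe the same SNP, so they fold to the same MAF
# (compared with a tolerance because of floating-point rounding).
same_maf = math.isclose(fold_to_maf(gwas_a1_freq), fold_to_maf(annovar_freq))
print(same_maf)  # prints True
```

After folding, the result is never above 0.5, whichever allele the source file happened to report.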


It seems like it just depends on how you define which allele is major and which is minor. I think what matters is which one is your effect allele? You don’t want very rare alleles with large effect sizes having too much influence on your PRS. Although for SNPs where almost everyone carries the allele (e.g. frequency of 99.5%), it is likewise a small number of people without it who will be influencing the output.

So my question is – which to exclude? 

  1. SNPs where the GWAS reports the effect allele as being present in over 99% of cases and controls (so the minor allele has a frequency of less than 1%)?
  2. SNPs where the GWAS reports the effect allele as being present in fewer than 1% of cases and controls (but which ANNOVAR will report with a frequency of 99%!)?
  3. SNPs where the GWAS effect allele frequency is > 0.99 OR < 0.01? I think this would exclude rare minor alleles from both directions…


I'd be very grateful for any clarification anyone can offer,
Katie

Sam Choi

Nov 23, 2020, 5:54:13 PM
to PRSice
Usually, we just remove SNPs with allele frequency > 0.99 or < 0.01, because results from those SNPs might be due to chance (genotyping error etc.). Depending on your sample size, you can lower the threshold and include SNPs with lower allele frequency. It doesn't really matter whether the frequency is for the effect allele or not, as it is just the flip side of the coin anyway.
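As a rough illustration of that rule, here is a minimal Python sketch that keeps a SNP only if its reported allele frequency lies between the threshold and 1 minus the threshold. The column names `FRQ_A_67390` and `FRQ_U_94015` are taken from the question. The SNP IDs, the last two rows of example values, and the choice to require both the case and the control frequency to pass are assumptions made for the example, not something PRSice does for you.

```python
def passes_frequency_filter(freq, threshold=0.01):
    """True if the allele frequency is neither below threshold nor above 1 - threshold.

    The check is symmetric, so it gives the same answer whichever allele
    the frequency refers to.
    """
    return threshold <= freq <= 1.0 - threshold


def filter_snps(rows, freq_columns=("FRQ_A_67390", "FRQ_U_94015"), threshold=0.01):
    """Keep rows whose frequencies in all freq_columns pass the filter.

    Requiring every column to pass is an assumption of this sketch.
    """
    kept = []
    for row in rows:
        freqs = [float(row[col]) for col in freq_columns]
        if all(passes_frequency_filter(f, threshold) for f in freqs):
            kept.append(row)
    return kept


# Hypothetical rows; the SNP IDs are placeholders, not real variants.
example_rows = [
    {"SNP": "rs_example_1", "FRQ_A_67390": "0.990", "FRQ_U_94015": "0.992"},
    {"SNP": "rs_example_2", "FRQ_A_67390": "0.300", "FRQ_U_94015": "0.310"},
    {"SNP": "rs_example_3", "FRQ_A_67390": "0.004", "FRQ_U_94015": "0.005"},
]

kept = filter_snps(example_rows)
print([row["SNP"] for row in kept])  # prints ['rs_example_2']
```

With a larger sample, passing a smaller `threshold` (for example 0.005) would retain rarer variants, in line with the point about sample size.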

Sam