Hello
I am trying to construct a PRS using the most recent release of the wave 2 schizophrenia GWAS summary stats. I am working through QC but am struggling with the step regarding filtering on minor allele frequency. This is because it turns out I am confused about how to define minor allele frequency, even though it seems like it should be simple! I hope it's OK to ask this here even though it's not strictly a PRSice question.
The wave 3 version of the wave 2 European data (reformatted) provides:
- FRQ_A_67390: frequency of the A1 allele in 67,390 cases, e.g. 0.990
- FRQ_U_94015: frequency of the A1 allele in 94,015 controls, e.g. 0.992
The A1 allele is the effect allele.
I have also used ANNOVAR to annotate the allele frequencies using the human genome project as a reference genome. This gives values which are the complement (one minus the frequency) of those given in the GWAS summary stats, e.g. 0.01 for the allele above.
Both the ANNOVAR allele frequencies and the GWAS allele frequencies range from <0.01 to >0.99 (whereas a MAF should be at most 0.5).
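A minimal sketch of how a reported allele frequency can be folded into a MAF, assuming biallelic SNPs (so the two allele frequencies sum to 1). The function name `fold_to_maf` is made up for illustration and is not part of PRSice or ANNOVAR; the two input values are the examples quoted above.

```python
def fold_to_maf(allele_freq: float) -> float:
    """Return the minor allele frequency for a biallelic SNP.

    Whichever allele the frequency was reported for, the other allele
    has frequency 1 - allele_freq, and the MAF is the smaller of the two.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError(f"allele frequency must be in [0, 1], got {allele_freq}")
    return min(allele_freq, 1.0 - allele_freq)


# Example values from this post: the GWAS reports A1 at 0.990 in cases,
# while the ANNOVAR annotation reports 0.01 for the same SNP.
gwas_maf = fold_to_maf(0.990)
annovar_maf = fold_to_maf(0.01)
```

Under that assumption both sources fold to the same MAF (about 0.01), regardless of which allele each one chose to report.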
It seems like it just depends on how you define which allele is major and which is minor. I think what matters is which one is your effect allele? You don’t want very rare alleles with large effect sizes having too much influence on your PRS. But for SNPs where almost everyone carries the variant (e.g. a frequency of 99.5%), it is likewise a small number of people without it who will be influencing the output.
So my question is – which to exclude?
- SNPs where the GWAS reports the effect allele as being present in over 99% of cases and controls (so the other, minor allele has a frequency of less than 1%)?
- SNPs where the GWAS reports the effect allele as being present in fewer than 1% of cases and controls (although ANNOVAR will report a frequency of 99%!)?
- SNPs where the GWAS effect allele frequency is >0.99 OR <0.01? I think this would exclude rare minor alleles from both directions…
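To make the third option concrete, here is a rough sketch of such a filter in plain Python. It is only an illustration under stated assumptions: the 0.01 threshold comes from the 1% figure above, requiring the threshold to be met in both cases and controls is one possible choice rather than an established rule, and the SNP IDs and most frequencies are invented.

```python
MAF_THRESHOLD = 0.01  # assumed cut-off, taken from the 1% figure in this post


def passes_maf_filter(frq_cases: float, frq_controls: float,
                      threshold: float = MAF_THRESHOLD) -> bool:
    """Keep a SNP only if its MAF reaches the threshold in cases AND controls.

    Equivalent to dropping SNPs whose effect allele frequency is
    below `threshold` or above `1 - threshold` in either group.
    """
    maf_cases = min(frq_cases, 1.0 - frq_cases)
    maf_controls = min(frq_controls, 1.0 - frq_controls)
    return maf_cases >= threshold and maf_controls >= threshold


# Hypothetical rows using the column names from the summary statistics.
snps = [
    {"SNP": "rs_example_1", "FRQ_A_67390": 0.996, "FRQ_U_94015": 0.997},  # effect allele very common
    {"SNP": "rs_example_2", "FRQ_A_67390": 0.005, "FRQ_U_94015": 0.004},  # effect allele very rare
    {"SNP": "rs_example_3", "FRQ_A_67390": 0.350, "FRQ_U_94015": 0.360},  # common variant
]

kept = [
    row["SNP"]
    for row in snps
    if passes_maf_filter(row["FRQ_A_67390"], row["FRQ_U_94015"])
]
```

With these made-up rows only the common variant survives, because the filter treats a 99.6% effect allele and a 0.5% effect allele the same way: in both cases one of the two alleles is rare.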
I'd be very grateful for any clarification anyone can offer,
Katie