QQplot show higher inflation when including pcs as covariate in the association analysis

505 views
Skip to first unread message

Angela Parody

unread,
Nov 13, 2017, 7:27:38 PM11/13/17
to plink2-users
Hi,

Briefly, I am working with 3412 SNPs (which are placed in candidate genes exons), 265 individuals, and one quantitative phenotype. I performed an association analysis using plink and no covariates. It gave me a lambda value = 1.4 (see below, highlighted in yellow):

 PLINK v1.90b3 64-bit (11 Jan 2015)
7 arguments: --adjust --bfile myplinkPKsexphenoFID --linear --out PKperm --perm
Hostname: MacBook-Pro-de-Angela.local
Working directory: /Users/angelaparodymerino/Documents/genphen/PLINK_sd2/Allsites
Start time: Tue Nov 14 00:46:42 2017

Note: --perm flag deprecated.  Use e.g. '--model perm'.
Random number seed: 1510616802
8192 MB RAM detected; reserving 4096 MB for main workspace.
3412 variants loaded from .bim file.
265 people (129 males, 136 females) loaded from .fam.
265 phenotype values loaded from .fam.
Using up to 8 threads (change this with --threads).
Before main variant filters, 265 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.99727.
3412 variants and 265 people pass filters and QC.
Phenotype data is quantitative.
Writing linear model association results to PKperm.assoc.linear ... done.
--adjust: Genomic inflation est. lambda (based on median chisq) = 1.41828.
--adjust values (3412 variants) written to PKperm.assoc.linear.adjusted .
739948 (adaptive) permutations complete.
Permutation test report written to PKperm.assoc.linear.perm .

End time: Tue Nov 14 00:46:50 2017

The resultant QQplot looks like this:


If I am right, well, it looks that it needs improvement, meaning that there is some inflation maybe due to population stratification, which I can correct by including PCs as covariate (right?). 

So, I run an association test including 2 PCs as covariates (note that I did a preliminary work on the genetic background of the population and it gave me a hint for possible K=2 populations: meaning, just very subtle population stratification). My cov_2pcs.txt looks like this:

FID IID PC1 PC2

FAM1 NWZ_106801_P03_WH01 -0.025861118 -0.07565873

FAM2 NWZ_106801_P01_WD11 -1.6138029 -0.08481272

FAM3 NWZ_106801_P03_WG01 -0.63852423 -0.121126294

FAM4 NWZ_106801_P02_WG03 -0.8286301 0.22903343


PCs were done in TASSEL. Anyway, I run the analysis:


PLINK v1.90b3 64-bit (11 Jan 2015)

9 arguments: --adjust --bfile myplinkPKsexphenoFID --covar cov_2pcs.txt --linear --out PKpcasperm --perm

Hostname: MacBook-Pro-de-Angela.local

Working directory: /Users/angelaparodymerino/Documents/genphen/PLINK_sd2/Allsites

Start time: Tue Nov 14 00:56:31 2017


Note: --perm flag deprecated.  Use e.g. '--model perm'.

Random number seed: 1510617391

8192 MB RAM detected; reserving 4096 MB for main workspace.

3412 variants loaded from .bim file.

265 people (129 males, 136 females) loaded from .fam.

265 phenotype values loaded from .fam.

Using up to 8 threads (change this with --threads).

--covar: 2 covariates loaded.

Before main variant filters, 265 founders and 0 nonfounders present.

Calculating allele frequencies... done.

Total genotyping rate is 0.99727.

3412 variants and 265 people pass filters and QC.

Phenotype data is quantitative.

Writing linear model association results to PKpcasperm.assoc.linear ... done.

--adjust: Genomic inflation est. lambda (based on median chisq) = 1.11648.

--adjust values (3412 variants) written to PKpcasperm.assoc.linear.adjusted .

265303 (adaptive) permutations complete.

Permutation test report written to PKpcasperm.assoc.linear.perm .


End time: Tue Nov 14 00:56:34 2017

 


 and obtained a quite much more inflated QQplot:



It might be because, according to what I have read (or I might be wrong?): covariate files only admit two values (1 or 2), and (0, -9 for missing values). Is it rounding all values of the PC1 and PC2 to 0, 1 or 2? (no idea).


Does anyone see what I might be doing wrong? 


Or, should I use another way to correct the inflation due to population stratification?


Thanks in advance,


Ángela Parody-Merino


Angela Parody

unread,
Nov 14, 2017, 6:04:37 PM11/14/17
to plink2-users
Ok, I found the answer here: https://stats.stackexchange.com/questions/181642/including-covariates-makes-qq-plot-worse

The resultant *assoc.linear file contains P values that shouldn't be used for the QQplot as explained at the end of this website: http://zzz.bwh.harvard.edu/plink/faq.shtml

By eliminating those P-values, the QQ plot looks better than without correcting for Pcs.



Problem solved (I think), although still the QQ plot doesn't look perfect... Does anyone want to comment what could I do to improve it? For example, how could I correct by relatedness (kinship) when running the association test in PLINK?

Thanks,

Cheers,

'Angela Parody-Merino

Christopher Chang

unread,
Nov 14, 2017, 6:21:57 PM11/14/17
to plink2-users
To correct for relatedness with just plink, you prune one member of each pair of closely related samples ("plink --rel-cutoff [relatedness coefficient]" in v1.9, "plink2 --king-cutoff [kinship coefficient]" in v2.0) before performing association analysis.  I normally use a kinship coefficient of ~0.177, corresponding to first-degree relations, when using --king-cutoff.

If you want to try to use all of your samples, the standard approach today is to use a linear mixed model.  plink does not currently have this built-in, but there are several programs which can do this with plink-formatted data; BOLT-LMM, FaST-LMM, and GCTA are some options.

Vince Forgetta

unread,
Nov 15, 2017, 2:17:43 PM11/15/17
to plink2-users
FYI, I've never tested BOLT-LMM on small sample size, but author recommends using GCTA instead:

https://data.broadinstitute.org/alkesgroup/BOLT-LMM/#x1-20001

Vince
Reply all
Reply to author
Forward
0 new messages