Hi,
I'm running PLINK2 on simulated data and want to include PC1 as a covariate to reduce confounding from population structure. For some SNPs, I get NA for 'BETA', 'SE', 'T_STAT' and 'P', and the ERRCODE column shows 'CORR_TOO_HIGH'. I checked the correlation coefficient between PC1 and the genotype for those SNPs in Python. None of them has an absolute correlation above 0.999 (the values range from -0.6 to 0.6). Do you have any thoughts on what could be causing this error code? Thank you for the help!
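A minimal sketch of this kind of correlation check is below, for anyone who wants to reproduce it. It is not the exact script: the helper name `pearson_r` and the toy `pc1` and `geno` lists are made up for illustration, and it assumes genotypes coded as 0/1/2 allele counts. The 0.999 value is just the cutoff mentioned above.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences, skipping NaN pairs."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 2:
        return math.nan
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    if sxx == 0 or syy == 0:
        # A constant column has no defined correlation.
        return math.nan
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data standing in for the real covariate and one SNP:
# PC1 values near 4 (like the covar file) and 0/1/2 allele counts.
random.seed(0)
pc1 = [random.gauss(4.0, 0.2) for _ in range(2500)]
geno = [float((random.random() < 0.3) + (random.random() < 0.3))
        for _ in range(2500)]

r = pearson_r(pc1, geno)
print(f"r = {r:.3f}; |r| > 0.999: {abs(r) > 0.999}")
```

In practice, `pc1` would come from the covar file and `geno` from an export of the flagged SNPs, with samples matched by IID before computing the correlation.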
Here's my command:
plink2 --pfile offspring_1child --pheno offspring_1child.pheno --pheno-name PHENO --glm --covar offspring_1child.covar --covar-name PC1 --out offspring_1child_gwas
Here's what the input data look like:
- Phenotype file
FID IID PHENO
FAM0 std1 -106.53436402346294
FAM1 std2 -70.19387845241816
FAM2 std3 -58.76111005636025
FAM3 std4 -113.04966863837323
FAM4 std5 -97.73902176007009
FAM5 std6 -79.56021192968942
- Covar file
FID IID PC1
FAM0 std1 3.797275709942582
FAM1 std2 4.2747284116346425
FAM2 std3 4.085876098266804
FAM3 std4 3.7928090917961015
FAM4 std5 4.208008149837333
FAM5 std6 3.8446688074863635
This is the log file:
PLINK v2.00a4.3 AVX2 (10 Jun 2023)
Options in effect:
--covar offspring_1child.covar
--covar-name PC1
--glm
--out offspring_1child_gwas
--pfile offspring_1child
--pheno offspring_1child.pheno
--pheno-name PHENO
Hostname: endeavour1.hpc.usc.edu
Working directory: litmus_test
Start time: Sat Oct 18 14:46:59 2025
Random number seed: 1760824019
191861 MiB RAM detected, ~138162 available; reserving 95930 MiB for main
workspace.
Using up to 64 threads (change this with --threads).
2500 samples (0 females, 0 males, 2500 ambiguous; 0 founders) loaded from
offspring_1child.psam.
5000 variants loaded from offspring_1child.pvar.
1 quantitative phenotype loaded (2500 values).
1 covariate loaded from offspring_1child.covar.
--glm linear regression on phenotype 'PHENO': done.
Results written to offspring_1child_gwas.PHENO.glm.linear .
End time: Sat Oct 18 14:46:59 2025