UNFINISHED PLINK2 regression

Zhixiu Li

Nov 14, 2024, 2:18:33 AM
to plink2-users
Dear All,

I've used two versions of PLINK2 to calculate association p-values and odds ratios, but I encountered a significant number of variants (~700k) with the "UNFINISHED" status. According to the PLINK2 manual, this status indicates that while logistic/Firth regression didn't fail outright, the results didn't meet the usual convergence criteria when the iteration limit was reached. Although results are still reported in these cases, they may be less accurate than usual.

Given this, would you recommend trusting the reported results, or would it be more appropriate to exclude the variants marked as "UNFINISHED"?

Below is one example from my cohort:

This is the result from v2.0.0-a.6LM AVX2 AMD (20 Oct 2024):
#CHROM POS ID REF ALT PROVISIONAL_REF? A1 OMITTED A1_FREQ FIRTH? TEST OBS_CT OR LOG(OR)_SE L95 U95 Z_STAT P ERRCODE
4 120893855 rs193103608 C T Y T C 0.000297089 Y ADD 1683 1.37067e-05 1.75936 4.35872e-07 0.000431029 -6.36459 1.95816e-10 UNFINISHED

And this is the result from v2.00a3.6LM AVX2 Intel:
#CHROM POS ID REF ALT A1 FIRTH? TEST OBS_CT OR LOG(OR)_SE L95 U95 Z_STAT P ERRCODE
4 120893855 rs193103608 C T T Y ADD 1683 2.22041e-05 2.40708 1.98391e-07 0.0024851 -4.45155 8.52542e-06 UNFINISHED
The P values are 1.95816e-10 and 8.52542e-06, and the ORs are 1.37067e-05 and 2.22041e-05, respectively.

Below is my plink command:

plink2a.6LM --bfile ../${i} --geno 0.02 --mind 0.05 --covar PCA.txt  --covar-name PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10  --glm hide-covar --freq --ci 0.95 --out ${i}_assoc --make-bed --hwe 1e-6 --threads 10

Best Regards,
Zhixiu Li




Christopher Chang

Nov 14, 2024, 6:47:28 AM
to plink2-users
Please post full .log file(s) when asking for troubleshooting help.

Zhixiu Li

Nov 15, 2024, 11:55:03 AM
to Christopher Chang, plink2-users
Dear Christopher,

Below is one of the logs. The other is almost identical except for the PLINK2 version.
(1) plink2a.6LM --bfile ../${i} --geno 0.02 --mind 0.05 --covar PCA.txt  --covar-name PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10  --glm hide-covar --freq --ci 0.95 --out ${i}_assoc --make-bed --hwe 1e-6 --threads 10
PLINK v2.0.0-a.6LM AVX2 AMD (20 Oct 2024)
Options in effect:
  --bfile ../4
  --ci 0.95
  --covar  PCA.txt
  --covar-name PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10
  --freq
  --geno 0.02
  --glm hide-covar
  --hwe 1e-6
  --make-bed
  --mind 0.05
  --out 4_assoc
  --threads 10


Hostname: cl5n140
Working directory: /home/1kg/merged/
Start time: Mon Oct 28 13:38:36 2024

Random number seed: 1730086716
1031395 MiB RAM detected, ~915387 available; reserving 515697 MiB for main
workspace.
Using up to 10 threads (change this with --threads).
1683 samples (0 females, 0 males, 1683 ambiguous; 1683 founders) loaded from
../4.fam.
3365016 variants loaded from ../4.bim.
1 binary phenotype loaded (1017 cases, 666 controls).
--update-name: 3346434 values updated.
Calculating sample missingness rates... done.
0 samples removed due to missing genotype data (--mind).
10 covariates loaded from PCA.txt.
1683 samples (0 females, 0 males, 1683 ambiguous; 1683 founders) remaining
after main filters.
1017 cases and 666 controls remaining after main filters.
Calculating allele frequencies... done.
--freq: Allele frequencies (founders only) written to 4_merged.afreq .
--geno: 1 variant removed due to missing genotype data.
--hwe: 639 variants removed due to Hardy-Weinberg exact test (founders only).
3364376 variants remaining after main filters.
Covariates written to  4_assoc.cov .
Writing 4_assoc.fam ... done.
Writing 4_assoc .bim ... done.
Writing 4_assoc.bed ... done.
--glm logistic-Firth hybrid regression on phenotype 'PHENO1': done.
Results written to 4_assoc.PHENO1.glm.logistic.hybrid .

End time: Mon Oct 28 13:41:28 2024

Thanks!

Best Regards,
Zhixiu Li



Christopher Chang

Nov 15, 2024, 1:07:59 PM
to plink2-users
Thanks.

From the --glm documentation: "Finally, the statistics computed by --glm are not calibrated well when the minor allele count is very small. '--mac 20' is a reasonable filter to apply before --glm; it's possible to make good use of --glm results for rarer variants (e.g. they could be input for a gene-based test), but some sophistication is required."

The A1_FREQ column in your example output indicates that there is only one copy of the minor allele for that variant.  You cannot conclude much in this situation.
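
As a rough check of that reading, the minor allele count can be recovered from the A1_FREQ and OBS_CT columns of the posted row. The Python sketch below is only illustrative: the helper name `minor_allele_count` and the constant `MAC_THRESHOLD` are made up for this example, and it assumes diploid autosomal genotypes (two allele observations per sample) and that A1_FREQ was computed over the same OBS_CT samples.

```python
# Sketch: estimate the minor allele count (MAC) implied by a --glm result row.
# Assumption: diploid autosomal variant, so allele observations = 2 * OBS_CT.

def minor_allele_count(a1_freq: float, obs_ct: int) -> float:
    """Approximate MAC from the A1 frequency and the number of samples."""
    minor_freq = min(a1_freq, 1.0 - a1_freq)  # A1 is not always the minor allele
    return minor_freq * 2 * obs_ct

# Header and row copied from the v2.0.0-a.6LM output posted above.
header = ("#CHROM POS ID REF ALT PROVISIONAL_REF? A1 OMITTED A1_FREQ FIRTH? "
          "TEST OBS_CT OR LOG(OR)_SE L95 U95 Z_STAT P ERRCODE").split()
row = ("4 120893855 rs193103608 C T Y T C 0.000297089 Y ADD 1683 "
       "1.37067e-05 1.75936 4.35872e-07 0.000431029 -6.36459 "
       "1.95816e-10 UNFINISHED").split()
record = dict(zip(header, row))

MAC_THRESHOLD = 20  # mirrors the '--mac 20' suggestion in the documentation
mac = minor_allele_count(float(record["A1_FREQ"]), int(record["OBS_CT"]))
print(f"{record['ID']}: MAC ~ {mac:.2f}, below threshold: {mac < MAC_THRESHOLD}")
```

With 1683 samples there are 3366 allele observations, and 0.000297089 x 3366 is approximately 1, i.e. a single copy of the T allele, far below the suggested threshold of 20.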