logistic function with covariates in PLINK1.9

Lin M.

Sep 12, 2014, 11:55:51 AM
to plink2...@googlegroups.com
Hi Christopher,

We ran a logistic regression correcting for sex and age. We ran it two different ways; the logs and results are below. Can you explain why the --logistic sex flag is not working?

PLINK v1.90b2i 64-bit (8 Sep 2014)
16 arguments: --bfile Exome1_FHgenes --ci 0.95 --covar covarM.txt --covar-name age --logistic sex --out TEST_603_logsex_NEW --pheno EXOME1_ICD93_CALL-1.txt --pheno-name 603

Random number seed: 1410527648
32691 MB RAM detected; reserving 16345 MB for main workspace.
126 variants loaded from .bim file.
7672 people (3861 males, 3811 females) loaded from .fam.
7671 phenotype values present after --pheno.
Using 1 thread (no multithreaded calculations invoked).
--covar: 1 out of 2 covariates loaded.
Calculating allele frequencies... done.
Total genotyping rate is 0.999994.
126 variants and 7672 people pass filters and QC.
Among remaining phenotypes, 87 are cases and 7584 are controls.  (1 phenotype is missing.)
Writing logistic model association results to TEST_603_logsex_NEW.assoc.logistic ... done.

 CHR              SNP         BP   A1       TEST    NMISS         OR       SE      L95      U95         STAT            P
 19    exm-rs2738459   11238473    C        ADD     7671         NA       NA       NA       NA           NA           NA
  19    exm-rs2738459   11238473    C        age     7671         NA       NA       NA       NA           NA           NA
  19    exm-rs2738459   11238473    C        SEX     7671         NA       NA       NA       NA           NA           NA
  19       exm2263593   11241983    T        ADD     7671         NA       NA       NA       NA           NA           NA

and

PLINK v1.90b2i 64-bit (8 Sep 2014)
16 arguments: --bfile Exome1_FHgenes --ci 0.95 --covar covarM.txt --covar-name age, sex --logistic --out TEST_603_covagesex_NEW --pheno EXOME1_ICD93_CALL-1.txt --pheno-name 603

Random number seed: 1410536560
32691 MB RAM detected; reserving 16345 MB for main workspace.
126 variants loaded from .bim file.
7672 people (3861 males, 3811 females) loaded from .fam.
7671 phenotype values present after --pheno.
Using 1 thread (no multithreaded calculations invoked).
--covar: 2 covariates loaded.
Calculating allele frequencies... done.
Total genotyping rate is 0.999994.
126 variants and 7672 people pass filters and QC.
Among remaining phenotypes, 87 are cases and 7584 are controls.  (1 phenotype is missing.)
Writing logistic model association results to TEST_603_covagesex_NEW.assoc.logistic ... done.

CHR              SNP         BP   A1       TEST    NMISS         OR       SE      L95      U95         STAT            P
  19    exm-rs2738459   11238473    C        ADD     7671      1.023   0.1548    0.755    1.385       0.1441       0.8854
  19    exm-rs2738459   11238473    C        age     7671      1.027 0.008788    1.009    1.045        3.032      0.00243
  19    exm-rs2738459   11238473    C        sex     7671  5.361e-06     25.1 2.299e-27 1.25e+16      -0.4835       0.6288

Will adding sex as a covariate be a problem for chr 22-26, or has that changed in PLINK1.9? Also, can you explain "1 phenotype is missing"? Many thanks!

Lin

Christopher Chang

Sep 12, 2014, 12:34:12 PM
to plink2...@googlegroups.com
"1 phenotype is missing" suggests that the --pheno file doesn't have an entry for one of your samples.

The "--logistic sex" regression failure might be due to a bug; could you send me a small fileset to replicate this with?
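
To track down which sample lacks a phenotype entry, the sample IDs in the .fam file can be compared against the IDs in the --pheno file. The following is a minimal Python sketch, assuming both files are whitespace-delimited with FID and IID in the first two columns and that the phenotype file starts with a header row (as it must when --pheno-name is used). The function name and the sample IDs are hypothetical and only illustrate the idea.

```python
def samples_without_phenotype(fam_lines, pheno_lines, has_header=True):
    """Return (FID, IID) pairs that appear in the .fam lines but have no
    row at all in the phenotype lines."""
    fam_ids = [tuple(ln.split()[:2]) for ln in fam_lines if ln.strip()]
    body = pheno_lines[1:] if has_header else pheno_lines
    pheno_ids = {tuple(ln.split()[:2]) for ln in body if ln.strip()}
    return [sample for sample in fam_ids if sample not in pheno_ids]


# Hypothetical in-memory data; with real files, pass open(path).readlines().
fam = [
    "F1 S1 0 0 1 -9",
    "F2 S2 0 0 2 -9",
    "F3 S3 0 0 1 -9",
]
pheno = [
    "FID IID 603",
    "F1 S1 1",
    "F3 S3 2",
]

print(samples_without_phenotype(fam, pheno))
```

This only detects samples with no row in the phenotype file; a sample whose row holds a missing-value code would need a separate check.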

Lin M.

Sep 12, 2014, 1:47:27 PM
to plink2...@googlegroups.com
Christopher,
Thank you for the quick reply.  Here are the files you'll need. 
My Best,
Lin
snps.bim
snps.fam
snps.bed
phenotest.txt
covarM.txt

Christopher Chang

Sep 12, 2014, 11:04:01 PM
to plink2...@googlegroups.com
Hmm, this is actually a case of logistic regression nonconvergence; note the gigantic standard error and 95% confidence interval for sex.  "--logistic sex" is not really generating a different result than "--covar-name sex".  PLINK 1.07's logistic regression also exhibits nonconvergence:


 CHR         SNP         BP   A1       TEST    NMISS         OR       SE      L95      U95         STAT            P
   2   exm175686   21231278    A        ADD     7671      1.489    1.024   0.2002    11.08        0.389       0.6973
   2   exm175686   21231278    A        age     7671      1.027 0.008789     1.01    1.045        3.037     0.002386
   2   exm175686   21231278    A        sex     7671  3.222e-08    390.8        0      inf     -0.04414       0.9648
   2   exm175929   21238413    G        ADD     7671     0.4778   0.9966  0.06775    3.369      -0.7412       0.4586
   2   exm175929   21238413    G        age     7671      1.027 0.008786    1.009    1.045        3.025     0.002487
   2   exm175929   21238413    G        sex     7671  3.224e-08    390.7        0      inf     -0.04415       0.9648
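
The warning sign described here (a very large standard error with a confidence interval spanning many orders of magnitude) can be screened for automatically. The following is a rough Python sketch that scans .assoc.logistic-style text and flags rows whose SE is missing, non-finite, or above a cutoff. The function name and the cutoff of 10 are arbitrary assumptions for illustration, not a PLINK convention; the sample rows are copied from the output above.

```python
import math


def flag_unstable_rows(table_text, max_se=10.0):
    """Return (SNP, TEST, SE) for rows whose SE is "NA", non-finite,
    or larger than max_se (an arbitrary, illustrative cutoff)."""
    lines = [ln.split() for ln in table_text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    flagged = []
    for fields in rows:
        row = dict(zip(header, fields))
        try:
            se = float(row["SE"])
        except ValueError:  # e.g. "NA"
            se = math.nan
        if not math.isfinite(se) or se > max_se:
            flagged.append((row["SNP"], row["TEST"], row["SE"]))
    return flagged


sample = """
 CHR         SNP         BP   A1       TEST    NMISS         OR       SE      L95      U95         STAT            P
   2   exm175686   21231278    A        ADD     7671      1.489    1.024   0.2002    11.08        0.389       0.6973
   2   exm175686   21231278    A        age     7671      1.027 0.008789     1.01    1.045        3.037     0.002386
   2   exm175686   21231278    A        sex     7671  3.222e-08    390.8        0      inf     -0.04414       0.9648
"""

print(flag_unstable_rows(sample))  # [('exm175686', 'sex', '390.8')]
```

A flagged covariate row is a hint that the whole model for that variant deserves a closer look, since the other coefficients come from the same fit.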

Lin M.

Sep 13, 2014, 3:06:22 PM
to plink2...@googlegroups.com
Chris,
You're correct. It seems PLINK1.0 reports p-values for all analyses, while PLINK1.9 reports only those that converge, which may contribute to its increased speed. But when I run the --covar-name age,sex --logistic flags in PLINK1.9, I get results similar to PLINK1.0's, except for the variable that doesn't converge well (sex).

PLINK v1.90b2i 64-bit (8 Sep 2014)
15 arguments: --bfile snps --ci 0.95 --covar covarM.txt --covar-name age,sex --logistic --out cov_ageSex --pheno phenotest.txt --pheno-name 603

CHR         SNP         BP   A1       TEST    NMISS         OR       SE      L95      U95         STAT            P
   2   exm175686   21231278    A        ADD     7671      1.489    1.024   0.2002    11.08        0.389       0.6973
   2   exm175686   21231278    A        age     7671      1.027 0.008788     1.01    1.045        3.038     0.002385
   2   exm175686   21231278    A        sex     7671  5.974e-06    23.37 7.657e-26 4.661e+14      -0.5147       0.6068

   2   exm175929   21238413    G        ADD     7671     0.4778   0.9966  0.06775    3.369      -0.7412       0.4586
   2   exm175929   21238413    G        age     7671      1.027 0.008786    1.009    1.045        3.025     0.002486
   2   exm175929   21238413    G        sex     7671  6.737e-06    21.56    3e-24 1.513e+13      -0.5523       0.5807


I understand that non-convergence leads to non-significant p-values for SNP-phenotype associations, but covariate-phenotype associations are informative as well. I do appreciate the speed of PLINK1.9. Can you comment on whether running the --covar-name age,sex --logistic flags is OK for all chromosomes except the X chromosome? According to your FAQ, X-chromosome associations use sex from the ped/fam files. Should we run the X chromosome in PLINK1.0 if we want all results to be in the output?

Thanks so much for your time and advice.

Christopher Chang

Sep 13, 2014, 3:19:03 PM
to plink2...@googlegroups.com
Unfortunately, none of the coefficient estimates can be trusted if there is multicollinearity; you generally have to throw out a covariate in this situation.  (You can try running once with just age as a covariate, and once with just sex as a covariate, and proceed from there.)
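
The two single-covariate runs suggested above could be scripted roughly as follows. This is only a sketch: it reuses the flags and file names that appear earlier in this thread, while the helper name, the "plink" executable name, and the output prefixes are assumptions. It prints the command lines rather than executing them.

```python
def plink_logistic_args(bfile, pheno, pheno_name, covar, covar_names, out):
    """Build a PLINK argument list for one logistic run with the given covariates."""
    return [
        "plink",  # assumed executable name
        "--bfile", bfile,
        "--pheno", pheno,
        "--pheno-name", pheno_name,
        "--covar", covar,
        "--covar-name", ",".join(covar_names),
        "--logistic",
        "--ci", "0.95",
        "--out", out,
    ]


# One run per covariate, so each coefficient is estimated without the other.
commands = [
    plink_logistic_args(
        "snps", "phenotest.txt", "603", "covarM.txt",
        names, "cov_" + "_".join(names),  # hypothetical output prefix
    )
    for names in (["age"], ["sex"])
]

for args in commands:
    print(" ".join(args))
    # To execute instead of print: subprocess.run(args, check=True)
```

Comparing the age-only and sex-only outputs should show which covariate is responsible for the unstable fit.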