All analysis results for Firth regression are NA

409 views
Skip to first unread message

Wanting Zhao

unread,
Oct 12, 2017, 5:37:06 AM10/12/17
to plink2-users

I used PLINK2 to run the Firth logistic regression, but come out with some problems. I run the following script on chr19.

~/./plink2 --vcf SCES.1000Gphase3_v5.chr19.dose.vcf.gz --glm firth --1 --pheno plink-YL-phe-SCES-610K-model-2-highwopath.txt --pheno-name  path.or.hm --extract  snp-list-info-larger0.05.txt --out SCES-610K-model-2-path-chr19-info0.5

 

The output is like this. All the P-values are NA. The log file is attached.

#CHROM  POS     ID      REF     ALT     TEST    OBS_CT  OR      SE      T_STAT  P

19      238434  19:238434       C       G       ADD     130     NA      NA      NA      NA

19      246597  19:246597       T       C       ADD     130     NA      NA      NA      NA

19      247265  19:247265       G       T       ADD     130     NA      NA      NA      NA

19      247659  19:247659       G       A       ADD     130     NA      NA      NA      NA

19      248833  19:248833       T       G       ADD     130     NA      NA      NA      NA

19      252101  19:252101       C       T       ADD     130     NA      NA      NA      NA

19      252541  19:252541       G       A       ADD     130     NA      NA      NA      NA


The log file is as follows.
PLINK v2.00aLM 64-bit Intel (1 Oct 2017)
Options in effect:
  --1
  --extract ../snp-list-info-larger0.05.txt
  --glm firth
  --out SCES-610K-model-2-path-chr19-info0.5
  --pheno ../../phenotype/plink-YL-phe-SCES-610K-model-2-highwopath.txt
  --pheno-name path.or.hm
  --vcf ../../../../../genotype/SCES-610K/imputation-1KG-Phase3-V5-vcf/SCES.1000Gphase3_v5.chr19.dose.vcf.gz
Hostname: nus04
Working directory: /secure/seri_stat/zhaowt/PathMyopia/Firth/SCES-610K/Model2
Start time: Thu Oct 12 14:35:25 2017
Random number seed: 1507790125
128887 MB RAM detected; reserving 64443 MB for main workspace.
Using up to 24 threads (change this with --threads).
--vcf: 1084535 variants scanned.
--vcf: SCES-610K-model-2-path-chr19-info0.5-temporary.pgen +
SCES-610K-model-2-path-chr19-info0.5-temporary.pvar +
SCES-610K-model-2-path-chr19-info0.5-temporary.psam written.
1889 samples (0 females, 0 males, 1889 ambiguous; 1889 founders) loaded from
SCES-610K-model-2-path-chr19-info0.5-temporary.psam.
1084535 variants loaded from
SCES-610K-model-2-path-chr19-info0.5-temporary.pvar.
1 binary phenotype loaded (27 cases, 103 controls).
--extract: 204626 variants remaining.
Warning: At least 906 duplicate IDs in --extract file(s).
204626 variants remaining after main filters.
--glm Firth regression on phenotype 'path.or.hm': done.
Results written to SCES-610K-model-2-path-chr19-info0.5.path.or.hm.glm.firth .
End time: Thu Oct 12 14:40:31 2017



However, if I only run the script on the first 1000 SNPs of chr19, the output is reasonable.

#CHROM  POS     ID      REF     ALT     TEST    OBS_CT  OR      SE      T_STAT  P

19      238434  19:238434       C       G       ADD     130     1.86608 0.54052 1.15415 0.24844

19      246597  19:246597       T       C       ADD     130     1.54034 0.619368        0.697491        0.485496

19      247265  19:247265       G       T       ADD     130     1.78933 0.577133        1.00816 0.313377

19      247659  19:247659       G       A       ADD     130     NA      NA      NA      NA

19      248833  19:248833       T       G       ADD     130     NA      NA      NA      NA

19      252101  19:252101       C       T       ADD     130     1.86608 0.54052 1.15415 0.24844

19      252541  19:252541       G       A       ADD     130     1.20655 0.470288        0.399264        0.689699

 

Can PLINK2 only work well on few SNPs?

Christopher Chang

unread,
Oct 12, 2017, 10:34:23 AM10/12/17
to plink2-users
Hi,

Is there a test dataset you can send me which I can use to reproduce this bug?  (Also, does the bug remain if you add "--threads 1" to your command line?)

Wanting Zhao

unread,
Oct 13, 2017, 1:56:52 AM10/13/17
to plink2-users
Dear Christopher,

Thank you for your reply. I added "--threads 1" to the command,  but it did not work. The size of my genotype file is around 600M. How can send it to you?
SCES-OmniE-model-2-emmetrop-chr20-info0.5-age-gender.log

Christopher Chang

unread,
Oct 13, 2017, 11:56:13 AM10/13/17
to plink2-users
Okay, thanks for checking --threads.  As for transferring the test data, do you have (or can you set up) a Dropbox account?

Christopher Chang

unread,
Oct 13, 2017, 1:25:12 PM10/13/17
to plink2-users
Actually, this might not be necessary; Karl's --logistic bug report is likely to have identified the problem.  I will post a plink2 fix later today.

(The issue: there's an optimized matrix * vector routine used by both --logistic and plink 2.0's Firth regression which requires the common dimension to be divisible by 4.  Since the number of samples usually isn't divisible by 4, a bunch of zeroes were inserted at the end of the input data structures.  However, the zero-padding was only exhaustive for one of the parameters, since zero times anything is zero... except that zero times "nan" is NOT zero, instead it causes the regression to fail for no good reason.)

On Thursday, October 12, 2017 at 10:56:52 PM UTC-7, Wanting Zhao wrote:

Dmitry Prokopenko

unread,
Oct 16, 2017, 4:16:56 PM10/16/17
to plink2-users
Hi Chris,

I have noticed that in the current version (as compared to 13 Sept build) is not parsing correctly the column set descriptor in the --glm.
For example,
1) The call  plink2a --pfile chr22 --chr 22 --from-bp 16000000 --to-bp 17000000 --threads 4 --mac 1 --pheno pheno.txt --pheno-name caco --1 --glm firth hide-covar cols=chrom,pos,ref,alt,firth,test,altfreq,machr2,nobs,beta,se,ci,t,p --out test; 
results in
Error: Invalid --glm column set descriptor (either all column set IDs must be
preceded by +/-, or none of them can be).
2) The call  plink2a --pfile chr22 --chr 22 --from-bp 16000000 --to-bp 17000000 --threads 4 --mac 1 --pheno pheno.txt --pheno-name caco --1 --out test --glm firth hide-covar cols=chrom,pos,ref,alt,firth,test,altfreq,machr2,nobs,beta,se,ci,t,p;
results in
Error: Unrecognized ID
'MANPATH=/app/ge-6.2u5@share/ge6.2u5/man:/usr/share/man:/app/ge-6.2u5@share/ge6.2u5/man/'
in --glm column set descriptor.

Am I doing something wrong?

Thanks,
Dmitry

Chris Chang

unread,
Oct 16, 2017, 4:27:47 PM10/16/17
to Dmitry Prokopenko, plink2-users
That's definitely a bug caused by a recent refactoring of the command-line parsing code; investigating now.

--
You received this message because you are subscribed to the Google Groups "plink2-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to plink2-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Christopher Chang

unread,
Oct 16, 2017, 4:38:58 PM10/16/17
to plink2-users
Bugfix is posted to Github, will post new prebuilt binaries in a few hours.  (All column-set parsing is broken in the October 13-15 builds, oops.)

test1

unread,
Oct 18, 2017, 1:37:38 PM10/18/17
to plink2-users
Chris: does this also fix the problem described in post dated on sep-12 by me- (glm-firth and maf )?
thanks
b.

Christopher Chang

unread,
Oct 18, 2017, 1:47:50 PM10/18/17
to plink2-users
I'm optimistic that it does; let me know what your results are.

test1

unread,
Oct 18, 2017, 2:36:58 PM10/18/17
to plink2-users
ok, thank you...I try hopefully later...

Wanting Zhao

unread,
Oct 24, 2017, 10:25:46 PM10/24/17
to plink2-users
Hi Chris,

I tried the latest version of plink2, but it only worked well for some chromosome (chr18-chr22). For other chromosome, I still got the results as follows
#CHROM  POS     ID      REF     ALT     TEST    OBS_CT  OR      SE      T_STAT  P
9       40997   9:40997 A       T       ADD     1805    NA      NA      NA      NA
9       41634   9:41634 G       T       ADD     1805    NA      NA      NA      NA
9       41644   9:41644 C       CT      ADD     1805    NA      NA      NA      NA
9       42917   9:42917 A       C       ADD     1805    NA      NA      NA      NA
9       43169   9:43169 A       G       ADD     1805    NA      NA      NA      NA
9       43847   9:43847 T       G       ADD     1805    NA      NA      NA      NA

What is the possible problem? Thanks!

Regards
Wanting

Christopher Chang

unread,
Oct 25, 2017, 1:02:38 AM10/25/17
to plink2-users
Hi,

Can you send me a sample dataset that this fails on, along with the .log file? Firth regression on the chr18 dataset you put in the current Dropbox link works fine for me.

Reply all
Reply to author
Forward
0 new messages