plink2 Error: Unrecognized flag ('--clump-index-first').

jie huang

Jun 4, 2025, 11:45:26 AM
to plink2-users

Dear Chris:

I have 9 GWAS summary files. Let's say the main one is A.gz and the rest are B.gz to I.gz.

Now I want to see whether the lead SNPs (P < 5E-08) in A.gz are at least marginally significant (P < 5E-07) in any of B.gz to I.gz.

I thought that I could use the following options: --clump-p1 5e-08 --clump-p2 5e-07 --clump-index-first. However, as the title of this post indicates, --clump-index-first no longer seems to be available. As a workaround, I guess I could create a list of genome-wide significant SNPs, such as A.gw-sig.snps, and then run plink2 --clump --extract A.gw-sig.snps. Do you recommend this approach?
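
A minimal Python sketch of the list-building step described above. It assumes A.gz is a gzipped, tab-delimited file with a header row and columns named ID and P; those column names and the helper name write_sig_snp_list are hypothetical, so adjust them to the real file. The output (one variant ID per line) is the kind of file plink2's --extract reads.

```python
import csv
import gzip


def write_sig_snp_list(gwas_path, out_path, id_col="ID", p_col="P", threshold=5e-8):
    """Write IDs of variants with p < threshold to out_path, one per line.

    Hypothetical helper: assumes a gzipped, tab-delimited file with a header
    row containing id_col and p_col (a leading '#' on the header is ignored).
    Returns the number of IDs written.
    """
    n = 0
    with gzip.open(gwas_path, "rt", newline="") as fin, open(out_path, "w") as fout:
        reader = csv.reader(fin, delimiter="\t")
        header = [h.lstrip("#") for h in next(reader)]
        id_idx, p_idx = header.index(id_col), header.index(p_col)
        for row in reader:
            try:
                p = float(row[p_idx])
            except (IndexError, ValueError):
                continue  # skip short rows and NA / non-numeric p-values
            if p < threshold:
                fout.write(row[id_idx] + "\n")
                n += 1
    return n
```

For example, write_sig_snp_list("A.gz", "A.gw-sig.snps") would write the IDs of all variants with P < 5e-08 in A.gz.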

Second, the genetic reference data (such as 1000G or UKB) is usually split by chromosome, but the GWAS file has SNPs from all chromosomes. So when I run plink2 --clump my.gwas.gz --pfile chr1.pfile, thousands of SNPs get written to the clumps.missing_id file simply because they are not on chr1. Is there a way to suppress this, or to output only the SNPs that are truly missing, i.e. the chr1 SNPs that are absent from the reference data?
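
Since --clump matches variants by ID only, one option (a sketch, not a plink2 feature) is to split the summary file by chromosome first and pass only the matching piece to each per-chromosome --clump run. The Python below assumes a gzipped, tab-delimited file whose chromosome column is named CHROM (a leading '#' in the header is tolerated); split_by_chrom and the output naming are hypothetical.

```python
import csv
import gzip


def split_by_chrom(gwas_path, out_prefix, chrom_col="CHROM"):
    """Split a gzipped, tab-delimited summary file into one .gz per chromosome.

    Hypothetical helper: assumes a header row with a chromosome column named
    chrom_col (a leading '#' is ignored when matching). Each output file keeps
    the original header line. Returns {chromosome: output path}.
    """
    handles, writers, paths = {}, {}, {}
    try:
        with gzip.open(gwas_path, "rt", newline="") as fin:
            reader = csv.reader(fin, delimiter="\t")
            header = next(reader)
            chrom_idx = [h.lstrip("#") for h in header].index(chrom_col)
            for row in reader:
                if not row:
                    continue  # skip blank lines
                chrom = row[chrom_idx]
                if chrom not in writers:
                    paths[chrom] = f"{out_prefix}.chr{chrom}.gz"
                    handles[chrom] = gzip.open(paths[chrom], "wt", newline="")
                    writers[chrom] = csv.writer(
                        handles[chrom], delimiter="\t", lineterminator="\n"
                    )
                    writers[chrom].writerow(header)  # repeat the header in each piece
                writers[chrom].writerow(row)
    finally:
        for h in handles.values():
            h.close()  # flush and close every per-chromosome output
    return paths
```

For example, split_by_chrom("my.gwas.gz", "my.gwas") would produce my.gwas.chr1.gz, my.gwas.chr2.gz, and so on (assuming CHROM values like "1"); my.gwas.chr1.gz would then be used together with --pfile chr1.pfile, so any .missing_id entries would be chr1 variants absent from the reference.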

Third, I specified --clump-bins 5e-08,5e-07 and got the .clumps file shown below.
I find it hard to believe that NONSIG is 0, since there are certainly many non-significant SNPs. Does 734 below mean the number of SNPs with P < 5E-07?
[Screenshot: 1111.jpg]

The previous screenshot was from a --clump command where I mistakenly listed only A.gz. Now I have added --clump A.gz,B.gz,C.gz...J.gz, and below is the new output file. Does F=6 below mean that the significant SNPs in this locus come from 6 files?
[Screenshot: 222.jpg]

Thank you very much & best regards,
Jie

Chris Chang

Jun 4, 2025, 12:26:17 PM
to jie huang, plink2-users
Always provide full .log file(s) when asking for troubleshooting help.

jie huang

Jun 5, 2025, 9:42:28 AM
to plink2-users

Dear Chris:

Today I downloaded the latest version of PLINK2.
Here is the full .log file, which shows that --clump-index-first is not recognized: https://github.com/jielab/001/blob/master/analysis/chr20.bad.log

Once I removed that option, the latest version gave me a new error message, "Error: Alpha 5 --clump's handling of dosages when computing phased-r^2 is incorrect", as shown here: https://github.com/jielab/001/blob/master/analysis/chr20.log

However, the same command worked fine yesterday, as shown here: https://github.com/jielab/001/blob/master/analysis/chr20.OLD.log

To summarize, I would like to use --clump for the following:
  1. Hopefully I could use --clump-index-first, or you could recommend a workaround.
  2. Hopefully I could use --clump-bins 5e-08,5e-07 to suppress NON-SIG SNPs. I only want the SNPs covered by the --clump-bins thresholds to appear in the .clumps files; otherwise, the files are too big.
  3. Hopefully the clumps.missing_id files only list truly missing SNPs within the chromosomes specified by --chr; otherwise, these files will be too big as well.

Thank you & best regards,
Jie 

Chris Chang

Jun 5, 2025, 7:50:40 PM
to jie huang, plink2-users
- I don't know why you (i) downloaded the alpha 5 instead of the alpha 6 build, and then (ii) did not correct the mistake after seeing the alpha-5-specific error message.

- Given what you said you're trying to do, I don't see the point of including B.gz ... L.gz in the --clump operation.  --clump on just A.gz, and then write e.g. a short script to scan B.gz ... L.gz results for the significant index variants identified by --clump.  That lines up better with what you said you want than what you tried to do with --clump-index-first.

- You need to specify "--clump-p2 5e-07" if you want the SP2 column to exclude SNPs with p-value >= 5e-07; "--clump-bins 5e-08,5e-07" does not make that happen on its own.

- Re: "unbelievable NONSIG is 0", did you check the input files for a counterexample?

- --clump only looks at variant IDs, not positions, in its input files, so it wouldn't be possible for the .missing_id file to be affected by --chr in the manner you describe.  However, I will consider adding a flag which prevents creation of the .missing_id / .missing_allele files in a future build.
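
Chris's second suggestion (run --clump on A.gz alone, then scan B.gz ... L.gz for the index variants) could look roughly like the Python sketch below. It assumes the .clumps file and the summary files are tab-delimited with a header row (a leading '#' is stripped), that variant IDs match across files, and that the ID and p-value columns are named ID and P; the helper names are hypothetical. The 5e-07 default mirrors the "marginally significant" threshold from the original question.

```python
import csv
import gzip


def _rows(path):
    """Yield each data row of a tab-delimited file (gzipped if *.gz) as a dict.

    A leading '#' on the header line (e.g. '#CHROM') is stripped from the key.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        header = [h.lstrip("#") for h in next(reader)]
        for row in reader:
            yield dict(zip(header, row))


def index_variants(clumps_path, id_col="ID"):
    """Collect the index-variant IDs from a .clumps file (assumed 'ID' column)."""
    return {r[id_col] for r in _rows(clumps_path)}


def scan_secondary(index_ids, gwas_paths, id_col="ID", p_col="P", threshold=5e-7):
    """Map each index ID to the (file, p) pairs where it has p < threshold."""
    hits = {}
    for path in gwas_paths:
        for r in _rows(path):
            vid = r.get(id_col)
            if vid not in index_ids:
                continue
            try:
                p = float(r[p_col])
            except (KeyError, ValueError):
                continue  # missing column or NA p-value
            if p < threshold:
                hits.setdefault(vid, []).append((path, p))
    return hits
```

For example, scan_secondary(index_variants("chr20.clumps"), ["B.gz", "C.gz"]) would return a dict mapping each index variant from A.gz to the files (and p-values) in which it reaches P < 5e-07; index variants absent from the result were not marginally significant anywhere.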

jie huang

Jun 6, 2025, 6:40:58 AM
to plink2-users

Dear Chris,

I have now downloaded the alpha 6 build.
Previously, I simply picked the one at the end of the first line, since both builds had a June 4th timestamp.

Here are the first 10 lines of the output file: https://github.com/jielab/001/blob/master/analysis/chr20.clumps
I did specify both --clump-p2 5e-07 and --clump-bins 5e-08,5e-07. I assume the SNPs in the SP2 column are the ones with P < 5e-07.

I think there is still merit in --clump-index-first. If I simply run --clump on A.gz and then match the top SNPs to B.gz, C.gz, etc., I would be ignoring the LD information between SNPs.

Best regards,
Jie

Chris Chang

Jun 6, 2025, 9:13:00 AM
to jie huang, plink2-users
That doesn’t “ignore LD information between SNPs”.  The LD information is already used to remove significant-but-probably-noncausal SNPs from A.gz.
