plink clump option

Henry Lu

Oct 8, 2022, 11:41:33 AM
to plink2-users
Hello

I wonder whether the --clump option involves any random number generation (seeds) when I "remove" the LD between genotypes. I am using PBS cloud computing. Every time I re-run my code with exactly the same clump options, I get a slightly different number of SNPs (e.g., 125603 vs. 125158).

--clump-p1 1
--clump-p2 1
--clump-r2 0.1
--clump-kb 500

Thanks again for your help.
Henry

Christopher Chang

Oct 8, 2022, 11:42:15 AM
to plink2-users
Please post the full .log files from two runs with different results.

Henry Lu

Oct 8, 2022, 12:27:27 PM
to plink2-users
It is about 500MB, so it might be difficult to upload. I don't have the other one yet because I just deleted the outputs before re-running. If you need it, I can re-run again. Most of the lines are like "Warning: '3:45825948:A:G' is missing from the main dataset, and is a top variant". I guess that is because my bfile has many more SNPs than the summary statistics file.

Or are there any specific lines you want to see? These are the top lines. I do see a seed line (Random number seed: 1665240748). Is there any way to fix that number?

Hostname: node084
Working directory: XXX
Start time: Sat Oct  8 10:52:28 2022

Random number seed: 1665240748
122720 MB RAM detected; reserving 61360 MB for main workspace.
5353280 variants loaded from .bim file.
136 people (0 males, 0 females, 136 ambiguous) loaded from .fam.
Ambiguous sex IDs written to
xxx
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 136 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.997806.
5353280 variants and 136 people pass filters and QC.
Note: No phenotypes present.
Warning: '3:45825948:A:G' is missing from the main dataset, and is a top
variant.
Warning: '3:45954256:T:C' is missing from the main dataset, and is a top
variant.
Warning: '3:45928899:G:A' is missing from the main dataset, and is a top
variant.
Warning: '3:46010416:A:G' is missing from the main dataset, and is a top
variant.
Warning: '3:46031027:G:A' is missing from the main dataset, and is a top
variant.
Warning: '3:46075604:A:G' is missing from the main dataset, and is a top
variant.
Warning: '3:46024271:G:A' is missing from the main dataset, and is a top
variant.
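
Since the logs are too large to post, one option is to diff two of them locally while skipping the lines that are expected to differ on every run. The Python sketch below is only an illustration: the `diff_logs` and `stable_lines` helpers and the `run1.log`/`run2.log` labels are made-up names, and the list of skipped prefixes is taken from the excerpt above and may be incomplete.

```python
import difflib

# Prefixes of log lines that change on every run (from the excerpt above).
# This list is an assumption and may need extending for other logs.
VOLATILE_PREFIXES = ("Hostname:", "Start time:", "Random number seed:")


def stable_lines(lines):
    """Drop run-specific lines and trailing newlines."""
    return [ln.rstrip("\n") for ln in lines if not ln.startswith(VOLATILE_PREFIXES)]


def diff_logs(lines_a, lines_b):
    """Return unified-diff lines for the parts of two logs that should match."""
    return list(
        difflib.unified_diff(
            stable_lines(lines_a),
            stable_lines(lines_b),
            fromfile="run1.log",  # hypothetical labels, not real paths
            tofile="run2.log",
            lineterm="",
            n=0,
        )
    )


if __name__ == "__main__":
    run1 = [
        "Hostname: node084\n",
        "Random number seed: 1665240748\n",
        "5353280 variants loaded from .bim file.\n",
    ]
    run2 = [
        "Hostname: node085\n",
        "Random number seed: 1665240999\n",
        "5353280 variants loaded from .bim file.\n",
    ]
    # Only the volatile lines differ, so nothing is reported.
    print(diff_logs(run1, run2))
```

For real files, the two line lists could come from `open(path).readlines()`; for a 500MB log that reads everything into memory, which should fit on a node with this much RAM.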


Christopher Chang

Oct 8, 2022, 12:34:09 PM
to plink2-users
You can omit the warnings, but why did you omit the version line from the top of the .log?!  I can't really help you without that.

Henry Lu

Oct 8, 2022, 3:07:24 PM
to plink2-users
Sorry, I didn't mean to omit it; I just missed it. Here it is (I was using a conda environment). Thank you!!

(condaenv) [henry@qlogin13]$ head -30 xxx/clump.log
PLINK v1.90b6.21 64-bit (19 Oct 2020)
Options in effect:
  --bfile xxx/final
  --clump xxx/summary_stats.txt
  --clump-field all_inv_var_meta_p
  --clump-kb 500

  --clump-p1 1
  --clump-p2 1
  --clump-r2 0.1
  --clump-snp-field SNP
  --out xxx/clump

Hostname: node084
Working directory: /xxx

Start time: Sat Oct  8 10:52:28 2022

Random number seed: 1665240748
122720 MB RAM detected; reserving 61360 MB for main workspace.
5353280 variants loaded from .bim file.
136 people (0 males, 0 females, 136 ambiguous) loaded from .fam.
Ambiguous sex IDs written to
xxxl/clump.nosex .

Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 136 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.997806.
5353280 variants and 136 people pass filters and QC.
Note: No phenotypes present.
Warning: '3:45825948:A:G' is missing from the main dataset, and is a top
variant.

Christopher Chang

Oct 8, 2022, 3:14:45 PM
to plink2-users
Try updating your plink 1.9 build.  From the version history:

"6 Jun 2021: --clump results should now be consistent across operating systems. (Previously, minor variations were possible when there were p-value ties.)"

You are using a build that predates that.
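
To illustrate why p-value ties can matter, here is a toy Python sketch of greedy clumping. It is not PLINK's actual implementation: the variant names, p-values, r2 values, and the `greedy_clump` helper are all made up for illustration, and the kb window is ignored.

```python
def greedy_clump(order, r2, r2_threshold=0.1):
    """Walk variants in the given order; each unclumped variant becomes an
    index variant and absorbs every unclumped variant in LD with it."""
    clumped = set()
    index_variants = []
    for vid in order:
        if vid in clumped:
            continue
        index_variants.append(vid)
        clumped.add(vid)
        for other in order:
            pair_r2 = r2.get(frozenset((vid, other)), 0.0)
            if other not in clumped and pair_r2 >= r2_threshold:
                clumped.add(other)
    return index_variants


# Hypothetical data: A and B have identical p-values; B is in LD with both
# A and C, but A and C are nearly independent.
pvals = {"A": 0.01, "B": 0.01, "C": 0.5}
r2 = {
    frozenset(("A", "B")): 0.8,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "C")): 0.02,
}

# Sorting by p-value alone leaves tied variants in their input order.
order_1 = sorted(["A", "B", "C"], key=lambda v: pvals[v])
order_2 = sorted(["B", "A", "C"], key=lambda v: pvals[v])

print(greedy_clump(order_1, r2))  # A is visited first: two clumps (A, C)
print(greedy_clump(order_2, r2))  # B is visited first: one clump (B)

# A deterministic tie-break (here: variant ID) removes the dependence.
stable = sorted(["B", "A", "C"], key=lambda v: (pvals[v], v))
print(greedy_clump(stable, r2))
```

With A and B tied, whichever one is visited first becomes the index variant, so the number of clumps changes even though the inputs are the same. No random number generator is involved in this kind of variation.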

Henry Lu

Oct 8, 2022, 3:27:00 PM
to plink2-users
Thank you for your prompt response. However, I am trying to use conda to make all of my scripts reproducible, and the 6 Jun 2021 version is not available there (if I am not mistaken). Is there a way I can use this build in a conda environment?
Do you mean this has nothing to do with random number generation? I could try to set the seed if there is an option...

Thanks again