I've recently been trying to run PRSice on the imputed array genotypes from UK Biobank. The job was not completing within 7 days (the runtime limit on my university cluster). Having seen that newer versions of PRSice support multi-threaded clumping, I upgraded, but now I get a new error.
PRSice 2.3.3 (2020-08-05)
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2021-03-10 10:23:05
/mnt/iusers01/bk01/v45331db/software/PRSice/PRSice_linux \
--a1 Coded \
--a2 Non_coded \
--allow-inter \
--bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
--base Shrine_FEV1_to_FVC_ratio_gwas_sum_stats_processed.txt \
--beta \
--binary-target T \
--bp Pos \
--chr Chromosome \
--clump-kb 250kb \
--clump-p 1.000000 \
--clump-r2 0.100000 \
--cov copd_prs_cov.tsv \
--extract copd_prs_imp_snps.valid \
--geno 0.02 \
--ignore-fid \
--info 0.8 \
--interval 5e-05 \
--keep eids_passing_array_qc.txt \
--lower 5e-08 \
--maf 0.01 \
--num-auto 22 \
--out copd_prs_imp_snps \
--pheno copd_prs_pheno.tsv \
--pvalue P \
--seed 2375121353 \
--snp SNP \
--stat beta \
--target /mnt/bk01-home01/shared/uk_biobank/GWAS/bgen_files/chr#,/mnt/bk01-home01/shared/uk_biobank/GWAS/project-specific_sample_files/ukb19056_imp_chr1_v3_s487297.sample \
--thread 4 \
--type bgen \
--upper 0.5
Initializing Genotype file:
/mnt/bk01-home01/shared/uk_biobank/GWAS/bgen_files/chr#
(bgen)
With external fam file:
/mnt/bk01-home01/shared/uk_biobank/GWAS/project-specific_sample_files/ukb19056_imp_chr1_v3_s487297.sample
Start processing
Shrine_FEV1_to_FVC_ratio_gwas_sum_stats_processed
==================================================
SNP extraction/exclusion list contains 5 columns, will
assume first column contains the SNP ID
Base file:
Shrine_FEV1_to_FVC_ratio_gwas_sum_stats_processed.txt
Header of file is:
SNP Chromosome Pos Coded Non_coded N Neff Coded_freq beta
SE P Info
19814168 variant(s) observed in base file, with:
2937205 variant(s) excluded based on user input
16876963 total variant(s) included from base file
Loading Genotype info from target
==================================================
487409 people (0 male(s), 0 female(s)) observed
400993 founder(s) included
76209143 variant(s) not found in previous data
9517 variant(s) with mismatch information
16876963 variant(s) included
Calculate MAF and perform filtering on target SNPs
==================================================
23512 variant(s) excluded based on genotype missingness
threshold
10511803 variant(s) excluded based on MAF threshold
12662 variant(s) excluded based on INFO score threshold
6328986 variant(s) included
Phenotype file: copd_prs_pheno.tsv
Column Name of Sample ID: eid
Note: If the phenotype file does not contain a header, the
column name will be displayed as the Sample ID which is
expected.
There are a total of 1 phenotype to process
Start performing clumping
I also tried running the same analysis with a different (related) quantitative target trait and got essentially the same log and error.
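As a quick sanity check, the variant counts reported in the log can be tallied by hand. The short Python sketch below uses only the numbers printed in the log; the variable names are my own and are not anything PRSice produces. It only confirms that the log is internally consistent and that roughly 6.3 million variants go into the clumping step. It says nothing about the cause of the error.

```python
# Counts copied from the PRSice log above; variable names are illustrative only.
base_observed = 19_814_168   # variant(s) observed in base file
base_excluded = 2_937_205    # variant(s) excluded based on user input
base_included = base_observed - base_excluded

# Exclusions reported under "Calculate MAF and perform filtering on target SNPs".
target_filters = {
    "genotype missingness": 23_512,
    "MAF": 10_511_803,
    "INFO score": 12_662,
}
remaining = base_included - sum(target_filters.values())

print(base_included)  # compare with "total variant(s) included from base file"
print(remaining)      # compare with the final "variant(s) included" line
```

Both values match the log (16876963 and 6328986), so no variants are unaccounted for between the base file and the start of clumping.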