Uh-oh, SNP chr22:15265708:C:T has low variance (=0)

Yujie Wang

Mar 6, 2025, 11:24:37 AM
to plink2-users
I used the following code to process my data. However, regenie still reports a low-variance SNP.

Code:
plink2 --bgen /cromwell_root/fc-aou-datasets-controlled/v7/wgs/short_read/snpindel/acaf_threshold_v7.1/bgen/acaf_threshold.chr22.bgen ref-first \
  --sample /cromwell_root/fc-aou-datasets-controlled/v7/wgs/short_read/snpindel/acaf_threshold_v7.1/bgen/acaf_threshold.chr22.sample \
  --geno 0.9 \
  --make-bed \
  --out plink/missingness_filtered_data
# --geno 0.9 removes variants with a missing call rate above 0.9

# skipped: removing duplicate variants and related samples
# skipped: allele frequency table (not used downstream)

# filter on minor allele frequency
# --maf 0.05 removes all variants with a minor allele frequency below 0.05
plink2 --bfile plink/missingness_filtered_data \
  --maf 0.05 \
  --make-bed \
  --out plink/maf_filtered_data

# regenie still reports some low-variance SNPs, so also filter on minor allele count
plink2 --bfile plink/maf_filtered_data \
  --mac 100 \
  --make-bed \
  --out plink/max_filtered_data

# Hardy-Weinberg filtering
# --hwe 1e-25 removes all variants with a Hardy-Weinberg exact test p-value below 1e-25
plink2 --bfile plink/max_filtered_data \
  --hwe 1e-25 keep-fewhet \
  --make-bed \
  --out plink/hwe_filtered_data

# assign IDs (built from chromosome, position and alleles) to variants whose ID is missing
plink2 --bfile plink/hwe_filtered_data \
  --set-missing-var-ids @:#\$1,\$2 \
  --make-bed --out plink/hwe_filtered_data.newIDs \
  --new-id-max-allele-len 1000

# linkage disequilibrium pruning
# --indep-pairwise 200kb 1 0.5: 200 kb window, step size 1, r^2 threshold 0.5
plink2 --bfile plink/hwe_filtered_data.newIDs \
  --indep-pairwise 200kb 1 0.5 \
  --out plink/ldpruned_snplist \
  --rm-dup force-first

# keep only the pruned-in variants and export to BGEN v1.2
plink2 --bfile plink/hwe_filtered_data.newIDs \
  --extract plink/ldpruned_snplist.prune.in \
  --export bgen-1.2 \
  --out plink/acaf_threshold.chr22_ldpruned_data

# LTL Added the next lines
set -e
TEMP_IN_LD=~{input_bgen}
echo $TEMP_IN_LD
mkdir -p plink
# make regenie output directory
mkdir -p regenie

plink2 \
  --bgen ~{input_bgen} ref-first \
  --sample ~{input_samples} \
  --mac ~{mac_threshold} \
  --make-bed \
  --out high_mac_variants

regenie \
  --step 1 \
  --bed high_mac_variants \
  --phenoFile ~{pheno_csv} \
  --phenoCol ~{pheno_col} \
  ~{"--covarFile " + covariate_csv} \
  --bsize ~{step1_block_size} \
  --lowmem \
  --out regenie/~{output_prefix}
echo "STEP 1 COMPLETE..."
ls -la regenie/*

regenie \
  --step 2 \
  --bgen ~{input_bgen} \
  --sample ~{input_samples} \
  --phenoFile ~{pheno_csv} \
  --phenoCol ~{pheno_col} \
  ~{"--covarFile " + covariate_csv} \
  --bsize ~{step2_block_size} \
  --approx \
  --pThresh 0.01 \
  --pred regenie/~{output_prefix}_pred.list \
  --out regenie/~{output_prefix}
ls -la regenie/*

Chris Chang

Mar 6, 2025, 12:21:59 PM
to Yujie Wang, plink2-users
Please post full .log files when asking for troubleshooting help.

Yujie Wang

Mar 6, 2025, 2:47:49 PM
to plink2-users
Here is the full log file:

2025/02/28 21:22:38 Starting container setup.
2025/02/28 21:22:40 Done container setup.
2025/02/28 21:22:42 Starting localization.
2025/02/28 21:23:02 Localization script execution started...
2025/02/28 21:23:02 Localizing input gs://fc-secure-111/cromwell-execution/regenie_multiple_contigs/222/call-regenie_steps/shard-1/regenie_bgen/333/call-regenie_steps/script -> /cromwell_root/script
2025/02/28 21:23:08 Localizing input gs://fc-secure-111/delta_cohort_SBP_max.tsv -> /cromwell_root/fc-secure-111/delta_cohort_SBP_max.tsv
2025/02/28 21:23:10 Localizing input gs://fc-secure-111/delta_cohort_pca_covariates.tsv -> /cromwell_root/fc-secure-111/delta_cohort_pca_covariates.tsv
Copying gs://fc-secure-111/delta_cohort_pca_covariates.tsv to file:///cromwell_root/fc-secure-111/delta_cohort_pca_covariates.tsv
2025/02/28 21:23:11 Localizing input gs://fc-secure-111/cromwell-execution/regenie_multiple_contigs/222/call-regenie_steps/shard-1/regenie_bgen/333/call-filter_variants_for_gwas/plink/acaf_threshold.chr22_ldpruned_data.sample -> /cromwell_root/fc-secure-111/cromwell-execution/regenie_multiple_contigs/222/call-regenie_steps/shard-1/regenie_bgen/333/call-filter_variants_for_gwas/plink/acaf_threshold.chr22_ldpruned_data.sample
2025/02/28 21:23:13 Localizing input gs://fc-secure-111/cromwell-execution/regenie_multiple_contigs/222/call-regenie_steps/shard-1/regenie_bgen/333/call-filter_variants_for_gwas/plink/acaf_threshold.chr22_ldpruned_data.bgen -> /cromwell_root/fc-secure-111/cromwell-execution/regenie_multiple_contigs/222/call-regenie_steps/shard-1/regenie_bgen/333/call-filter_variants_for_gwas/plink/acaf_threshold.chr22_ldpruned_data.bgen
Copying gs://fc-secure-111/cromwell-execution/regenie_multiple_contigs/222/call-regenie_steps/shard-1/regenie_bgen/333/call-filter_variants_for_gwas/plink/acaf_threshold.chr22_ldpruned_data.bgen to file:///cromwell_root/fc-secure-111/cromwell-execution/regenie_multiple_contigs/222/call-regenie_steps/shard-1/regenie_bgen/333/call-filter_variants_for_gwas/plink/acaf_threshold.chr22_ldpruned_data.bgen
.......................................................................
Average throughput: 316.9MiB/s
2025/02/28 21:23:29 Localization script execution complete.
2025/02/28 21:23:46 Done localization.
2025/02/28 21:23:47 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint=/bin/bash us.gcr.io/broad-dsp-gcr-public/terra-jupyter-aou@sha456:3f1bb849fe2cb3b7ae2039008f76dfd570344f934652b4f6e5ec1b5f8a789c6c /cromwell_root/script /cromwell_root/fc-secure-111/cromwell-execution/regenie_multiple_contigs/222/call-regenie_steps/shard-1/regenie_bgen/333/call-filter_variants_for_gwas/plink/acaf_threshold.chr22_ldpruned_data.bgen
PLINK v2.0.0-a.6.3LM 64-bit Intel (3 Dec 2024) cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to high_mac_variants.log.
Options in effect:
  --bgen /cromwell_root/fc-secure-111/cromwell-execution/regenie_multiple_contigs/222/call-regenie_steps/shard-1/regenie_bgen/333/call-filter_variants_for_gwas/plink/acaf_threshold.chr22_ldpruned_data.bgen ref-first
  --mac 100
  --make-bed
  --out high_mac_variants
  --sample /cromwell_root/fc-secure-111/cromwell-execution/regenie_multiple_contigs/222/call-regenie_steps/shard-1/regenie_bgen/333/call-filter_variants_for_gwas/plink/acaf_threshold.chr22_ldpruned_data.sample

Start time: Fri Feb 28 21:23:50 2025
13982 MiB RAM detected, ~13230 available; reserving 6991 MiB for main workspace.
Using up to 4 compute threads.
--bgen: 23936 variants declared in header, format v1.2.
245394 samples imported from .sample file to high_mac_variants-temporary.psam .
--bgen: high_mac_variants-temporary.pgen + high_mac_variants-temporary.pvar written.
245394 samples (0 females, 0 males, 245394 ambiguous; 245394 founders) loaded from high_mac_variants-temporary.psam.
23936 variants loaded from high_mac_variants-temporary.pvar.
Note: No phenotype data present.
Calculating allele frequencies... done.
0 variants removed due to allele frequency threshold(s) (--maf/--max-maf/--mac/--max-mac).
23936 variants remaining after main filters.
Writing high_mac_variants.fam ... done.
Writing high_mac_variants.bim ... done.
Writing high_mac_variants.bed ... done.
End time: Fri Feb 28 21:24:37 2025
Start time: Fri Feb 28 21:24:50 2025
|=============================|
| REGENIE v2.0.2.gz |
|=============================|
Copyright (c) 2020 Joelle Mbatchou and Jonathan Marchini.
Distributed under the MIT License.
Compiled with Boost Iostream library.
Log of output saved in file : regenie/acaf_threshold.chr22.log
Options in effect:
  --step 1 \
  --bed high_mac_variants \
  --phenoFile /cromwell_root/fc-secure-111/delta_cohort_SBP_max.tsv \
  --phenoCol sbp_max \
  --covarFile /cromwell_root/fc-secure-111/delta_cohort_pca_covariates.tsv \
  --bsize 2000 \
  --lowmem \
  --out regenie/acaf_threshold.chr22
Fitting null model
 * bim : [high_mac_variants.bim] n_snps = 23936
 * fam : [high_mac_variants.fam] n_samples = 245394
 * bed : [high_mac_variants.bed]
 * phenotypes : [/cromwell_root/fc-secure-111/delta_cohort_SBP_max.tsv] n_pheno = 1
   -dropping observations with missing values at any of the phenotypes
   -number of phenotyped individuals = 35726
   -number of individuals remaining with non-missing phenotypes = 35726
 * covariates : [/cromwell_root/fc-secure-111/delta_cohort_pca_covariates.tsv] n_cov = 21
   -number of individuals with covariate data = 812
 * number of individuals used in analysis = 812
   -residualizing and scaling phenotypes...done (14ms)
 * # threads : [3]
 * block size : [2000]
 * # blocks : [12]
 * # CV folds : [5]
 * ridge data_l0 : [5 : 0.01 0.25 0.5 0.75 0.99 ]
 * ridge data_l1 : [5 : 0.01 0.25 0.5 0.75 0.99 ]
 * approximate memory usage : 4GB
 * writing level 0 predictions to disk
   -temporary files will have prefix [regenie/acaf_threshold.chr22_l0_Y]
   -approximate disk space needed : 113MB
 * setting memory...done
Chromosome 22
 block [1] : 2000 snps (51994ms)
   -residualizing and scaling genotypes...!! Uh-oh, SNP chr22:15265708:C:T has low variance (=0).
2025/02/28 21:26:18 Starting delocalization.
2025/02/28 21:26:19 Delocalization script execution started...
2025/02/28 21:26:19 Delocalizing output /cromwell_root/memory_retry_rc -> gs://fc-secure-111/cromwell-execution/regenie_multiple_contigs/222/call-regenie_steps/shard-1/regenie_bgen/333/call-regenie_steps/memory_retry_rc
2025/02/28 21:26:19 Delocalizing output /cromwell_root/rc -> gs://fc-secure-111/cromwell-execution/regenie_multiple_contigs/222/call-regenie_steps/shard-1/regenie_bgen/333/call-regenie_steps/rc
2025/02/28 21:26:23 Delocalizing output /cromwell_root/stdout -> gs://fc-secure-111/cromwell-execution/regenie_multiple_contigs/222/call-regenie_steps/shard-1/regenie_bgen/333/call-regenie_steps/stdout
2025/02/28 21:26:25 Delocalizing output /cromwell_root/stderr -> gs://fc-secure-111/cromwell-execution/regenie_multiple_contigs/222/call-regenie_steps/shard-1/regenie_bgen/333/call-regenie_steps/stderr
2025/02/28 21:26:26 Delocalizing output /cromwell_root/regenie/acaf_threshold.chr22_firth_sbp_max.regenie -> gs://fc-secure-111/cromwell-execution/regenie_multiple_contigs/222/call-regenie_steps/shard-1/regenie_bgen/333/call-regenie_steps/regenie/acaf_threshold.chr22_firth_sbp_max.regenie
Required file output '/cromwell_root/regenie/acaf_threshold.chr22_firth_sbp_max.regenie' does not exist.

Christopher Chang

Mar 6, 2025, 8:07:11 PM
to plink2-users
That is a rather old version of regenie; first thing I'd try is updating it.

Christopher Chang

Mar 7, 2025, 5:04:50 AM
to plink2-users
Actually, I see one clear problem and another potential problem.

Clear problem: when dosages are available, they are used for --mac. That's a problem if you apply --mac first and then truncate to hardcalls afterward (for example with --make-bed, since a .bed file can only store hardcalls). --dosage-erase-threshold provides one way to truncate to hardcalls early enough for what you're doing.
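
A toy illustration of why the order matters, using made-up dosages rather than the thread's data; the awk code only mimics the arithmetic and does not call plink2. It assumes that a dosage more than 0.1 from the nearest of 0, 1 or 2 is saved as a missing hardcall, which is our reading of plink2's default --hard-call-threshold; check the plink2 documentation before relying on the exact rule.

```shell
#!/usr/bin/env bash
# Toy illustration with hypothetical dosages: a variant can pass a dosage-based
# MAC filter yet be monomorphic once dosages are truncated to hardcalls.
# Assumption: a dosage within 0.1 of 0, 1 or 2 becomes that hardcall; anything
# further away becomes a missing hardcall (our reading of the 0.1 default).
set -euo pipefail

# 380 samples with an uncertain ALT dosage of 0.3, plus 20 confident hom-ref samples.
summary=$(awk 'BEGIN { for (i = 0; i < 380; i++) print 0.3; for (i = 0; i < 20; i++) print 0 }' |
  awk -v thr=0.1 '
    {
      n++
      dosage_alt += $1                              # dosage-based ALT allele count
      hc = int($1 + 0.5)                            # nearest hardcall (0, 1 or 2)
      dist = $1 - hc; if (dist < 0) dist = -dist
      if (dist <= thr) { called++; hc_alt += hc }   # otherwise: missing hardcall
    }
    END {
      other = 2 * n - dosage_alt
      dosage_mac = (dosage_alt < other) ? dosage_alt : other
      printf "dosage_mac=%g called=%d hardcall_alt=%d\n", dosage_mac, called, hc_alt
    }')
echo "$summary"
```

Against the thread's --mac 100, this hypothetical variant passes on dosages (MAC 114), but the only samples that keep a hardcall are homozygous reference, so once the data are truncated to hardcalls the variant is effectively monomorphic.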

Potential problem: if every single genotype is heterozygous, there's no variance.  Any reasonable --hwe filter will remove this.
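
A hand check of this case, again on a made-up variant (500 samples, all heterozygous) rather than anything from the thread. The sketch computes the minor allele count, MAF, dosage variance and the heterozygote count expected under Hardy-Weinberg equilibrium with plain awk; it is not how plink2 or regenie compute these internally.

```shell
#!/usr/bin/env bash
# Hypothetical variant: every one of 500 samples is heterozygous (ALT dosage 1).
set -euo pipefail

stats=$(awk 'BEGIN { for (i = 0; i < 500; i++) print 1 }' |
  awk '
    { n++; s += $1; ss += $1 * $1; if ($1 == 1) het++ }
    END {
      p = s / (2 * n)                          # ALT allele frequency
      maf = (p < 0.5) ? p : 1 - p              # minor allele frequency
      mac = (s < 2 * n - s) ? s : 2 * n - s    # minor allele count
      var = ss / n - (s / n) ^ 2               # population variance of the dosages
      exp_het = 2 * p * (1 - p) * n            # heterozygotes expected under HWE
      printf "mac=%d maf=%.2f var=%g het=%d expected_het=%g\n", mac, maf, var, het, exp_het
    }')
echo "$stats"
```

Such a variant clears any --maf or --mac threshold, yet its variance is exactly 0; what gives it away is the observed heterozygote count being double the HWE expectation, which is the kind of excess an --hwe filter tests for.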

Yujie Wang

Mar 7, 2025, 10:43:57 AM
to plink2-users
I don't quite follow. Could you please help me fix the clear problem? Should I just add the --dosage-erase-threshold option?

Thank you!


