plink2 --score variance-standardize for PCA projection

Yilei

Jun 11, 2025, 11:28:41 PM
to plink2-users
Hello,

I was using plink2's --score variance-standardize for PCA projection (https://www.cog-genomics.org/plink/2.0/score). It works well when all my samples are lumped into one input file. However, I have a use case where I have to do this for one sample (or a small number of samples) at a time. In that case, variance-standardize doesn't seem to work if a variant is monomorphic (if I understand the error message correctly?). But shouldn't the allele frequency be read from the file given by --read-freq? I have checked my --read-freq file, and the AF of this variant, rs9967710, is not 0. I guess I am misunderstanding something here? I would appreciate some clarification. Thank you.

Error: --score[-list] variance-standardize failure for variant 'rs9967710':
estimated allele frequency is zero or NaN, but not all dosages are zero. (This
is possible when e.g. allele frequencies are estimated from founders, but the
allele is only observed in nonfounders.)

Chris Chang

Jun 11, 2025, 11:30:00 PM
to Yilei, plink2-users
Please post full .log file(s) when asking for troubleshooting help.

Yilei

Jun 11, 2025, 11:38:29 PM
to plink2-users
Sorry, here is the full log:

PLINK v2.0.0-a.6.16LM AVX2 Intel (9 Jun 2025)      cog-genomics.org/plink/2.0/
(C) 2005-2025 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /app/tmp/samples.proj_2_1KG.log.
Options in effect:
  --out /app/tmp/samples.proj_2_1KG
  --pfile /app/tmp/samples
  --read-freq /references/1kg.2504.ref_pcs.rsID.acount
  --score /references/1kg.2504.ref_pcs.eigenvec.rsID.allele 26 5 header-read no-mean-imputation variance-standardize ignore-dup-ids cols=+scoresums,+denom
  --score-col-nums 6-25

Start time: Thu Jun 12 01:44:04 2025
64312 MiB RAM detected, ~32563 available; reserving 32156 MiB for main
workspace.
Allocated 7629 MiB successfully, after larger attempt(s) failed.
Using up to 32 threads (change this with --threads).
100 samples (0 females, 0 males, 100 ambiguous; 100 founders) loaded from
/app/tmp/samples.psam.
1293602 variants loaded from /app/tmp/samples.pvar.
Note: No phenotype data present.
--read-freq: PLINK 2 --freq file detected.
--read-freq: Frequencies for 139896 variants loaded.
Warning: 636 entries skipped due to missing variant IDs, mismatching allele
codes, and/or zero observations.
Calculating allele frequencies... done.
--score: 10k variants processed.
Error: --score[-list] variance-standardize failure for variant 'rs9967710':
estimated allele frequency is zero or NaN, but not all dosages are zero. (This
is possible when e.g. allele frequencies are estimated from founders, but the
allele is only observed in nonfounders.)
End time: Thu Jun 12 01:44:05 2025


Chris Chang

Jun 12, 2025, 1:24:46 AM
to Yilei, plink2-users
What happens if you add "--extract /references/1kg.2504.ref_pcs.rsID.acount"?  You only have decent allele frequencies for those ~140k variants.

Yilei

Jun 12, 2025, 1:41:31 AM
to plink2-users
If I add the --extract option, the same error occurs (and at the same variant). And yes the allele frequency file only has ~140k variants; the PCA was calculated using ~140k variants. 

I thought the variant's allele frequency was loaded from the --read-freq file, so even if the variant is monomorphic in my samples, it should still be fine? But apparently this is not the case. Could you clarify that? Is there a misunderstanding on my side? Thank you!

Chris Chang

Jun 12, 2025, 1:49:17 AM
to Yilei, plink2-users
Is it possible for you to upload a set of files (could contain just a few samples, a few variants, a tiny .acount file, etc.) and a .log file that lets me reproduce the error you're seeing?

Yilei

Jun 13, 2025, 11:43:15 AM
to plink2-users
Hello Chris, thank you for your reply! Actually, while I was preparing a small set of files to reproduce the error, I realized what was causing this unexpected behavior. The alternate allele of rs9967710 differs between my allele frequency file and my sample's VCF file, even though the two records were somehow assigned the same rsID. So although I thought the allele frequency would be loaded from the .acount file, it actually cannot be loaded from there, and plink2 has to estimate the frequency from the samples themselves (if I understand correctly?). I have now switched to using chr:pos:ref:alt as the variant ID to avoid this ambiguity.
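
In case it helps anyone hitting the same error, here is a rough Python sketch of the kind of check that would have caught this earlier: it lists IDs that appear in both the frequency file and the .pvar but with different REF/ALT codes. It assumes both files are tab-delimited with a header line containing ID, REF and ALT columns, and the function names (read_alleles, allele_mismatches) are just made up for this example. It is deliberately strict, so a plain REF/ALT swap is also reported even though that may be harmless, and if an ID is duplicated only its last row is kept.

```python
import csv


def read_alleles(handle):
    """Map variant ID -> (REF, ALT) from a tab-delimited table with a header.

    Assumes the header names ID, REF and ALT columns (a leading '#' is
    stripped) and that any '##' meta lines can be ignored.
    """
    lines = (line for line in handle if not line.startswith("##"))
    reader = csv.reader(lines, delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    id_col, ref_col, alt_col = (header.index(c) for c in ("ID", "REF", "ALT"))
    # For duplicated IDs, the last row wins.
    return {row[id_col]: (row[ref_col], row[alt_col]) for row in reader if row}


def allele_mismatches(freq_handle, pvar_handle):
    """Return {ID: ((ref, alt) in freq file, (ref, alt) in pvar)} for shared
    IDs whose allele codes are not identical."""
    freq = read_alleles(freq_handle)
    pvar = read_alleles(pvar_handle)
    return {
        vid: (freq[vid], pvar[vid])
        for vid in freq.keys() & pvar.keys()
        if freq[vid] != pvar[vid]
    }
```

Calling allele_mismatches(open(acount_path), open(pvar_path)) on the two files returns a dict that is empty when every shared ID has the same allele codes in both.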

Chris Chang

Jun 13, 2025, 12:20:26 PM
to Yilei, plink2-users
Ok, thanks for reporting back.  Yes, when the variant is missing from the --read-freq file, plink2 imputes from the current dataset; this is why I asked you to add --extract to the command line, to try to exclude such missing variants.
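
To illustrate why the fallback matters for a single sample, here is a minimal Python sketch. It is not plink2's actual code: it assumes the usual diploid standardization (dosage - 2p) / sqrt(2p(1-p)), assumes there are no missing dosages, and the function name standardize is invented for this example. When no reference frequency is supplied, p is estimated from the dosages themselves, so a lone homozygous sample gives p = 1 and a zero denominator.

```python
import math


def standardize(dosages, ref_freq=None):
    """Variance-standardize one variant's diploid dosages (illustrative only).

    dosages: non-empty list of allele dosages in [0, 2], no missing values.
    ref_freq: allele frequency from a reference panel; if None, fall back to
              estimating it from the dosages themselves.
    """
    if ref_freq is None:
        # Fallback: frequency estimated from the current samples.
        p = sum(dosages) / (2 * len(dosages))
    else:
        p = ref_freq
    sd = math.sqrt(2 * p * (1 - p))
    if sd == 0:
        raise ValueError("allele frequency is 0 or 1; cannot variance-standardize")
    return [(d - 2 * p) / sd for d in dosages]
```

With a reference frequency of 0.5, standardize([2.0], ref_freq=0.5) is well defined, whereas standardize([2.0]) raises, because the frequency estimated from that one sample is 1.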
