If clusters are defined (via --within/--family), you can base the principal components off a subset of samples and then project everyone else onto those PCs with --pca-cluster-names and/or --pca-clusters.
Hi! I've been trying to use the steps described here to do a PCA projection analysis, but for some reason the process gets stuck when reading the .afreq file. This is what appears on the screen:
Options in effect:
--bfile datosbhr10
--extract bhr10armenians
--out proy10arm-sobre13
--read-freq b13pca.afreq
--score b13pca.eigenvec.var 2 3 header-read no-mean-imputation variance-normalize
--score-col-nums 5-14
Start time: Mon Aug 26 20:02:11 2019
Note: --score's 'variance-normalize' modifier has been renamed to the more
precise 'variance-standardize'.
7841 MiB RAM detected; reserving 3920 MiB for main workspace.
Using up to 4 compute threads.
466 samples (85 females, 378 males, 3 ambiguous; 466 founders) loaded from
datosbhr10.fam.
206495 variants loaded from datosbhr10.bim.
Note: No phenotype data present.
--extract: 0 variants remaining.
--read-freq: PLINK 2 --freq file detected.
It stays like that for hours until I stop it, and it never prints an error message.
Any idea why or how to solve it?
Thanks
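
One thing that stands out in the log above is the line `--extract: 0 variants remaining.`: none of the IDs in `bhr10armenians` matched a variant ID in `datosbhr10.bim`, so there is nothing left to score. Whether that is also what makes the run stall is a guess, but it is worth checking first. Below is a minimal Python sketch for comparing the two ID sets. The function names are made up for illustration, and it assumes the `--extract` list is plain text where every whitespace-separated token is a variant ID.

```python
from typing import Dict, Iterable, Set


def bim_variant_ids(lines: Iterable[str]) -> Set[str]:
    """Collect variant IDs (second column) from the lines of a .bim file."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            ids.add(fields[1])
    return ids


def extract_list_ids(lines: Iterable[str]) -> Set[str]:
    """Collect IDs from an --extract list.

    Assumption: every whitespace-separated token is a variant ID.
    """
    ids = set()
    for line in lines:
        ids.update(line.split())
    return ids


def overlap_report(bim_lines: Iterable[str],
                   extract_lines: Iterable[str]) -> Dict[str, object]:
    """Summarise how many requested IDs are actually present in the .bim."""
    bim_ids = bim_variant_ids(bim_lines)
    wanted = extract_list_ids(extract_lines)
    return {
        "bim": len(bim_ids),
        "extract": len(wanted),
        "shared": len(bim_ids & wanted),
        # A few requested IDs that are absent from the .bim, to eyeball
        # naming differences (e.g. rsIDs vs. chr:pos:ref:alt style IDs).
        "missing_examples": sorted(wanted - bim_ids)[:5],
    }
```

With the file names from the log, it could be called as `overlap_report(open("datosbhr10.bim"), open("bhr10armenians"))`. If `shared` comes back as 0, the two files probably use different variant naming schemes, and the IDs need to be harmonised before `--extract` and `--score` can match anything.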
Error: --score variance-standardize failure for variant '37:1:2106896:T:C':
estimated allele frequency is zero or NaN, but not all dosages are zero. (This
is possible when e.g. allele frequencies are estimated from founders, but the
allele is only observed in nonfounders.)
Any tips would be appreciated as to why I am getting this error. Thank you!
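
The message says the frequency used for standardization is zero or NaN for that variant while the target samples still carry the allele. Assuming the frequencies come from a PLINK 2 `--freq` output (an `.afreq` file with `ID` and `ALT_FREQS` columns), a quick scan like the Python sketch below can list the variants that are likely to trigger it. The function names are hypothetical. The check is deliberately broad: it flags ALT frequencies of 0, 1, or NaN, since which allele is scored depends on the score file.

```python
import math
from typing import Iterable, List, Tuple


def _to_float(text: str) -> float:
    """Parse a frequency field, treating unparseable values as NaN."""
    try:
        return float(text)
    except ValueError:
        return math.nan


def flag_degenerate_freqs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (variant ID, raw ALT_FREQS field) for suspicious variants.

    A variant is flagged if any ALT frequency is 0, 1, or NaN.
    Columns are located by header name, so extra columns are tolerated.
    """
    it = iter(lines)
    header = next(it).split()
    id_col = header.index("ID")
    freq_col = header.index("ALT_FREQS")

    flagged = []
    for line in it:
        fields = line.split()
        if len(fields) <= max(id_col, freq_col):
            continue  # skip blank or truncated lines
        raw = fields[freq_col]
        # Multiallelic variants list one frequency per ALT allele,
        # separated by commas.
        freqs = [_to_float(x) for x in raw.split(",")]
        if any(math.isnan(f) or f == 0.0 or f == 1.0 for f in freqs):
            flagged.append((fields[id_col], raw))
    return flagged
```

If the variant from the error message shows up in the returned list, the reference frequencies simply carry no information about that allele. Dropping the flagged variants from the score file, or from the target data, before running `--score` is one possible way around it.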