This is great, thank you! Will this information be included in the PLINK2
documentation?
The log from our successful run is included below. Its "Projecting random vectors" line reports 21 steps, whereas we requested 20 principal components. I assume this is part of how the algorithm works, but just to make sure: is NPCs equal to 20 or to 21 in this case?

Also, during the same run I monitored CPU usage after giving the VM 2 CPUs and PLINK2 2 threads, and this is the CPU usage summary I got:

Most of the time PLINK2 uses 50% of the available CPU capacity, that is, 1 CPU in this example. So for the most part PLINK2 does not seem to take advantage of the second thread. Is this a limitation of the algorithm? Would it make sense to provide even more threads for this type of computation?
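
To spell out the arithmetic behind the "50% means 1 CPU" reading, here is a minimal sketch. It assumes the monitoring tool reports utilization as a percentage of the whole VM (all CPUs combined); the function name `busy_cpus` is only illustrative and is not part of PLINK2 or of any monitoring tool.

```python
def busy_cpus(total_percent: float, n_cpus: int) -> float:
    """Convert whole-machine CPU utilization (0-100 %) into the
    equivalent number of fully busy CPUs.

    Assumption: total_percent is normalized over all CPUs, so
    100 % means every CPU is saturated.
    """
    if not 0.0 <= total_percent <= 100.0:
        raise ValueError("total_percent must be between 0 and 100")
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return total_percent / 100.0 * n_cpus


# Values from the run described above: 2 CPUs, about 50 % utilization.
print(busy_cpus(50.0, 2))  # 1.0
```

If the tool instead reports per-process usage on a scale where 100% equals one core, the same observation would show up as roughly 100% rather than 50%.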
PLINK v2.00a3.5 AVX2 (9 Aug 2022) www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to FinnGenR11Affymetrix.log.
Options in effect:
  --exclude range exclusion_regions.txt
  --memory 9946
  --out FinnGenR11Affymetrix
  --pca 20 approx
  --pgen FinnGenR11Affymetrix.prune.bed
  --psam FinnGenR11Affymetrix.prune.fam
  --pvar FinnGenR11Affymetrix.prune.bim
  --threads 2
Start time: Wed Feb 15 19:18:04 2023
11218 MiB RAM detected; reserving 9946 MiB for main workspace.
Using up to 2 compute threads.
345458 samples (196050 females, 149408 males; 345458 founders) loaded from
FinnGenR11Affymetrix.prune.fam.
287959 variants loaded from FinnGenR11Affymetrix.prune.bim.
Note: No phenotype data present.
--exclude bed1: 3016 variants excluded.
Calculating allele frequencies... done.
284943 variants remaining after main filters.
Excluding 11246 variants on non-autosomes from PCA approximation.
Projecting random vectors (1 compute thread)... 21/21.
Computing SVD of Krylov matrix... done.
Recovering top PCs from range approximation... done.
--pca approx: Eigenvectors written to FinnGenR11Affymetrix.eigenvec , and
eigenvalues written to FinnGenR11Affymetrix.eigenval .