Hello,
I am attempting to characterize population substructure in a large dataset (152k individuals) with PLINK 2.
I ran a PCA with 120 GB of memory allocated and got the following error in the log:
PLINK v2.00aLM 64-bit Intel (5 Jun 2017)
www.cog-genomics.org/plink/2.0/
(C) 2005-2017 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to pca_results.log.
Options in effect:
--bfile genome
--maf 0.10
--memory 120000
--out pca_results
--pca 5
Start time: Mon Jul 10 16:33:25 2017
128828 MB RAM detected; reserving 120000 MB for main workspace.
Using up to 16 threads (change this with --threads).
152727 samples (80986 females, 71741 males; 152727 founders) loaded from
/myrandomdirectory/plink/genome.fam.
847131 variants loaded from
/myrandomdirectory/plink/genome.bim.
Note: No phenotype data present.
152727 samples (80986 females, 71741 males; 152727 founders) remaining after
main filters.
Calculating allele frequencies... done.
518660 variants removed due to minor allele threshold(s)
(--maf/--max-maf/--mac/--max-mac).
328471 variants remaining after main filters.
Error: Out of memory. The --memory flag may be helpful.
How much RAM should I expect to need to run PCA on a dataset of this size? If it is not feasible with PLINK 2.0, what tricks could I use to obtain the population substructure with PLINK? I have already restricted to MAF >= 0.10 in hopes of shrinking the dataset. Is there anything else I should do?
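For reference, my rough back-of-envelope estimate (assuming, and I'm not sure this is right, that the exact PCA materializes the full N x N sample-relationship matrix in double precision) already exceeds my 120 GB workspace:

```python
# Rough memory estimate for exact PCA, ASSUMING the full dense
# N x N relationship matrix is held in memory as 8-byte doubles.
n_samples = 152_727          # from the log: "152727 samples ... loaded"
bytes_per_double = 8

grm_bytes = n_samples ** 2 * bytes_per_double
grm_gib = grm_bytes / 1024 ** 3

print(f"Full GRM: {grm_gib:.0f} GiB")  # ~174 GiB, more than the 120 GB I reserved
```

If that assumption is even approximately correct, the matrix alone would not fit in my workspace, which would explain the out-of-memory error.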
Thank you for your help, much appreciated!
-Anna-