PLINK2.0 using 1000 genomes phase 3 data

Elayna Kirsch

Aug 28, 2018, 4:59:21 PM
to plink2-users
I have downloaded the merged dataset uploaded under the phase 3 PLINK 2.0 resources (the pgen.zst and pvar.zst files) and decompressed the pgen file using PLINK 2.0. I now want to do further quality control by applying flags such as --geno, --mind, and --maf. When I try to run, for example, ./plink2 --pfile all_phase_3 --geno 0.2 --make-pfile --out all_phase_3_2 the terminal prints:


PLINK v2.00a2 AVX2 (26 Aug 2018)               www.cog-genomics.org/plink/2.0/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to all_phase_3_2.log.
Options in effect:
  --geno 0.2
  --make-pgen
  --out all_phase_3_2
  --pfile all_phase_3

Start time: Tue Aug 28 16:55:56 2018
8192 MiB RAM detected; reserving 4096 MiB for main workspace.
Using up to 4 compute threads.
Error: Failed to open all_phase_3.pgen.
End time: Tue Aug 28 16:55:56 2018

Laynies-MBP:phase3 layniekirsch$ 


I am not sure why it can't read the all_phase3.pgen file (which my computer lists as type "document"). Also, which .psam file is the correct one to download from the resources website? Right now I have downloaded the common sample information file (pedigree-corrected), which downloads as a .txt file. I appreciate the help!


Thanks,

Elayna

Christopher Chang

Aug 28, 2018, 6:19:41 PM
to plink2-users
1. Do you have an extra underscore in your filenames?  (all_phase_3 vs. all_phase3)
2. How large is the uncompressed .pgen file?  (should be ~6.7 GB)
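
Both checks can be done from the shell before rerunning plink2. The following is a minimal sketch, not part of plink2 itself: `check_fileset` is a hypothetical helper name, and it only assumes that a fileset consists of PREFIX.pgen, PREFIX.pvar (or PREFIX.pvar.zst), and PREFIX.psam in the same directory.

```shell
# check_fileset PREFIX: report which members of a PLINK 2 fileset are present.
# (Hypothetical helper; PREFIX.pvar.zst is accepted in place of PREFIX.pvar.)
check_fileset() {
    prefix=$1
    rc=0
    for ext in pgen pvar psam; do
        if [ -f "$prefix.$ext" ]; then
            echo "found:   $prefix.$ext"
        elif [ "$ext" = pvar ] && [ -f "$prefix.pvar.zst" ]; then
            echo "found:   $prefix.pvar.zst (compressed)"
        else
            echo "missing: $prefix.$ext"
            rc=1
        fi
    done
    return $rc
}
```

Running `check_fileset all_phase3` and then `check_fileset all_phase_3` in the data directory should show which spelling matches the files on disk, and `ls -lh all_phase3.pgen` shows the size to compare against the expected ~6.7 GB.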

Elayna Kirsch

Aug 29, 2018, 9:18:58 AM
to plink2-users
I did have an extra underscore, thank you for pointing that out. The decompressed .pgen file is 6.7 GB on my computer, so that appears to have worked fine. Now when I run the command, however, PLINK cannot open the all_phase3.psam file (the common sample information file with pedigree-corrected data).

Christopher Chang

Aug 29, 2018, 11:17:59 AM
to plink2-users
1. You need to rename the .psam file you downloaded to "all_phase3.psam".
2. If you are leaving the .pvar file in compressed form, you'll also need to use "--pfile all_phase3 vzs" rather than just "--pfile all_phase3" to tell plink2 to look for a zst-compressed .pvar file instead of a plain-text one.
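
The choice between plain `--pfile all_phase3` and `--pfile all_phase3 vzs` can be made by looking at which .pvar file is on disk. A minimal sketch, assuming the files sit in the current directory; `pfile_args` is a hypothetical helper name, not a plink2 feature:

```shell
# pfile_args PREFIX: print the arguments to place after --pfile.
# Adds the "vzs" modifier only when the .pvar exists solely in
# zst-compressed form. (Hypothetical helper for illustration.)
pfile_args() {
    prefix=$1
    if [ -f "$prefix.pvar" ]; then
        echo "$prefix"
    elif [ -f "$prefix.pvar.zst" ]; then
        echo "$prefix vzs"
    else
        echo "no .pvar or .pvar.zst found for $prefix" >&2
        return 1
    fi
}
```

It could then be used as `./plink2 --pfile $(pfile_args all_phase3) --maf 0.01 --make-pgen --out all_phase3_2`. The substitution is left unquoted on purpose so that the prefix and `vzs` arrive as two separate arguments, which means this sketch does not handle prefixes containing spaces.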

Elayna Kirsch

Aug 29, 2018, 11:26:01 AM
to plink2-users
I changed the name to be all_phase3.psam and it is still unable to open the file. This is my output:


Laynies-MBP:phase3 layniekirsch$ ./plink2 --pfile vzs all_phase3 --maf 0.01 --make-pgen --out all_phase3_2

PLINK v2.00a2 AVX2 (26 Aug 2018)               www.cog-genomics.org/plink/2.0/
(C) 2005-2018 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to all_phase3_2.log.
Options in effect:
  --maf 0.01
  --make-pgen
  --out all_phase3_2
  --pfile vzs all_phase3

Start time: Wed Aug 29 11:23:24 2018
8192 MiB RAM detected; reserving 4096 MiB for main workspace.
Using up to 4 compute threads.

Error: Failed to open all_phase3.psam.
End time: Wed Aug 29 11:23:24 2018

Laynies-MBP:phase3 layniekirsch$ 


I am not sure what is wrong with the .psam file. I downloaded it from the website and just changed the name. 
Thanks,
Elayna 

Christopher Chang

Aug 29, 2018, 11:29:07 AM
to plink2-users
Is it in the correct directory, and is every letter in the name correct?  "Failed to open" implies the file isn't there at all.

Elayna Kirsch

Aug 29, 2018, 12:19:29 PM
to plink2-users
The files were in the correct folder, but when I ran ls in the terminal I noticed that the computer had still added .txt to the end of the .psam file name, even though I had renamed the file. After removing it again, it seems to be working now! However, the run now stops with "error: out of memory" and suggests using the --memory flag. Is this because of limited space on my computer? Is there a recommended main workspace size to pass to --memory?
[Attachment: Screen Shot 2018-08-29 at 12.10.28 PM.png]
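
A hidden .txt suffix like this, which can happen when a file manager hides extensions, is easy to spot and fix from the terminal. A minimal sketch, assuming a POSIX sh or bash session; `strip_txt_suffix` is a hypothetical helper name:

```shell
# strip_txt_suffix [DIR]: rename DIR/*.psam.txt to DIR/*.psam.
# Existing target files are never overwritten.
# (Hypothetical helper; DIR defaults to the current directory.)
strip_txt_suffix() {
    dir=${1:-.}
    for f in "$dir"/*.psam.txt; do
        [ -e "$f" ] || continue      # the glob matched nothing
        target=${f%.txt}             # drop the trailing ".txt"
        if [ -e "$target" ]; then
            echo "skip: $target already exists" >&2
        else
            mv "$f" "$target" && echo "renamed: $f -> $target"
        fi
    done
}
```

Running `ls -l` afterward confirms the real on-disk name, which is what plink2 sees regardless of what the file browser displays.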

Christopher Chang

Aug 29, 2018, 12:50:06 PM
to plink2-users
plink2 needs about 8 GiB (--memory 8192) of workspace for this operation.  So if you don't have access to a machine with at least 12 GiB of RAM, you are probably better off working with the split-by-chromosome filesets for this step.

However, since --maf gets rid of almost 5/6 of the variants, plink2 requires much less memory to work with the post-MAF-filtering dataset.  So you only need a higher-memory machine for this one step.

Christopher Chang

Aug 29, 2018, 1:21:33 PM
to plink2-users
Meanwhile, I will go ahead and post a "no singletons" version of the merged dataset on the Resources page later today.  This is essentially equivalent to a --maf 0.0003 filter, so the only absent variants will be ones you were going to immediately throw out.  It will be small enough for plink2 to work with on a machine with only 8 GiB RAM; you'll still need to add something like "--memory 6400" when running your --maf 0.01 filter, but not afterward.
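
The memory guidance in this thread can be summarized as a small decision helper. This is only a sketch: the 12 GiB and 8 GiB cut-offs and the --memory values are the ones quoted above, the function names are hypothetical, and the RAM figure is read from the "MiB RAM detected" line that plink2 prints at startup (assumed here to also appear in the .log file).

```shell
# detected_ram_mib LOG: extract the number from plink2's
# "<N> MiB RAM detected" line. (Hypothetical helper.)
detected_ram_mib() {
    sed -n 's/^\([0-9][0-9]*\) MiB RAM detected.*/\1/p' "$1" | head -n 1
}

# suggest_plan RAM_MIB: map total RAM to the options discussed in the thread.
suggest_plan() {
    ram=$1
    if [ "$ram" -ge 12288 ]; then
        echo "merged dataset, --memory 8192"
    elif [ "$ram" -ge 8192 ]; then
        echo "no-singletons dataset, --memory 6400"
    else
        echo "split-by-chromosome filesets"
    fi
}
```

For example, `suggest_plan "$(detected_ram_mib all_phase3_2.log)"` on the 8192 MiB machine from this thread would point to the no-singletons dataset. As noted above, the extra --memory setting should only be needed for the initial --maf step, not for work on the filtered output.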

Elayna Kirsch

Aug 29, 2018, 1:22:58 PM
to plink2-users
Great, that is super helpful. Thank you so much!

Christopher Chang

Aug 29, 2018, 3:52:06 PM
to plink2-users
No-singletons dataset is now posted.