Error: Out of memory

Nooshin Abbasi

Jul 3, 2019, 2:18:07 PM
to plink2-users
Hello,

I am trying to merge the data for all chromosomes with the following command:

plink --merge-list all_files.txt --make-bed --out merged

When the "out of memory" error appeared, I added the --memory flag to assign more memory to the workspace; however, I still receive the same error.


PLINK v1.90b4.6 64-bit (15 Aug 2017)
Options in effect:
  --make-bed
  --memory 200000
  --merge-list all_files.txt
  --out merged

Hostname: xxxx
Working directory: xxxx
Start time: Wed Jul  3 11:05:47 2019

Random number seed: 1562177147
257842 MB RAM detected; reserving 200000 MB for main workspace.
Allocated 6334 MB successfully, after larger attempt(s) failed.

Error: Out of memory.  The --memory flag may be helpful.
Failed allocation size: 18276860288

End time: Wed Jul  3 11:07:37 2019


Could you please let me know why I receive this error and how I can solve it?

Thank you.
Nooshin
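
A quick check on the numbers in this log, offered as a reading of the output rather than a diagnosis: although 200000 MB was requested, PLINK reports that it only obtained a 6334 MB workspace ("after larger attempt(s) failed"), and the failed request is far larger than that. The log does not say why the larger attempts failed; treating PLINK's "MB" as 2^20 bytes is an assumption made here for the comparison.

```shell
# Values copied from the log above.
failed_bytes=18276860288   # "Failed allocation size: 18276860288"
allocated_mib=6334         # "Allocated 6334 MB successfully, after larger attempt(s) failed."
requested_mib=200000       # "--memory 200000"

# Convert the failed request to MiB (integer division, 1 MiB = 1048576 bytes).
failed_mib=$((failed_bytes / 1048576))

echo "requested workspace: $requested_mib MB"
echo "obtained workspace:  $allocated_mib MB"
echo "failed allocation:   ~$failed_mib MiB"

if [ "$failed_mib" -gt "$allocated_mib" ]; then
    echo "the failed request is larger than the workspace PLINK actually obtained"
fi
```

So the --memory value alone does not guarantee a workspace of that size; the "Allocated ... successfully" line shows what was really available to the run.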

Nooshin Abbasi

Jul 3, 2019, 3:57:00 PM
to plink2-users
Alternatively, can I merge the VCF files first and then convert the result to binary files with the following command lines? (Although I guess this approach may require as much memory as merging the binary files does.)

grep '^#' chr1.filtered.vcf > merge.vcf
grep -hv '^#' chr1.filtered.vcf chr2.filtered.vcf ... chr22.filtered.vcf >> merge.vcf
plink --vcf merge.vcf --make-bed --allow-extra-chr --out merge

Or could I merge the binary files pairwise and iteratively until all 22 chromosomes are combined (for example, 1 and 2 -> 12, 3 and 4 -> 34, then 12 and 34 -> 1234)?
Would either of these approaches help with the memory issue?

Looking forward to hearing from you.
Thank you
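
One detail worth checking in a grep-based concatenation like the one above: when grep is given several input files, it prefixes every output line with the file name unless -h is passed, which would corrupt the first column of the combined VCF. The sketch below shows the difference on two tiny stand-in files; their names and contents are made up for illustration and they are not complete VCFs.

```shell
# Two tiny stand-in files (hypothetical contents, for illustration only).
tmpdir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\n1\t100\n' > "$tmpdir/a.vcf"
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\n2\t200\n' > "$tmpdir/b.vcf"

# Without -h: each data line is prefixed with "<file name>:".
with_prefix=$(cd "$tmpdir" && grep -v '^#' a.vcf b.vcf)

# With -h: only the data lines themselves are printed.
no_prefix=$(cd "$tmpdir" && grep -hv '^#' a.vcf b.vcf)

printf 'without -h:\n%s\n\nwith -h:\n%s\n' "$with_prefix" "$no_prefix"

rm -rf "$tmpdir"
```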

Christopher Chang

Jul 3, 2019, 7:21:16 PM
to plink2-users
The most likely reason for this error is the presence of a long variant ID. In particular, if your variant IDs look something like <chr>:<pos>_<ref>_<alt>, PLINK 1.9 will probably choke if you have any very long indels. This isn’t restricted to merging; you basically have to use PLINK 2.0 if you have any super-long variant IDs.

While your first VCF-concatenation command lines should work, you probably want to use “bcftools concat” for the merge instead.
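
A minimal sketch of what the bcftools route could look like, assuming bcftools is installed, the per-chromosome files follow the chrN.filtered.vcf naming from the earlier post, and all files contain the same samples in the same order. The output name merged.vcf.gz is a hypothetical choice. The script only prints the command unless RUN=1 is set.

```shell
# Build the input list in chromosome order (chr1 .. chr22).
inputs=""
chr=1
while [ "$chr" -le 22 ]; do
    inputs="$inputs chr${chr}.filtered.vcf"
    chr=$((chr + 1))
done

# -Oz writes bgzip-compressed VCF; -o names the output file (hypothetical name).
cmd="bcftools concat -Oz -o merged.vcf.gz$inputs"

# Dry run by default: print the command. Set RUN=1 to execute it.
if [ "${RUN:-0}" = "1" ]; then
    $cmd
else
    echo "$cmd"
fi
```

Unlike the grep approach, this keeps header handling inside bcftools instead of copying the header of the first file only.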

Nooshin Abbasi

Jul 4, 2019, 1:19:23 PM
to plink2-users
Thanks! I've upgraded; however, plink2 apparently does not support the --merge-list option. Could you please let me know what other options I should use?
I've seen an older post here in which you suggested using https://bitbucket.org/gavinband/bgen/wiki/cat-bgen; however, I doubt it would work with my data, since they are in .bed/.bim/.fam format rather than .bgen, right?

Christopher Chang

Jul 4, 2019, 5:36:16 PM
to plink2-users
For the merge, export to VCF and use the bcftools command I suggested.
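
For filesets that are already in .bed/.bim/.fam form, the export step could be sketched as follows. This assumes per-chromosome prefixes named chr1 through chr22, which are hypothetical, and uses plink2's --export vcf with the bgz modifier. The script only prints the commands so they can be reviewed before running.

```shell
# Generate one plink2 export command per chromosome.
# Prefixes chr1 .. chr22 are hypothetical stand-ins for the real fileset names.
export_cmds=""
chr=1
while [ "$chr" -le 22 ]; do
    # Read chrN.bed/.bim/.fam and write a bgzip-compressed chrN.vcf.gz.
    line="plink2 --bfile chr${chr} --export vcf bgz --out chr${chr}"
    export_cmds="${export_cmds}${line}
"
    chr=$((chr + 1))
done

# Print the commands for review; pipe the output to sh to run them.
printf '%s' "$export_cmds"
```

The resulting chrN.vcf.gz files can then be concatenated with bcftools concat, and the combined VCF imported again with plink2 --vcf and --make-bed.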

DJon

Feb 24, 2020, 12:19:17 PM
to plink2-users

Hi Christopher,

I have a similar memory issue when running PCA. I tried shortening the variant IDs, but I still get the same error:

PLINK v1.90b6.15 64-bit (21 Jan 2020)          www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /data/chromosomes/all/PCA_train.log.
Options in effect:
  --bfile /data/chromosomes/all/everything_chr_edt
  --extract /data/chromosomes/all/for_pca.prune.in
  --keep /home/p_10_4/females_list
  --out /data/chromosomes/all/PCA_train
  --pca 4

773975 MB RAM detected; reserving 386987 MB for main workspace.
805426 variants loaded from .bim file.
488377 people (223477 males, 264811 females, 89 ambiguous) loaded from .fam.
Ambiguous sex IDs written to /data/chromosomes/all/PCA_train.nosex .
--extract: 421691 variants remaining.
--keep: 209940 people remaining.
Using up to 31 threads (change this with --threads).
Before main variant filters, 209940 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate in remaining samples is 0.965961.
421691 variants and 209940 people pass filters and QC.
Note: No phenotypes present.
Excluding 9911 variants on non-autosomes from relationship matrix calc.
Relationship matrix calculation complete.

Error: Out of memory.  The --memory flag may be helpful.
Failed allocation size: 352598428800


When I check the job's memory usage on the HPC cluster:


State: FAILED (exit code 1)
Nodes: 1
Cores per node: 32
CPU Utilized: 35-16:12:19
CPU Efficiency: 80.14% of 44-12:24:32 core-walltime
Job Wall-clock time: 1-09:23:16
Memory Utilized: 246.33 GB
Memory Efficiency: 33.68% of 731.45 GB


So there appears to be enough memory. What could be the reason for this error? Is there another way to compute PCA for all of these subjects and variants?

When I ran it with fewer subjects (10%), it worked without any issue.


Thanks.

Djon
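
A quick arithmetic check on this log, offered as a reading of the numbers rather than a statement about PLINK's internals: the failed allocation size is exactly 209940 x 209940 x 8 bytes, which is what a dense sample-by-sample matrix of 8-byte values would occupy. Treating PLINK's "MB" as 2^20 bytes is an assumption made for the comparison.

```shell
# Values copied from the log above.
samples=209940              # "--keep: 209940 people remaining."
failed_bytes=352598428800   # "Failed allocation size: 352598428800"
workspace_mib=386987        # "reserving 386987 MB for main workspace"

# Size of a dense samples x samples matrix of 8-byte values.
matrix_bytes=$((samples * samples * 8))

# The failed request expressed in MiB (integer division).
failed_mib=$((failed_bytes / 1048576))

echo "dense matrix: $matrix_bytes bytes"
echo "failed alloc: $failed_bytes bytes (~$failed_mib MiB)"
echo "workspace:    $workspace_mib MB"
```

A single request of that size takes up most of the reserved workspace, which also has to hold everything allocated before it. Because the size grows with the square of the sample count, a 10% subset needs only about 1% as much, which is consistent with the smaller run succeeding.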

 

Christopher Chang

Feb 24, 2020, 12:22:00 PM
to plink2-users
This is expected with hundreds of thousands of samples.  Use plink 2.0's "--pca approx" instead in this setting.
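
A sketch of the corresponding PLINK 2.0 call, reusing the input paths from the log above. The output prefix PCA_train_approx is a hypothetical name chosen so the earlier results are not overwritten, and the sketch assumes a plink2 binary is on the PATH. It only prints the command unless RUN=1 is set.

```shell
# Input paths taken from the PLINK 1.9 log in the question.
bfile=/data/chromosomes/all/everything_chr_edt
extract=/data/chromosomes/all/for_pca.prune.in
keep=/home/p_10_4/females_list
out=/data/chromosomes/all/PCA_train_approx   # hypothetical output prefix

# Same filters as before, with the approximate PCA mode requested for 4 components.
cmd="plink2 --bfile $bfile --extract $extract --keep $keep --pca 4 approx --out $out"

# Dry run by default: print the command. Set RUN=1 to execute it.
if [ "${RUN:-0}" = "1" ]; then
    $cmd
else
    echo "$cmd"
fi
```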