hard to understand why plink runs out of memory

freeseek

Jul 21, 2016, 10:21:31 AM
to plink2-users
I have been trying to convert a very large BCF file (2,277 WGS samples) to PLINK format. The program ran out of memory. Here is the output:

PLINK v1.90p 64-bit (5 Jul 2016)           https://www.cog-genomics.org/plink2
(C) 2005-2016 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --allow-extra-chr 0
  --bcf /dev/stdin
  --const-fid
  --keep-allele-order
  --make-bed
  --memory 60416
  --out plink
  --split-x b37 no-fail
  --vcf-idspace-to _


128993 MB RAM detected; reserving 60416 MB for main workspace.
--bcf: 111053k variants complete.
Lines   total/split/realigned/skipped: 98415468/8087407/0/0
Lines   total/split/realigned/skipped: 111053313/0/5905079/0
--bcf: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam written.


Error: Out of memory.  The --memory flag may be helpful.

PLINK never used more than 2 GB of memory while converting the BCF file to a temporary PLINK fileset. My understanding is that, after writing the temporary fileset, it tried to allocate more memory than was available under the --memory limit. I am not sure whether this was to load the temporary .bim file (3.7 GB) or the temporary .bed file (60 GB). Either way, it would have been nice to get an explanation of why it reported being out of memory. How much memory did PLINK try to request? Could PLINK report the size of the allocation that failed?

Christopher Chang

Jul 21, 2016, 10:52:32 AM
to plink2-users
Hmm, yes, I could improve error reporting here by having the main memory allocation function set a global variable when it fails; then the function that actually prints the "Out of memory" error could check if that variable is nonzero and print it if so; this should work at least ~70% of the time.  I'll do this within the next few days.

Meanwhile, my first guess as to the cause is a very long variant ID.  PLINK 1.9 uses a rectangular [variant count] * [maximum variant ID string length + 1] array to store the IDs; this breaks down when very long indels are present and used in the variant IDs.  I've run into this problem, anyway, and have changed the 2.0 variant ID data structure accordingly.
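
One rough way to check whether a long variant ID is the culprit is to scan the temporary .bim file that was already written. The sketch below is not part of PLINK; the helper name scan_bim_ids is made up for illustration, and it only assumes the standard .bim layout, in which the variant ID is the second whitespace-separated column.

```python
def scan_bim_ids(path):
    """Return (variant_count, max_id_length) for a PLINK .bim file.

    Hypothetical helper for illustration; it only reads the text file.
    """
    count = 0
    max_len = 0
    with open(path) as bim:
        for line in bim:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            count += 1
            # Column 2 of a .bim file holds the variant ID.
            max_len = max(max_len, len(fields[1]))
    return count, max_len
```

For the run above this would be called as scan_bim_ids("plink-temporary.bim"); a maximum length in the hundreds of characters would point to long indel alleles embedded in the IDs.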

freeseek

Jul 21, 2016, 11:04:47 AM
to plink2-users
The longest variant ID in my case was 973 characters. Does this mean that plink tried to allocate [111,053,313] * [973 + 1] = 108,165,926,862 bytes of memory just to load the bim file in memory?

Christopher Chang

Jul 21, 2016, 11:36:54 AM
to plink2-users
That is correct.  Also, many output files would be similarly large, because plink 1.x tends to pad the variant ID column with spaces to make it constant-width.
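
To make the arithmetic concrete, here is a small sketch that compares the rectangular ID array against the workspace requested with --memory 60416. The function name is made up for illustration, and treating one MB as 2^20 bytes is an assumption about how the workspace size is counted.

```python
MIB = 1024 * 1024  # assumption: the --memory value is counted in units of 2**20 bytes


def id_table_bytes(variant_count, max_id_len):
    """Size of a rectangular [variant count] x [max ID length + 1] byte array."""
    return variant_count * (max_id_len + 1)


needed = id_table_bytes(111_053_313, 973)  # figures quoted in this thread
workspace = 60416 * MIB                    # value passed to --memory

print(needed)              # 108165926862
print(needed > workspace)  # True
```

So the ID array alone needs roughly 108 GB, well beyond the 60416 MB workspace, even though the conversion step itself stayed under 2 GB.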

If it's impractical to use shorter IDs, you may want to split by chromosome.  Most filtering flags do not affect --bcf, but --chr does.  (You'll probably need to save a temporary .bcf file rather than just stream it in, though.)
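
A minimal sketch of the per-chromosome approach, assuming the stream has first been saved to a temporary file: the helper below only builds the command lines and does not run them. The file name tmp.bcf, the output prefix, and the function name are placeholders; the flags are the ones already mentioned in this thread.

```python
def per_chromosome_commands(bcf_path, chromosomes, out_prefix="plink", extra_flags=()):
    """Build one PLINK command (as an argv list) per chromosome.

    Hypothetical helper: paths and the output naming scheme are placeholders.
    """
    commands = []
    for chrom in chromosomes:
        commands.append([
            "plink",
            "--bcf", bcf_path,
            "--chr", str(chrom),  # restrict the import to a single chromosome
            "--make-bed",
            "--out", f"{out_prefix}.chr{chrom}",
            *extra_flags,         # e.g. the other options from the original run
        ])
    return commands


for cmd in per_chromosome_commands("tmp.bcf", range(1, 23),
                                   extra_flags=["--const-fid", "--keep-allele-order"]):
    print(" ".join(cmd))
```

Each command could then be run through subprocess.run or pasted into a shell. The largest per-chromosome ID array is still [variants on that chromosome] * [longest ID + 1] bytes, so this helps only if that product fits in the workspace.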