plink2 requiring large amounts of memory (?) when writing to .bed but not .pgen


Freida Blostein

May 22, 2025, 7:14:19 PM
to plink2-users
I'm getting memory errors when subsetting samples with --keep and --make-bed from a source .pgen, using plink2 under SLURM on an HPC cluster. When reading the .pgen and writing the .bed, plink2 fails with "Error: .pgen file read failure: Cannot allocate memory." The same error does not occur with the exact same files, inputs, and resource allocation when --make-pgen is used instead of --make-bed.

Considering the standard advice that plink operations should be fine with 8 GB for up to 50 million variants, I'm confused. I said memory in the title, but I'm not certain this is a memory issue. I can get --make-bed to run on the login/gateway nodes of my HPC (which have ~128 GB), but in a SLURM job with ~15 GB and 1 CPU the command fails with "Error: .pgen file read failure: Cannot allocate memory", even when using the --memory flag to limit plink to <14 GB. I can inconsistently run to completion if I increase the requested SLURM CPUs/memory. If I run with 8 CPUs and the --memory flag limiting plink to <14 GB, the plink log reports limiting the workspace appropriately, but seff on the completed job reveals much higher memory usage. The source .pgen I'm testing is quite large (~240 GB, ~3.3 million variants), but the subsets I've been testing with are small (0.5-1.5 GB final .bed outputs; ~2.4 GB for the .pgen outputs).
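For context, the failing submission looks roughly like this (file names here are placeholders, not my actual paths):

```shell
#!/bin/bash
#SBATCH --mem=15G
#SBATCH --cpus-per-task=1

# Placeholder names; the real source .pgen is ~240 GB, ~3.3M variants
plink2 \
  --pfile source_data \
  --keep samples_to_keep.txt \
  --make-bed \
  --memory 14000 \
  --out subset_out
```

Swapping --make-bed for --make-pgen in this same script is the only change between the failing and succeeding runs.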

I'll note that I only encountered this behavior after the HPC cluster switched from CentOS to Rocky 9. I also get the same errors across a few plink2 versions I've tested: the most recent release (PLINK v2.0.0-a.6.13LM AVX2 Intel (15 May 2025)) and two earlier releases (PLINK v2.00a2LM AVX2 Intel (25 Oct 2019) and PLINK v2.00a6LM AVX2 Intel (24 Oct 2023)). I don't get the same errors when doing --make-bed from .bed files using plink 1.9.

I'm attaching logs from the runs described below; all runs have --memory 14000 set in plink2.

FAILS: plink2 --make-bed from pgen with 15 GB, 1 CPU
log file: TROUBLESHOOTPLINK_pgen_to_bed_most_recent_release

FAILS: plink2 --make-bed from bed with 15 GB, 1 CPU
log file: TROUBLESHOOTPLINK_bed_to_bed_most_recent_release_1

FAILS: plink2 --make-bed from pgen with 120 GB, 12 CPUs requested
log file: TROUBLESHOOTPLINK_pgen_to_bed_most_recent_release_8CPU_120GB
seff on this job reveals ~114 GB used, even though plink reports reserving only 14 GB for the workspace

SUCCEEDS: plink2 --make-pgen from pgen with 15 GB, 1 CPU
log file: TROUBLESHOOTPLINK_pgen_to_pgen_most_recent_release_1

SUCCEEDS: plink2 --make-bed from pgen run on a login node (128 GB available)
log file: TROUBLESHOOT_LOGINNODE.log


TROUBLESHOOTPLINK_bed_to_bed_most_recent_release_1.txt
TROUBLESHOOTPLINK_pgen_to_bed_most_recent_release_8CPU_120GB_1.txt
TROUBLESHOOTPLINK_pgen_to_bed_most_recent_release_1.txt
TROUBLESHOOT_LOGINNODE.log
TROUBLESHOOTPLINK_pgen_to_pgen_most_recent_release_1.txt

Chris Chang

May 23, 2025, 12:33:46 PM
to Freida Blostein, plink2-users
This is a very common operation, yet nobody else has ever reported the "Cannot allocate memory" error message, even though you note that it appears even with a build from 2019. So it looks like there is an unusual interaction between plink2 and that system.

I updated the main plink2 build machine last week, so future plink2 precompiled binaries will use a newer version of glibc than was used for the last eight years.  I have posted a plink2 binary compiled on the new machine to https://s3.amazonaws.com/plink2-assets/plink2_linux_avx2_20250523a.zip .  Check if the problem remains with that binary; if it does, I can then post a debug build that can print more diagnostic information.


Freida Blostein

May 23, 2025, 2:26:50 PM
to plink2-users

Yes, I agree that plink2 and the HPC's SLURM configuration are interacting strangely, because I only encountered the issue once the HPC switched from CentOS 7 to Rocky 9. At first I thought it was exclusively a SLURM configuration issue, but I don't get the same problem with plink 1.9, or even with plink2 when running --make-pgen, which is why I posted here. I am just a user, not an admin on the HPC (and not even an expert user at that).

I still get the same error with the new plink2 binary you posted; log file attached.

The only thing remotely similar to my problem that I could find was a discussion of optimizing plink memory usage for small-memory virtual machines. So in this run I printed cat /proc/meminfo within the job, and the total memory is listed as 131379944 kB (see top of log file). I'm wondering whether, given the way our HPC is set up, MemAvailable corresponds to the entire node rather than the requested resources, and plink2 is reading that for some reason? But I still don't understand why it would be a problem for --make-bed but not --make-pgen, or why it would persist when I explicitly pass the --memory flag to plink2.
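Concretely, what I checked inside the job was along these lines (the cgroup path is a guess on my part; it depends on whether the cluster uses cgroup v1 or v2):

```shell
# Node-wide view from the kernel; this is where I see MemTotal = 131379944 kB
grep -E 'MemTotal|MemAvailable' /proc/meminfo

# The per-job limit SLURM actually enforces usually lives in the cgroup;
# the exact path varies by cluster (cgroup v2 first, v1 as fallback)
cat /sys/fs/cgroup/memory.max 2>/dev/null \
  || cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null \
  || true
```

If the cgroup limit matches my 15 GB request while /proc/meminfo reports the whole node, that would fit my suspicion that plink2 (or the OS) is seeing node-wide numbers.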


1cpu_15G_newplink_1.txt

Chris Chang

May 23, 2025, 3:33:22 PM
to Freida Blostein, plink2-users
OK, I have posted a debug build to https://s3.amazonaws.com/plink2-assets/plink2_linux_avx2_20250523b.zip ; try running your command with that build, with the --debug flag added. This should print /proc/meminfo numbers from the middle of the failing .bed-write operation, along with information about the file-read operations being performed.

Freida Blostein

May 23, 2025, 4:16:52 PM
to plink2-users
I downloaded from the new link, but I'm not sure it's the debug version? My log is not printing any more information.
Screenshot 2025-05-23 at 4.16.29 PM.png
1cpu_15G_debugplink_1.txt

Chris Chang

May 23, 2025, 4:23:22 PM
to Freida Blostein, plink2-users
Did you add --debug to your command line?

Freida Blostein

May 23, 2025, 4:30:14 PM
to plink2-users
Oh my gosh, so sorry, I did not. My error; fixing right now.

Freida Blostein

May 23, 2025, 4:35:13 PM
to plink2-users
1cpu_15G_debugplink_1.txt

Chris Chang

May 23, 2025, 5:25:05 PM
to Freida Blostein, plink2-users
Thanks.  Looks like I garbled the variant_idx/mem_available_kib print statement, but this confirms the basic picture: big fread() calls are not being handled properly by this system.  (You didn't see the problem with plink 1.9 --make-bed and plink 2.0 --make-pgen because they only make small fread() calls.)  It seems plink2 itself is not using more than the ~14 GB of memory you specified, but the operating system is spending too much additional memory on disk cache when large fread() calls are made.

The most closely-related bug reports I've found so far are https://github.com/aws/aws-cli/issues/5876 and https://github.com/docker/for-linux/issues/651 .

In the meantime, you can check whether plink2 --make-bed works with e.g. "--memory 4000" (you have far fewer than 50 million variants, after all).
