I'm getting memory errors when subsetting samples from a source pgen with --keep and --make-bed, using plink2 under SLURM on an HPC cluster. When reading in the pgen and writing out a bed, plink2 fails with "Error: .pgen file read failure: Cannot allocate memory." These errors do not occur with exactly the same files, inputs and resource allocation when --make-pgen is used instead of --make-bed.
Considering the standard advice that plink operations should be fine with 8GB for up to 50 million variants, I'm confused. I said memory in the title, but I'm not certain this is a memory issue. I can get --make-bed to run on the login/gateway nodes of my HPC (which have ~128GB), but in a SLURM job with ~15GB and 1 CPU the command fails with "Error: .pgen file read failure: Cannot allocate memory", even when I use plink's --memory flag to limit plink to <14GB. I can inconsistently get runs to complete if I increase the requested SLURM CPUs/memory. If I run with 8 CPUs and the --memory flag limiting plink to <14GB, the plink log reports limiting the workspace appropriately, but seff on the completed job shows much higher memory usage. The source pgen I'm testing is quite large (~240GB, ~3.3 million variants), but the subsets I've been testing with are small (0.5-1.5GB for the final bed outputs, ~2.4GB for the pgen outputs).
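For reference, the failing job described above can be sketched roughly as follows. This is a minimal sketch, not the exact script: the file prefixes and the sample list name are placeholders, and the SBATCH values simply mirror the ~15GB / 1 CPU request and the <14GB --memory limit mentioned above.

```shell
#!/bin/bash
#SBATCH --job-name=plink2_subset
#SBATCH --cpus-per-task=1
#SBATCH --mem=15G

# Placeholder names (assumptions): replace with the real pgen prefix,
# sample list and output prefix.
SRC_PREFIX="source_data"        # expects source_data.pgen/.pvar/.psam
KEEP_FILE="keep_samples.txt"    # sample IDs to retain
OUT_PREFIX="subset_out"

# Swap to --make-pgen to reproduce the run that succeeds.
OUTPUT_FLAG="--make-bed"

# Build the argument list first so the exact command can be logged.
set -- plink2 --pfile "$SRC_PREFIX" --keep "$KEEP_FILE" \
    --memory 14000 "$OUTPUT_FLAG" --out "$OUT_PREFIX"
CMD_LINE="$*"
echo "Command: $CMD_LINE"

# Run only when plink2 and the input file are actually present.
if command -v plink2 >/dev/null 2>&1 && [ -f "${SRC_PREFIX}.pgen" ]; then
    "$@"
else
    echo "plink2 or ${SRC_PREFIX}.pgen not found; command printed only."
fi
```

Changing only OUTPUT_FLAG between --make-bed and --make-pgen keeps every other input identical, which is the comparison being made here.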
I'll note that I only encountered this behavior after the HPC cluster switched from CentOS to Rocky 9. I also get the same errors across the versions of plink2 that I've tested: the most recent release (PLINK v2.0.0-a.6.13LM AVX2 Intel (15 May 2025)) and two older releases (PLINK v2.00a2LM AVX2 Intel (25 Oct 2019) and PLINK v2.00a6LM AVX2 Intel (24 Oct 2023)). I don't get the same errors when doing --make-bed from bed files using plink 1.9.
I'm attaching logs from the runs described below. All runs set --memory 14000 in plink2.
FAILS: plink2 --make-bed from pgen with 15GB 1 CPU
log file: TROUBLESHOOTPLINK_pgen_to_bed_most_recent_release
FAILS: plink2 --make-bed from bed with 15GB 1 CPU
log file: TROUBLESHOOTPLINK_bed_to_bed_most_recent_release_1
FAILS: plink2 --make-bed from pgen with 120GB, 12CPU requested
log file: TROUBLESHOOTPLINK_pgen_to_bed_most_recent_release_8CPU_120GB
seff on this job reveals ~114 GB used even though plink reports only reserving 14GB for the workspace
SUCCEEDS: plink2 --make-pgen from pgen with 15GB 1 CPU
log file: TROUBLESHOOTPLINK_pgen_to_pgen_most_recent_release_1
SUCCEEDS: plink2 --make-bed from pgen run on login node (128GB available)
log file: TROUBLESHOOT_LOGINNODE.log
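The gap between what seff reports and the 14GB workspace in the plink log can be checked side by side with something like the sketch below. It assumes the job ID is passed as the first argument, that sacct and seff are available on the cluster, and that the plink2 log contains a line mentioning "main workspace"; the default log name is a placeholder.

```shell
#!/bin/bash
JOBID="${1:-}"                  # SLURM job ID to inspect
LOG="${2:-subset_out.log}"      # placeholder plink2 log path (assumption)

# Accounting fields: requested memory, peak resident memory, final state.
SACCT_FIELDS="JobID,JobName,ReqMem,MaxRSS,Elapsed,State"

if [ -n "$JOBID" ] && command -v sacct >/dev/null 2>&1; then
    sacct -j "$JOBID" --format="$SACCT_FIELDS"
fi

if [ -n "$JOBID" ] && command -v seff >/dev/null 2>&1; then
    seff "$JOBID"
fi

# Show what plink2 itself says it reserved, if the log is present.
if [ -f "$LOG" ]; then
    grep -i "main workspace" "$LOG" || echo "No workspace line found in $LOG"
else
    echo "Log file $LOG not found"
fi
```

MaxRSS from sacct is the peak resident memory per job step, so comparing it with the workspace line in the log shows how much memory is being used outside the workspace plink reports.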