plink 1.9 out of memory


Aurora Moreno racero

Nov 10, 2025, 10:30:13 AM
to plink2-users

Hi all,

I’m running into a persistent “out of memory” error when converting a large VCF to PLINK format, even with the --memory 1500 limit explicitly set. The command I’m using is:

zcat /bio-scratch/Aurora/VCF_report/plink/1000_genome/vcf_originales/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz | \
  plink --vcf /dev/stdin \
    --extract range /bio-scratch/Aurora/VCF_report/preprocessing_graf/files_para_filtrar_snps/snps_37_with_aliases.txt \
    --snps-only just-acgt \
    --biallelic-only \
    --double-id \
    --vcf-half-call missing \
    --new-id-max-allele-len 31 \
    --make-bed \
    --memory 1500 \
    --allow-extra-chr \
    --out plink

From what I can tell, PLINK seems to create large temporary files while parsing the VCF, before writing the final .bed/.bim/.fam output; these likely exceed my available memory and disk budget. Unfortunately, I can’t raise the memory limit beyond 1.5 GB in this environment.

Do you have any recommendations for working around this?

If this is expected behavior, I’d also appreciate any hints about possible optimizations or upcoming improvements in PLINK’s handling of large gzipped VCFs through stdin.

Thanks a lot for your time and for maintaining this great tool!

Chris Chang

Nov 10, 2025, 11:59:39 AM
to Aurora Moreno racero, plink2-users
Hi,

0. Please include full .log file(s) when asking for troubleshooting help.

1. The .bed+.bim+.fam fileset that would result from this operation is >4 GB.  So if your *disk* quota is smaller than twice that -- even if you're deleting intermediate filesets along the way, you need enough space to simultaneously store input and output filesets -- you can't use PLINK 1.9 here without breaking chromosomes into smaller chunks.
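For illustration, chunking could look something like the following (an untested sketch in the spirit of the advice above; the 50 Mb window size, filenames, and output prefixes are placeholders, not recommendations):

```shell
# Sketch: convert chr1 in 50 Mb base-pair windows so that no single
# intermediate fileset exceeds the disk quota. All boundaries and
# names below are illustrative placeholders.
for start in 0 50000000 100000000 150000000 200000000; do
  end=$((start + 50000000))
  plink --vcf ALL.chr1.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz \
    --chr 1 --from-bp $((start + 1)) --to-bp "${end}" \
    --make-bed --memory 1500 \
    --out chr1_${start}
done
```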
However, you can stick to full chromosomes if you use PLINK 2.0's .pgen format (and https://www.cog-genomics.org/plink/2.0/resources provides a more convenient starting point for what you're trying to do here).

2. You shouldn't need to use zcat; PLINK 2.0 --vcf automatically decompresses gzipped files, and this decompression takes >10x less time than zcat on most multicore machines.  And for other reasons, PLINK 2.0 --vcf doesn't support piped input at all.  You're on your own if you still insist on using that highly inefficient workflow.
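For illustration, a direct PLINK 2.0 run on the gzipped file might look like this (an untested sketch; the flag mapping from 1.9 involves assumptions flagged in the comments, so double-check it against the docs):

```shell
# Sketch: plink2 reads the .vcf.gz directly, so no zcat/pipe is needed.
# Assumptions: --extract bed1 is 2.0's interval-file syntax (the 1.9
# "--extract range" file may need reformatting to match), and
# --max-alleles 2 is 2.0's counterpart of 1.9's --biallelic-only.
plink2 --vcf ALL.chr1.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz \
  --extract bed1 snps_37_with_aliases.txt \
  --snps-only just-acgt \
  --max-alleles 2 \
  --double-id \
  --vcf-half-call missing \
  --make-pgen \
  --memory 1500 \
  --out chr1
```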

