dealing with huge vcf.gz files - is there a way to read them in batches into plink?

Michael P

Apr 12, 2021, 12:03:57 AM
to plink2-users
Hi, 
I am trying to convert huge vcf.gz files into binary format to build LD matrices using --r square. Looking at the bp positions in the .bim file, I noticed that only about the first 10% of the SNPs on a given chromosome were being processed. No error is given; the job just ends, and when I check the output file it contains only the first portion of the SNPs.
The number of SNPs processed seems to go up as I allocate more memory, but I'm already giving it hundreds of GB and it doesn't seem close to reaching the end of the chromosome. I was wondering if there is any advice on how to get PLINK to successfully process these giant files.

Thanks so much!
Michael

Christopher Chang

Apr 12, 2021, 11:33:00 AM
to plink2-users
When you're running on a cluster and have a memory quota, you need to provide an explicit --memory parameter that's smaller than your quota.  Otherwise, plink assumes you're the primary user of the machine, exceeds the quota, and the system kills the plink job.
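
As a rough sketch of the advice above (not taken from the thread itself): the snippet below derives a --memory value a bit below the job's quota and builds the conversion command. The 200 GB quota, the 10% headroom, and the chr1 file names are all assumptions for illustration; --memory takes its value in MiB.

```shell
# Hypothetical values: adjust to your own job and file names.
quota_gb=200                              # memory granted by the scheduler (assumed)
mem_mb=$(( quota_gb * 1024 * 9 / 10 ))    # ~10% headroom (assumed); --memory is in MiB

# Build the command as a string so it can be inspected before running.
cmd="plink --vcf chr1.vcf.gz --memory ${mem_mb} --make-bed --out chr1"
echo "$cmd"
```

Once the printed command looks right, it can be run directly in the job script. How much headroom is needed depends on the cluster, so the 10% figure is only a starting point.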