Output file buffer size

Jim Perry

Oct 7, 2022, 2:37:17 AM
to plink2-users
Is there a way to set the buffer size for output files?   

I'm using a large Unix system with NFS servers.  When output files are written a line at a time, it creates a huge bottleneck and the jobs end up waiting on disk I/O resources.  If the output buffer size can be increased (to something equivalent to 1,000 or 10,000 lines), then the output data can be written in chunks and writing to drives on an NFS server will no longer be a bottleneck.

Thanks for any help with this!

Christopher Chang

Oct 7, 2022, 11:59:47 AM
to plink2-users
Can you clarify what the problematic command (and plink version) is?  plink 2.0 uses an effective output buffer size of at least 128 KB, and I have never heard of this being problematically small.
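
To illustrate why a buffer of that size matters, here is a minimal Python sketch (not plink2's code) that counts how many write calls reach the underlying file with and without a 128 KB buffer. The class name, function name, and the made-up result line are assumptions for illustration only.

```python
import io


class CountingSink(io.RawIOBase):
    """In-memory stand-in for a file that counts the writes that reach it."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.nbytes = 0

    def writable(self):
        return True

    def write(self, data):
        self.calls += 1
        self.nbytes += len(data)
        return len(data)


def count_raw_writes(n_lines, buffer_size=None):
    """Write n_lines short lines; return (raw write calls, bytes written)."""
    sink = CountingSink()
    line = b"rs0000001\t0.0512\t1.2e-08\n"  # made-up result line
    if buffer_size is None:
        out = sink  # unbuffered: every line is its own write call
    else:
        out = io.BufferedWriter(sink, buffer_size=buffer_size)
    for _ in range(n_lines):
        out.write(line)
    out.flush()  # push any remaining buffered bytes to the sink
    return sink.calls, sink.nbytes
```

Calling count_raw_writes(10000) and count_raw_writes(10000, 128 * 1024) writes the same bytes in both cases, but the buffered variant reaches the sink in only a handful of calls instead of one per line.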

Chris Chang

Oct 8, 2022, 1:03:37 PM
to Jim Perry, plink2-users
plink2 performs sequential reads with the fread() library function in this step.  Input buffer size is very unlikely to matter here; all modern operating systems should recognize this sequential access pattern.
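
For readers unfamiliar with the pattern, a front-to-back chunked read looks roughly like the sketch below. This is Python rather than plink2's own code, and the function name and the 1 MiB default chunk size are assumptions chosen only for illustration.

```python
def read_sequentially(path, chunk_size=1 << 20):
    """Read a file front to back in fixed-size chunks.

    Returns (total bytes read, number of read calls that returned data).
    """
    total = 0
    reads = 0
    # buffering=0 so each read() below maps to a read on the raw file
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty result means end of file
                break
            total += len(chunk)
            reads += 1
    return total, reads
```

Because each read picks up exactly where the previous one ended, the operating system can detect the pattern on its own, which is why changing the application-level chunk size tends to make little difference.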

What may help you, if you're performing lots of regressions on the same dataset, is using "--freq counts" to compute allele counts once, and then using those results instead of repeating the computation on subsequent runs; see https://www.cog-genomics.org/plink/2.0/input#read_freq .
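
A rough sketch of that two-step workflow, written as Python helpers that only build the plink2 argument lists. The helper names and all file prefixes are hypothetical, and the ".acount" extension for the "--freq counts" output is an assumption worth checking against the documentation linked above.

```python
def freq_counts_cmd(pfile_prefix, out_prefix):
    """Step 1: compute allele counts once for the dataset."""
    return ["plink2", "--pfile", pfile_prefix,
            "--freq", "counts",
            "--out", out_prefix]


def analysis_cmd(pfile_prefix, freq_file, analysis_args, out_prefix):
    """Step 2: reuse the saved counts via --read-freq in a later run.

    analysis_args holds whatever regression flags the run needs,
    supplied by the caller.
    """
    return (["plink2", "--pfile", pfile_prefix,
             "--read-freq", freq_file]
            + list(analysis_args)
            + ["--out", out_prefix])


# Hypothetical file names, for illustration only.
step1 = freq_counts_cmd("mydata", "mydata_freq")
step2 = analysis_cmd("mydata", "mydata_freq.acount",
                     ["--pheno", "pheno.txt", "--covar", "covar.txt", "--glm"],
                     "run1")
```

Each list can be passed to subprocess.run(cmd, check=True); step 1 runs once, and step 2 is repeated for each regression on the same dataset.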

On Sat, Oct 8, 2022 at 9:52 AM Jim Perry <jim8...@gmail.com> wrote:
After you mentioned that the output buffer size was at least 128k, I did some more investigating.  Turns out I was wrong about the bottleneck.  There is no issue with output.
The bottleneck I saw occurs when using Plink2 for regression during the time it calculates the allele frequencies (at the beginning of a GWAS run).  It seems to go disk-i/o-bound during this time, possibly due to reading the entire genotype pgen file.  I have plenty of memory on our machines.  Is there a way to increase the input buffer size for the allele frequency calculation portion?

Thanks much for your help!