Hello,
in a recent run of sambamba merge (0.7.0) I observed incredibly large memory usage for an operation that should need virtually none.
According to SLURM accounting, the run took 2572 s and used 6.33 GiB of memory.
That would make sense if it were a sort, but this was just a merge of already-sorted inputs, so the theoretical minimum is roughly one record per input file. There were only 4 input files, and the output file is 182 GiB (compression level 2).
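To illustrate the "one record per input" claim: a k-way merge of sorted streams only ever needs to hold the current record of each stream, typically in a small heap. This is a minimal sketch in Python using the standard library (not sambamba's actual implementation, just the general technique):

```python
import heapq

def merge_sorted_streams(*streams):
    # heapq.merge keeps exactly one "current" record per input
    # stream in its internal heap, so the working set is O(k)
    # records for k inputs, independent of total input size.
    yield from heapq.merge(*streams)

a = [1, 4, 7]
b = [2, 5, 8]
c = [3, 6, 9]
print(list(merge_sorted_streams(a, b, c)))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With 4 inputs, that working set is 4 records plus I/O buffers, which is why multi-GiB usage is surprising here.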
Buffers can account for some memory consumption, but GiB-sized buffers are not needed for good I/O performance.
How much memory should I allocate when running a merge? A function of the input size? A fixed quantity?
This looks like a memory allocation bug, possibly a leak, since nothing about the workload explains the usage. Or is there some reason it legitimately needs this much memory?