Dear all,
I am trying to run an R script which computes the Fst, Fit and Fis population statistics within plink1.9 beta 6.2 using the --R interface. I know there is the --fst flag, but it gives Fst values only and I'm also interested in Fit and Fis.
Since we use slurm for resource management on our compute cluster, I tried to limit the memory usage of plink by specifying --memory as well. Unfortunately, the memory usage of plink keeps rising while it writes the output. I'm wondering whether there might be a memory leak in plink because, as far as I understand, plink should read the data in chunks no larger than the limit given by --memory, pass each chunk to the Rserve instance, get the results back, write them to the output file and start over again. Thus, the memory usage of plink shouldn't increase over time.
For example, at the moment a plink process uses 15.0G of resident memory according to top, although it was started with --memory 7800. It started at ~2.0G and has increased continuously over the last 2 hours. The bed file size is 11G.
This makes usage with a resource management system really hard, because this job will be killed sooner or later, as others already have been.
The following can reproduce the issue:
R> library(Rserve)
R> Rserve(port = 1025)
$ cat test.R
Rplink <- function(PHENO, GENO, CLUSTER, COVAR)
{
  f1 <- function(x)
  {
    r <- mean(x, na.rm = TRUE) / 2
    c(length(r), r)
  }
  as.numeric(apply(GENO, 2, f1))
}
$ plink1.9b6.2 --dummy 1200 250000 --make-bed --out foo
$ plink1.9b6.2 --bfile foo --R test.R --out bar --memory 64 --R-port 1025 --allow-no-sex
The attached screenshots document the memory usage of plink at the beginning (16.1m), in the middle (395.6m, above the 64m given by --memory) and at the end (1.131g, again above the 64m given by --memory) of the run.
I hope I have explained the problem well. If any questions remain, I'll try to answer them as soon as possible.
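In case a plain-text log is more useful than screenshots, the memory growth could also be recorded with a small shell helper along these lines. This is only a sketch: the function names rss_kb and log_rss and the log file name are made up for illustration and are not part of plink. It samples the resident set size that ps reports (in KiB) until the process exits.

```shell
#!/bin/sh
# Hypothetical helpers (not part of plink): sample a process's resident memory.

# Print the resident set size (RSS, in KiB) of the given PID as reported by ps.
# Prints nothing if the process does not exist.
rss_kb() {
    ps -o rss= -p "$1" 2>/dev/null | tr -d ' '
}

# Print "epoch_seconds rss_kib" every <interval> seconds (default 10).
# Stops once ps reports no RSS, or an RSS of 0, for the PID,
# which is the case after the process has exited.
log_rss() {
    pid=$1
    interval=${2:-10}
    while rss=$(rss_kb "$pid") && [ -n "$rss" ] && [ "$rss" -gt 0 ]; do
        echo "$(date +%s) $rss"
        sleep "$interval"
    done
}
```

With these functions defined in the current shell, the reproduction above could be logged like this (dividing the second column by 1024 gives MiB for comparison with the --memory value):

$ plink1.9b6.2 --bfile foo --R test.R --out bar --memory 64 --R-port 1025 --allow-no-sex &
$ log_rss $! 5 > plink_rss.log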
Greetings,
Damian