My runs using the parallel version "migrate-n-mpi" just stop after a few hours, with no clear error message appearing in the printed output. The dataset (see attached "migrate-n-mpi_8pop_38loci_infile") can be summarised as:
- 8 populations
- 38 loci (SNPs)
For now I have mostly used the default settings (see attached "migrate-n-mpi_8pop_38loci_parmfile"), but I made sure the input data type was set to SNPs. In the printed output I see that the burn-in and sampling of each locus seem to have completed, but shortly after this stage the program just stops running (see attached "migrate-n-mpi_8pop_38loci_outputlog"). Since I was running only 1 replicate, I started the run on (38 loci x 1 replicate) + 1 = 39 cores.
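To make the core count explicit: I assumed one worker process per locus-replicate pair plus one master process. A minimal sketch of that arithmetic follows; the function name is purely illustrative and not part of migrate-n, and the rule itself is my assumption about how the cores should be allocated.

```python
def mpi_process_count(n_loci: int, n_replicates: int = 1) -> int:
    """Assumed rule: one worker per locus-replicate pair, plus one master."""
    if n_loci < 1 or n_replicates < 1:
        raise ValueError("n_loci and n_replicates must be positive")
    return n_loci * n_replicates + 1


# My dataset: 38 loci, 1 replicate
print(mpi_process_count(38, 1))  # 39
```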
What could be the problem, or where could I find out what the problem is? At first I thought it might be a memory issue, but this does not seem to be the case.
(P.S. I have been running the same dataset with the same settings in the single-CPU "migrate-n" version, and that seems to work fine. In the printed output I see that the program has reached the stage at which the parameters are printed and the estimated run time is given. However, this run will take a very long time, so I would like to get the parallel version running successfully.)
I would appreciate any suggestions or tips.
Thijs M.P. Bal