Hello,
I have a problem with the way Nexus bundles the QMCPack runs of twist-averaging jobs.
My setup is a desktop workstation with at most 8 MPI ranks and 2 OpenMP threads per rank (machine='ws16').
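For reference, the relevant part of my Nexus script looks roughly like this (a stripped-down sketch with placeholder paths, not my actual script):

from nexus import settings, job

settings(
    pseudo_dir = './pseudopotentials',   # placeholder path
    results    = '',
    sleep      = 3,
    machine    = 'ws16',                 # 16-core workstation
    )

# QMCPack job: 8 MPI ranks x 2 OpenMP threads = 16 cores
qmc_job = job(cores=16, threads=2, app='qmcpack')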
For example, for a system with 4 independent twists, Nexus creates a
dmc.in file containing
dmc.g000.twistnum_0.in.xml
dmc.g001.twistnum_1.in.xml
dmc.g002.twistnum_2.in.xml
dmc.g003.twistnum_3.in.xml
When dmc.in is passed to QMCPack, the code creates 4 MPI groups and then runs all 4 jobs simultaneously, each with 2 MPI ranks.
As an example, the output of dmc.g001.twistnum_1.in.xml contains
Total number of MPI ranks = 8
Number of MPI groups = 4
MPI group ID = 1
Number of ranks in group = 2
MPI ranks per node = 8
OMP 1st level threads = 2

The problem is that for a system with more twists (e.g. 10), Nexus is trying to do the same, leading to an error (in dmc.err):
Fatal Error. Aborting at main(). Current 8 MPI ranks cannot accommodate all the 10 individual calculations in the ensemble. Increase the number of MPI ranks or reduce the number of calculations.
Abort(1) on node 0 (rank 0 in comm 496): application called MPI_Abort(world, 1) - process 0
Abort(1) on node 1 (rank 1 in comm 496): application called MPI_Abort(world, 1) - process 1
Abort(1) on node 2 (rank 2 in comm 496): application called MPI_Abort(world, 1) - process 2
Abort(1) on node 3 (rank 3 in comm 496): application called MPI_Abort(world, 1) - process 3
Abort(1) on node 4 (rank 4 in comm 496): application called MPI_Abort(world, 1) - process 4
Abort(1) on node 5 (rank 5 in comm 496): application called MPI_Abort(world, 1) - process 5
Abort(1) on node 6 (rank 6 in comm 496): application called MPI_Abort(world, 1) - process 6
Abort(1) on node 7 (rank 7 in comm 496): application called MPI_Abort(world, 1) - process 7

Is there a way to split the bundled twist-averaged (TA) QMCPack calculations? I found nothing in the documentation or in the available examples.
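The only workaround I can think of is to generate one QMCPack run per twist myself, roughly along the lines of the sketch below; the twistnum keyword and the system/conv objects are just my guesses and placeholders, which is exactly why I am asking whether there is a proper way to do this:

from nexus import job, generate_qmcpack, dmc, run_project

ntwists  = 10
qmc_runs = []
for it in range(ntwists):
    qmc_runs.append(generate_qmcpack(
        identifier   = 'dmc',
        path         = 'dmc/twist_{0:02d}'.format(it),   # one directory per twist
        job          = job(cores=16, threads=2, app='qmcpack'),
        system       = system,                 # physical system defined earlier in the script
        input_type   = 'basic',
        pseudos      = ['C.BFD.xml'],          # placeholder pseudopotential
        jastrows     = [],
        twistnum     = it,                     # guessed keyword to select a single twist
        calculations = [dmc(warmupsteps=20, blocks=200, steps=10, timestep=0.01)],
        dependencies = (conv, 'orbitals'),     # conv = orbital conversion step from earlier
        ))

run_project(*qmc_runs)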
If you need an example, try a run with a 4x4x4 kgrid on a laptop.
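Any small cell with a dense twist grid should reproduce it; something along these lines would do (the diamond-like primitive cell below is only illustrative):

from nexus import generate_physical_system

system = generate_physical_system(
    units  = 'A',
    axes   = [[1.785, 1.785, 0.000],
              [0.000, 1.785, 1.785],
              [1.785, 0.000, 1.785]],
    elem   = ['C', 'C'],
    pos    = [[0.0000, 0.0000, 0.0000],
              [0.8925, 0.8925, 0.8925]],
    kgrid  = (4, 4, 4),    # 64 twists, far more than the 8 MPI ranks available
    kshift = (0, 0, 0),
    C      = 4,            # valence charge of the carbon pseudopotential
    )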
I would be grateful for any help or suggestions.
Best,
Matej