Hi!
I'm currently facing the same difficulties trying to use Windows Server 2012 with HPC Pack and the built-in job scheduler. I can't get simulations to run on multiple nodes. The job scheduler reserves CPUs on several nodes, but all the processes end up running on one node.
Are you able to start FDS simulations on multiple nodes from the command prompt? When I try to run a simulation on multiple nodes I get this result:
C:\CFD\2>mpiexec -file config.txt
Process 1 of 6 is running on HEADNODE.home.local
Process 0 of 6 is running on HEADNODE.home.local
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(476)............: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(82).............:
MPIC_Sendrecv(158)...........:
MPID_Isend(116)..............: failure occurred while attempting to send an eager message
MPIDI_CH3_iSend(175).........:
MPIDI_CH3I_Sock_connect(1212): [ch3:sock] rank 0 unable to connect to rank 2 using business card <port=49275 description=NODE1.home.local ifname=192.168.0.2 >
MPIDU_Sock_post_connect(1231): unable to connect to NODE1.home.local on port 49275, exhausted all endpoints (errno -1)
MPIDU_Sock_post_connect(1247): gethostbyname failed, No such host is known. (errno 11001)
Process 4 of 6 is running on NODE2.home.local
Process 5 of 6 is running on NODE2.home.local
Process 6 of 6 is running on NODE2.home.local
job aborted:
rank: node: exit code[: error message]
0: 192.168.0.1: 1: Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(476)............: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(82).............:
MPIC_Sendrecv(158)...........:
MPID_Isend(116)..............: failure occurred while attempting to send an eager message
MPIDI_CH3_iSend(175).........:
MPIDI_CH3I_Sock_connect(1212): [ch3:sock] rank 0 unable to connect to rank 2 using business card <port=49275 description=WIN-FN5M18C04MN.hemma.local ifname=192.168.0.2 >
MPIDU_Sock_post_connect(1231): unable to connect to WIN-FN5M18C04MN.hemma.local on port 49275, exhausted all endpoints (errno -1)
MPIDU_Sock_post_connect(1247): gethostbyname failed, No such host is known. (errno 11001)
1: 192.168.0.1: 1
2: 192.168.0.2: 1
3: 192.168.0.2: 1
4: 192.168.0.3: 1
5: 192.168.0.3: 1
6: 192.168.0.3: 1
I can start and run simulations on NODE1 and NODE2 from HEADNODE if the simulation is run on one node. If I try to run a simulation on two or more nodes I get the error messages above.
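For what it's worth, the "gethostbyname failed, No such host is known" lines suggest the nodes cannot resolve each other's hostnames, so rank 0 never reaches rank 2's listening port. One workaround I'm considering (just a sketch, assuming the IPs in the log above actually map to these hosts) is to add static entries to C:\Windows\System32\drivers\etc\hosts on every node:

```
192.168.0.1   HEADNODE.home.local   HEADNODE
192.168.0.2   NODE1.home.local      NODE1
192.168.0.3   NODE2.home.local      NODE2
```

After that, each node should be able to resolve the others' names without DNS, which is what the MPI "business card" exchange appears to rely on.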
I suspect that whatever is preventing simulations from running across multiple nodes is also the reason the job scheduler ends up using only a single node. Does that seem plausible?
I haven't been successful using MS-MPI to run FDS simulations.
Have you made any progress on the matter?
Thanks