Running FDS on Windows Server 2012 with HPC Pack


Andrew

Sep 25, 2013, 12:47:38 PM
to fds...@googlegroups.com

Hello


Currently I am running FDS on a Microsoft Windows cluster, utilising the built-in job scheduler provided by Microsoft to manage job submission and MPICH2 as the MPI implementation. However, with this method the simulation will only run on a single node. I understand that FDS can be run with MPICH2 across multiple nodes without the job scheduler, but this would mean losing the scheduler and its ability to manage the workload of multiple applications.
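
(To illustrate what I mean by a direct MPICH2 launch that bypasses the scheduler; the node names, process counts and UNC path below are made up, not our actual setup:

mpiexec -hosts 2 NODE1 2 NODE2 2 \\HEADNODE\FDS\fds_mpi_win_64.exe job.fds

i.e. naming each node and its process count by hand on the command line.)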


Therefore, does anyone have any experience of running FDS with Microsoft's MPI implementation, MS-MPI, as opposed to MPICH2, and can this be done without having to recompile the source code? Alternatively, is there another way in which FDS could be used on Windows Server 2012 to utilise the Microsoft job scheduler more effectively?


Thanks


Johan Borgman

Sep 26, 2013, 10:15:53 AM
to fds...@googlegroups.com
Hi!

I'm currently facing the same difficulties trying to use Windows Server 2012 with HPC Pack and the built-in job scheduler. I can't get simulations to run on multiple nodes: the job scheduler reserves CPUs on several nodes, but all the processes run on one node.

Are you able to start FDS simulations on multiple nodes using the command prompt? When I try to run simulations on multiple nodes I get this result:

C:\CFD\2>mpiexec -file config.txt
Process   1 of   6 is running on HEADNODE.home.local
Process   0 of   6 is running on HEADNODE.home.local
Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(476)............: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(82).............:
MPIC_Sendrecv(158)...........:
MPID_Isend(116)..............: failure occurred while attempting to send an eager message
MPIDI_CH3_iSend(175).........:
MPIDI_CH3I_Sock_connect(1212): [ch3:sock] rank 0 unable to connect to rank 2 using business card <port=49275 description=NODE1.home.local ifname=192.168.0.2 >
MPIDU_Sock_post_connect(1231): unable to connect to NODE1.home.local on port 49275, exhausted all endpoints (errno -1)
MPIDU_Sock_post_connect(1247): gethostbyname failed, No such host is known. (errno 11001)
Process   4 of   6 is running on NODE2.home.local
Process   5 of   6 is running on NODE2.home.local
Process   6 of   6 is running on NODE2.home.local

job aborted:
rank: node: exit code[: error message]
0: 192.168.0.1: 1: Fatal error in PMPI_Barrier: Other MPI error, error stack:
PMPI_Barrier(476)............: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier(82).............:
MPIC_Sendrecv(158)...........:
MPID_Isend(116)..............: failure occurred while attempting to send an eager message
MPIDI_CH3_iSend(175).........:
MPIDI_CH3I_Sock_connect(1212): [ch3:sock] rank 0 unable to connect to rank 2 using business card <port=49275 description=WIN-FN5M18C04MN.hemma.local ifname=192.168.0.2 >
MPIDU_Sock_post_connect(1231): unable to connect to WIN-FN5M18C04MN.hemma.local on port 49275, exhausted all endpoints (errno -1)
MPIDU_Sock_post_connect(1247): gethostbyname failed, No such host is known. (errno 11001)
1: 192.168.0.1: 1
2: 192.168.0.2: 1
3: 192.168.0.2: 1
4: 192.168.0.3: 1
5: 192.168.0.3: 1
6: 192.168.0.3: 1
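
(For reference, my config.txt follows the MPICH2 "-file" pattern described in the FDS documentation, roughly along these lines; the executable path here is a placeholder rather than my exact file, and the per-node process counts match the ranks shown in the log above:

exe \\HEADNODE\CFD\2\fds_mpi_win_64.exe job.fds
hosts
HEADNODE 2
NODE1 2
NODE2 3

)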


I can start and run simulations on NODE1 and NODE2 from HEADNODE if the simulation is run on one node. If I try to run a simulation on two or more nodes I get the error messages above.

I'm thinking that whatever stops a simulation from running across multiple nodes from the command prompt may also be what causes the job scheduler to use only a single node for simulations?
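
Looking at the stack trace, the immediate failure is "gethostbyname failed, No such host is known", which suggests rank 0 cannot resolve the other nodes' host names. If that is the cause, one thing worth trying (untested on my side; the addresses and names below are only taken from the log as an example) is adding all the nodes to C:\Windows\System32\drivers\etc\hosts on every machine:

192.168.0.1    HEADNODE.home.local    HEADNODE
192.168.0.2    NODE1.home.local       NODE1
192.168.0.3    NODE2.home.local       NODE2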

I haven't been successful using MS-MPI to run FDS simulations.

Have you made any progress on the matter?

Thanks

Andrew

Oct 16, 2013, 12:21:42 PM
to fds...@googlegroups.com
Hi

No, I have not made any more progress on this issue. However, the MPICH website now states that MPICH is no longer supported on Windows and recommends using MS-MPI instead.
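
(If FDS can be made to run under MS-MPI, and whether that needs a rebuild against MS-MPI rather than MPICH2 is exactly the open question, then submitting through the HPC Pack scheduler should look roughly like this; the core count and path are placeholders:

job submit /numcores:7 mpiexec \\HEADNODE\CFD\2\fds_mpi_win_64.exe job.fds

where "job submit" is the HPC Pack command-line client and mpiexec is the MS-MPI one, so the scheduler rather than a hosts list decides where the processes land.)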