FDS simulation crashes if either internet or VPN is disconnected

SD

Nov 28, 2016, 1:58:48 PM
to FDS and Smokeview Discussions
Hi,

My MPI parallel FDS simulation crashes every time the internet connection is lost. It also crashes whenever the VPN state changes, i.e., if I either connect to a VPN or disconnect from an existing one, the running simulation suddenly crashes.

Can this be avoided? Why does FDS need an internet connection, and how can I set up FDS so that my simulation never crashes abruptly?


Thanks much,
Sai

Kevin

Nov 28, 2016, 2:01:17 PM
What is the operating system -- Windows, Linux, or OS X?

Kevin

Nov 28, 2016, 2:02:15 PM
Are you using multiple computers to run the job, or just multiple cores on a single computer?



SD

Nov 28, 2016, 2:22:35 PM

Kevin,

I am on Windows 7 Enterprise (SP1), 64-bit, on an Intel Core i7 with 4 processors and 16 GB of RAM.

I am using multiple cores (4 or fewer) on a single computer, but this issue occurs with both serial and parallel runs, irrespective of the number of processors used. For parallel runs I invoke the mpiexec command; serial jobs I run as 'fds chid.fds'.



Kevin

Nov 28, 2016, 2:48:23 PM
Even when you run fds like this

fds chid.fds

you are using the MPI libraries, with just a single process. I'll try this experiment here on my computer.

Kevin

Nov 28, 2016, 3:28:06 PM
I started a job on my computer, like

fds chid.fds

and then disconnected my Internet cable. The job continued to run. I then started a job

mpiexec -n 4 fds chid.fds

and then disconnected my internet cable. The job continued to run. I cannot test the VPN functionality here at work.

Are you writing to a remote server; that is, are you working from a folder (directory) on your own machine, or are you using a remote file server? In other words, are you working completely within your own computer and not using any external resource?

SD

Nov 28, 2016, 4:14:21 PM
Kevin,

Thanks for testing; that is how I would like my FDS to behave. Yes, I save all my files locally on the C drive of my computer. The command I use for my parallel runs is:

mpiexec -n 4 -localonly fds chid.fds

I am currently running a simulation on 4 cores on my computer and I am on wifi. I am pretty sure if I disconnect the wifi, my simulation will crash. 

I will test now and report back the exact error I get.

Kevin

Nov 28, 2016, 4:16:39 PM
Try not using -localonly. I do not use this option.

SD

Nov 28, 2016, 5:43:30 PM
Kevin,

Strangely, I was not able to reproduce the error in several tries, so now I know it is not repeatable. Just when I was about to give up on reproducing it, the simulation crashed with the following error. This happened after I started a simulation with no internet connection and then connected to the internet. BTW, this output is from a simulation started with mpiexec, but using only 1 core.

 Time Step:    500, Simulation Time:      8.19 s
 Time Step:    600, Simulation Time:      9.70 s
 Time Step:    700, Simulation Time:     11.29 s
 Time Step:    800, Simulation Time:     12.89 s
 Time Step:    900, Simulation Time:     14.45 s
[mpiexec@SCOJM89362] ..\hydra\pm\pmiserv\pmiserv_cb.c (781): connection to proxy 0 at host SCOJM89362 failed
[mpiexec@SCOJM89362] ..\hydra\tools\demux\demux_select.c (100): callback returned error status
[mpiexec@SCOJM89362] ..\hydra\pm\pmiserv\pmiserv_pmci.c (500): error waiting for event
[mpiexec@SCOJM89362] ..\hydra\ui\mpich\mpiexec.c (1119): process manager error waiting for completion


But I tried the same thing once again, and this time it did not crash. I am not sure what the source of the error is. Since I have come to expect crashes when the connection changes, I have gotten used to using the restart option and simply restarting my simulations.

Not sure what to make of the above error message.

When I use the command without the '-localonly' flag, I get a credentials error:
--------------------------------------------------------
SD035688@SCOJM89362 /cygdrive/c/Users/TestCrashCyg2
$ mpiexec -n 1 fds FH_test.fds
Credentials for SD035688 rejected connecting to SCOJM89362
read from stdin failed, error 6.
[mpiexec@SCOJM89362] ..\hydra\tools\demux\demux_select.c (78): select error (No such file or directory)
[mpiexec@SCOJM89362] ..\hydra\pm\pmiserv\pmiserv_pmci.c (500): error waiting for event
[mpiexec@SCOJM89362] ..\hydra\ui\mpich\mpiexec.c (1119): process manager error waiting for completion
--------------------------------------------------------------------------------------------------------------
SD035688@SCOJM89362 /cygdrive/c/Users/TestCrashCyg2
$ mpiexec -n 1 -localonly fds FH_test.fds
 Mesh      1 is assigned to MPI Process      0
 OpenMP thread   0 of   3 assigned to MPI process      0 of      0
 OpenMP thread   2 of   3 assigned to MPI process      0 of      0
 OpenMP thread   3 of   3 assigned to MPI process      0 of      0
 OpenMP thread   1 of   3 assigned to MPI process      0 of      0
 Completed Initialization Step  1
 Completed Initialization Step  2
 Completed Initialization Step  3
 Completed Initialization Step  4

 Fire Dynamics Simulator

 Current Date     : November 28, 2016  22:36:19
 Version          : FDS 6.3.2
-------------------------------------------------
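
The restart workaround mentioned in this post could be wrapped in a small script that reruns the job after a failed exit. What follows is only a minimal sketch in Python, under assumptions not confirmed anywhere in this thread: that mpiexec returns a non-zero exit code when the process manager fails as in the log above, and that the input file has already been prepared for a restart as described in the FDS User's Guide. `run_with_retries` is a hypothetical helper name, not part of FDS or of any MPI distribution.

```python
import subprocess
import sys


def run_with_retries(command, max_attempts=3):
    """Run command; rerun it after a non-zero exit, up to max_attempts times.

    Returns the exit code of the last attempt.
    Assumption: a crashed run is reported through a non-zero exit code.
    """
    returncode = 1
    for attempt in range(1, max_attempts + 1):
        returncode = subprocess.run(command).returncode
        if returncode == 0:
            break  # the job finished normally, nothing to retry
        print(f"attempt {attempt} exited with code {returncode}", file=sys.stderr)
    return returncode
```

For example, `run_with_retries(["mpiexec", "-n", "4", "-localonly", "fds", "chid.fds"])` would rerun the command used in this thread up to three times. The sketch does not edit the input file between attempts, so whether a rerun resumes or starts over depends entirely on how the input file is set up.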


Kevin

Nov 29, 2016, 8:42:08 AM
This is why we recommend that users invest in a Linux cluster or cloud computing services. If you are having this much trouble running MPI on a single computer, imagine how difficult it will be using multiple computers. We here at NIST can only compile fds.exe; we cannot guarantee that it will work when you, say, disconnect from or connect to the Internet, use a remote disk drive, run other programs simultaneously, install certain kinds of anti-virus software, or erect firewalls.

All I can say is that when you get a job running, walk away from the computer and don't touch it until the job is done.

SD

Nov 29, 2016, 4:40:41 PM
I prefer a Linux platform myself, but I don't have a choice at the moment.

Thanks for your thoughts and your investigation. I appreciate your help a lot and thanks for spending time on this.

I agree that external software could very well be interfering with FDS. If I get to the bottom of this at a later stage, I will update the thread.

Sabalcore FDS

Nov 30, 2016, 4:28:42 PM
It looks like your routing table or DNS server is being modified when you disconnect from or connect to the internet or the VPN; there may also be an SSH issue. It looks like you're using MPICH2 with hydra. Try running mpiexec with the -hosts option and specifying the IP address of your host. For example:

mpiexec -n 1 -hosts 127.0.0.1 fds FH_test.fds
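
One way to check the routing/DNS hypothesis is to log how the machine's own host name resolves while the connection is switched on and off. The following is a minimal sketch in Python, which is not something used in this thread; `resolve` and `watch` are hypothetical helper names, and the polling interval and number of checks are arbitrary choices.

```python
import socket
import time


def resolve(hostname):
    """Return the sorted IPv4 addresses that hostname resolves to, or [] on failure."""
    try:
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:  # covers socket.gaierror and socket.herror
        return []
    return sorted(addresses)


def watch(hostname, interval_s=5.0, checks=120):
    """Poll name resolution and return a list of (before, after) changes."""
    changes = []
    previous = resolve(hostname)
    print(f"{time.ctime()}  {hostname} -> {previous}")
    for _ in range(checks):
        time.sleep(interval_s)
        current = resolve(hostname)
        if current != previous:
            # The address list changed between two polls; record and report it.
            print(f"{time.ctime()}  {hostname} changed: {previous} -> {current}")
            changes.append((previous, current))
            previous = current
    return changes
```

Calling `watch(socket.gethostname())` in a second terminal while connecting to or disconnecting from the VPN would show whether the host name starts resolving to a different address at the moment the job fails. A change there would be consistent with the explanation above, though it would not prove it.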

SD

Dec 6, 2016, 3:11:19 PM
Sabalcore FDS,
Thanks for your input.
Today my FDS simulation crashed when the VPN disconnected without warning, so I am trying your idea. However, I get the error below when I use the -hosts flag:
$ mpiexec -n 4 -hosts 127.0.0.1 fds filename1.fds
[mpiexec@SCOJM89362] hostlist_fn (..\hydra\ui\mpich\utils.c:1321): missing host name after -hosts option.
[mpiexec@SCOJM89362] ..\hydra\utils\args\args.c (243): match handler returned error
[mpiexec@SCOJM89362] ..\hydra\utils\args\args.c (269): argument matching returned error
[mpiexec@SCOJM89362] parse_args (..\hydra\ui\mpich\utils.c:4498): error parsing input array
[mpiexec@SCOJM89362] HYD_uii_mpx_get_parameters (..\hydra\ui\mpich\utils.c:4819): unable to parse user arguments

To get rid of the error, I am now using -hosts "computername" instead of -hosts 127.0.0.1. Using my computer name did not give a hosts error, but it did give a credentials error.

To get rid of the credentials error, I ran "mpiexec -register", which lets me enter my credentials. Once the credentials were encrypted and saved, I re-entered the command:

$ mpiexec -n 4 -hosts "computername" fds filename1.fds

It is running for now, but it will be interesting to see whether my simulation keeps running the next time my VPN disconnects without warning. This might take a couple of days or more.



SD

Dec 12, 2016, 2:18:20 AM
Sabalcore FDS,

To follow up, my FDS simulation crashed even when using the command
$ mpiexec -n 4 -hosts "computername" fds filename1.fds

The -hosts "computername" flag didn't work.

Kevin

Dec 12, 2016, 9:06:51 AM
mpiexec -n 4 -hosts 1 my_computer 1 test_mpi

worked for me.

Sabalcore FDS

Dec 14, 2016, 10:02:17 AM
OK. My best guess is that your network is being restarted when the VPN stops or starts. This would cause the problem you see, because MPI would detect the network going down.

