FDS cluster issue

Sigurður Bjarni Gíslason

Apr 11, 2022, 8:35:44 AM
to FDS and Smokeview Discussions
Hi,

I am currently running FDS simulations successfully with 12 processors and 12 meshes on a single computer.
Now I would like to use more computers in a cluster.
To start with, I have tried to use 11 processors on the main FDS computer plus 1 processor on my desktop computer (the remote computer) to see if it works.
Together with my IT guy, I have tried to follow the instructions in the FDS User's Guide (ch. 3.2), but we have not managed to get it running. I am hoping someone can guide us.

"If you wish to run FDS on more than one computer, do the following:
1. Create a text file, say hostfile.txt, and in it list, line by line, the names of your computers."

We have done this for our remote computer, "Nameofremotecomputer", but are not sure where to place the file and have tried two locations. One of them is PyroSim 2021\fds\mpi (the folder name contains a space by default; is this a problem, as in step 3?).
Should I list only the other computers I will use, or also the computer that I am running FDS from?

The next step in the FDS User's Guide:
2. Test your network by running the following test program:
mpiexec -n <procs> -f hostfile.txt test_mpi
where <procs> is the number of computers you want to test. If this command returns a “Hello World” message from each of your computers, proceed to the next step. If this command fails, check that you can “see” the other machines by “pinging” them, and check that the other computer can “see” your computer as well. Also, make sure that the same version of FDS is installed on the other computers.


We didn't get any Hello World message when we wrote:
mpiexec -n 1 -f hostfile.txt test_mpi (when I was testing one remote computer)
We were located in the mpi directory when writing that command.

The error message that I get is the following (Nameofsimulationcomputer is the main computer and Nameofremotecomputer is the second computer):
C:\Program Files\PyroSim 2021\fds\mpi>mpiexec -n 1 -f hostfile.txt test_mpi
[proxy:0:0@Nameofremotecomputer] launch_processes (proxy.c:571): error creating process (error code 2). The system cannot find the file specified.
[proxy:0:0@Nameofremotecomputer] main (proxy.c:927): error launching_processes
[mpiexec@Nameofsimulationcomputer] wmain (mpiexec.c:2113): assert (exitcodes != NULL) failed


Pinging worked though.

We have also followed the instructions in chapter 19.5 of the PyroSim user manual:
PyroSim is installed in exactly the same folder on both cluster machines
The PyroSim installations are exactly the same version
The simulation folder has no space in its name and is accessible to both machines

Help would be much appreciated.
And even an online meeting if anyone has the time.

Sigurdur Bjarni Gislason



Kevin McGrattan

Apr 11, 2022, 9:49:47 AM
to fds...@googlegroups.com
hostfile.txt goes into the working directory; that is, the directory you are in when you invoke the MPI command. Note that neither fds.exe nor test_mpi.exe has any knowledge of PyroSim working directories.

Here is another test to perform:

First, put only the name of the computer at which you are working in the hostfile.txt file. Then type at the command prompt

mpiexec -n 3 -hostfile hostfile.txt hostname

You should see the name of the computer printed three times. That is all "hostname" does. It is not an FDS program. It is just a program native to DOS. If this does not work, then there must be something wrong with the installation of mpiexec. That would be a Thunderhead issue.

Next, write the following into hostfile.txt

FDScomputername:3
Othercomputername:2

Obviously, write the proper names of the computers. Then issue this command

mpiexec -machinefile hostfile.txt hostname

If successful, you should see the FDS computer written 3 times and the other computer written 2 times. 
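
As a rough illustration of the check described above, here is a minimal Python sketch that reads the name:count lines from a hostfile and compares them with what "hostname" printed. The function names, the default of one process for a line without a count, and the case-insensitive comparison are assumptions made for this example; none of them come from FDS or mpiexec.

```python
from collections import Counter


def parse_hostfile(text):
    """Return a Counter mapping each host name to its requested process count.

    Assumption: a line is either 'name' or 'name:count'; a bare name
    is treated as one process. Blank lines are skipped.
    """
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, sep, count = line.partition(":")
        # Lower-case the name so the comparison ignores letter case.
        counts[name.strip().lower()] += int(count) if sep else 1
    return counts


def hostname_output_matches(hostfile_text, output_text):
    """True if each host name was printed as many times as the hostfile requests."""
    expected = parse_hostfile(hostfile_text)
    seen = Counter(
        line.strip().lower()
        for line in output_text.splitlines()
        if line.strip()
    )
    return seen == expected
```

The comparison uses counts rather than line order, because the order in which the processes print their names should not be relied upon.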

If this does not work, try to manually register your network credentials:

mpiexec -register

For any of this to work, you must have a single domain username and password. That is, you login to all computers using the same credentials, and those credentials are always the same on all computers.

Let's see if this helps.








Sigurður Bjarni Gíslason

Apr 11, 2022, 11:30:11 AM
to FDS and Smokeview Discussions
Bryan.

Thank you, this works.
My hostfile.txt is in the same directory as the mpiexec.exe file (a folder called mpi).
What's next?

Sigurdur

Sigurður Bjarni Gíslason

Apr 11, 2022, 11:41:29 AM
to FDS and Smokeview Discussions
Kevin, I was going to say...sorry :-)

Kevin

Apr 11, 2022, 11:51:57 AM
to FDS and Smokeview Discussions
You should not be working in the same directory where FDS is installed. You should work in a directory where you keep your working files. Put hostfile.txt in this working directory.

If you say everything is working, then try

mpiexec -machinefile hostfile.txt -wdir \\computername\directoryname fds jobname.fds

-wdir sets the "working directory", which must be a path that both computers recognize.
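
For anyone scripting this step, the command above could be assembled as an argument list along the following lines. This is only a sketch: the helper name build_fds_command is hypothetical, and the only flags used are the ones already shown in this thread (-machinefile and -wdir).

```python
def build_fds_command(hostfile, wdir, job_file):
    """Assemble the mpiexec call from this thread as an argument list.

    hostfile: name of the host file in the current working directory
    wdir:     shared working directory that every computer can reach
    job_file: the FDS input file to run
    """
    return ["mpiexec", "-machinefile", hostfile, "-wdir", wdir, "fds", job_file]


# Placeholder names taken from the post above; replace them with real ones.
command = build_fds_command(
    "hostfile.txt", r"\\computername\directoryname", "jobname.fds"
)
print(" ".join(command))
```

Keeping the arguments in a list means the list can be handed to subprocess.run unchanged, without worrying about quoting a working directory that contains spaces.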


Kevin

Apr 11, 2022, 1:19:40 PM
to FDS and Smokeview Discussions
I just had a Zoom call with the person who posted this thread. I cannot figure out the problem with PyroSim, and when I tried to run the case using the command line, I could not find the special CMDfds prompt. In any case, it appears that the computers that he is using are properly configured, and each computer can see each other.

Can someone from Thunderhead take a look at this? Either we address the error that we get within the PyroSim GUI, or someone tells me how to use the CMDfds prompt when PyroSim is installed. I do not want to mess around with path variables because I will then probably break PyroSim.

Bryan Klein

Apr 11, 2022, 1:39:18 PM
to FDS and Smokeview Discussions
When the 'Run Cluster...' feature in PyroSim is used, there is no need to specify a machinefile. The hosts involved in the simulation and the working directory are set through the UI, and everything is passed into the command when the simulation is started. This looks like something that we could handle better through sup...@thunderheadeng.com, where we can set up a time to meet on Zoom or get more information specific to the setup as needed.

-Bryan


Bryan Klein

Apr 11, 2022, 1:53:28 PM
to FDS and Smokeview Discussions
Sigurdur,

Can you verify that the MPI credentials are consistent on the child nodes in the cluster?
To do so, go through the process linked below on all of the computers in the cluster.
https://support.thunderheadeng.com/docs/pyrosim/2021-4/user-manual/#_pyrosim_requires_a_password_to_run_parallel

- Bryan

Kevin McGrattan

Apr 11, 2022, 2:01:17 PM
to fds...@googlegroups.com
Bryan -- maybe this is already happening, but can PyroSim do a quick "Hello World" test to ensure that the computers are properly linked, configured, etc.? I use the test_mpi.exe executable to test connectivity.

Sigurður Bjarni Gíslason

Apr 11, 2022, 2:33:28 PM
to FDS and Smokeview Discussions
Thank you for assisting, Bryan.
I went through the process you suggested on both computers with success.
I still get a "Fatal error in PMPI_Barrier" (whatever that means) when trying to run my test case.
I will send a request for a meeting through the support email if that is better.

Siggi

Bryan Klein

Apr 11, 2022, 2:33:31 PM
to fds...@googlegroups.com
That's a great suggestion. We don't currently have a test like that.

Maybe a 'Test Cluster' button that runs a test function like that to make sure that communication and file I/O to the working directory are functional.

- Bryan


Kevin McGrattan

Apr 11, 2022, 2:39:32 PM
to fds...@googlegroups.com
MPI_BARRIER is a simple MPI routine that requires all MPI processes to stop at a particular point in the code until all processes catch up. There are several of these calls in the beginning of the code, and the error message probably means that there is something fundamentally wrong with the configuration. That is, the word "barrier" is not particularly relevant. It just means that something is not working right, but it's hard to know what.
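
To make the barrier idea concrete, here is a small Python sketch that uses threads in place of MPI processes. It is an analogy only, not FDS or MPI code: threading.Barrier stands in for MPI_BARRIER, and the function name and timeout are assumptions chosen for the example. When every expected participant arrives, all of them pass together; when one never arrives (much like a node that could not be launched), the ones that did arrive fail at the barrier even though the barrier itself is not the real problem.

```python
import threading


def run_barrier_demo(n_expected, n_started, timeout=1.0):
    """Start n_started workers that all wait on a barrier sized for n_expected.

    Returns a list of (event, rank) tuples in the order they occurred.
    """
    barrier = threading.Barrier(n_expected)
    events = []  # list.append is thread-safe in CPython

    def worker(rank):
        events.append(("before", rank))
        try:
            # Nobody gets past this line until n_expected workers have arrived.
            barrier.wait(timeout=timeout)
            events.append(("after", rank))
        except threading.BrokenBarrierError:
            # A participant never showed up, so the waiting workers fail here.
            events.append(("failed", rank))

    threads = [
        threading.Thread(target=worker, args=(rank,)) for rank in range(n_started)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return events


if __name__ == "__main__":
    print(run_barrier_demo(3, 3))               # every worker passes the barrier
    print(run_barrier_demo(3, 2, timeout=0.2))  # one is missing, the rest fail
```

In the second call the error surfaces at the barrier simply because that is the first place where all participants have to meet, which is consistent with the point that the word "barrier" in the message says little about the underlying cause.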