MPI on OSX using multiple computers


Dave Sheppard

Jan 28, 2015, 5:07:58 PM
to fds...@googlegroups.com

Has anyone gotten FDS MPI to work on OS X across multiple computers? I downloaded the FDS and MPI files and followed the setup instructions. I can run FDS MPI jobs on a single computer using the command "mpirun -np 4 fds_mpi models.fds".

When I try to run a model on multiple computers I get an error message. Does anyone have any insight on how to troubleshoot? I already set up public keys between the computers so that I can ssh between the machines without needing to input a password.

Kevin

Jan 28, 2015, 5:11:28 PM
to fds...@googlegroups.com
We've not done it here at NIST. 

Dave Sheppard

Jan 28, 2015, 6:08:24 PM
to fds...@googlegroups.com
Who can I talk to?

Lukas A.

Jan 29, 2015, 5:14:30 AM
to fds...@googlegroups.com
Dave,

We don't run such configurations either, and I don't know anyone to point you to. The reason is that, in general, it is cheaper and more effective to run a Linux cluster on "commodity" hardware.

Best,
Lukas



Dave Sheppard

Jan 29, 2015, 6:45:35 AM
to fds...@googlegroups.com
We have a bunch of brand new Mac Pro computers because we use an Apple cluster to compress the HiDef videos from our experiments. There is a lot of unused computer time on these Macs. They are very busy compressing after each experiment, but then they sit idle until the next video recording ends. These Mac Pros have truly impressive specs, and it would be a shame to let all of that computational power go to waste.

Can anyone send me an example of the host file that you use with FDS? The error that I am receiving says that it can't start the orted process on the remote computer. When I ssh to the remote computer, I can start the programs, so I am pretty sure that I have the environment variables correct.
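
For readers looking for the same thing: an Open MPI hostfile is a plain text file with one machine per line, optionally followed by a slots count. A minimal sketch, assuming two machines with the placeholder names node01 and node02 (not names from this thread) and two processes on each:

```shell
# Write a two-host Open MPI hostfile. node01 and node02 are placeholders;
# use names or IP addresses that the launching machine can resolve.
cat > hosts.txt <<'EOF'
node01 slots=2
node02 slots=2
EOF

# Show the launch line without running it, so it can be reviewed first.
echo "mpirun -np 4 --hostfile hosts.txt fds_mpi models.fds"
```

With slots=2 on each of the two hosts, -np 4 would place two processes on each machine under Open MPI's default mapping.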

Glenn Forney

Jan 29, 2015, 7:48:50 AM
to fds...@googlegroups.com
Is the directory where your case is located visible in the same place on each computer where you are trying to run your jobs? That is, does /home/drdtsheppard (if that is your home directory) show the same files (not just copies)? In addition to setting up ssh keys, on our Linux cluster we cross-mount the file system containing home directories so that it is visible on each of our compute nodes.
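
One way to tell "the same files" from "just copies" is to create a marker file on one machine and see whether another machine can see it. The sketch below assumes passwordless ssh is already set up; the function name, the host name, and the directory in the usage comment are placeholders:

```shell
# check_shared_dir HOST DIR
# Creates a marker file in DIR on this machine, then asks HOST (over ssh)
# whether it can see that file. Success means DIR is the same file system on
# both machines, not a separate copy.
check_shared_dir() {
  host=$1
  dir=$2
  marker="$dir/.shared_check_$$"
  touch "$marker" || return 2
  if ssh "$host" "test -e '$marker'"; then
    echo "$dir is shared with $host"
    rc=0
  else
    echo "$dir is NOT shared with $host"
    rc=1
  fi
  rm -f "$marker"
  return "$rc"
}

# Example call (placeholder values):
# check_shared_dir node02 "$HOME/fds"
```

If the directory is not shared, the input files need to be present at the same path on every machine.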



Kevin

Jan 29, 2015, 10:52:16 AM
to fds...@googlegroups.com
Well, this is a pretty disappointing showing for our Mac enthusiasts. Come on, guys and gals, demonstrate the technical superiority of your chosen computing platform.

Nothing like a public shaming to spur a little activity in this area. I certainly spent enough time getting MPI to work under Windows. Let's see how easy it is on a Mac.


Lukas A.

Jan 29, 2015, 11:02:42 AM
to fds...@googlegroups.com
@Kevin: mhh, this is not fair ;-)

@OP: could you please execute the "hostname" command in parallel, i.e.

mpiexec -np 4 hostname

What is the output / error message?

Best,
Lukas


Dave Sheppard

Jan 30, 2015, 6:28:22 AM
to fds...@googlegroups.com
I followed Glenn's suggestion and re-installed the MPI executables and the FDS application files into the user's home directory. I also put the FDS input files in the identical location on both computers. Both computers have the same user account name, 'cluster', so the path to the home directory is identical on both: '/users/cluster'. I ran printenv on both computers, and the paths and environment variables are identical.

When I run MPI on the local computer, everything is fine. I use the following command line to run locally: "mpirun -np 4 fds_mpi gasfill.data". This command line works on both computers.

As Lukas suggested, I entered "mpiexec -np 4 clu...@10.243.200.102", where clu...@10.243.200.102 is the hostname that I use to ssh to the remote computer. These are the results:
 

FRLCLSTR00:fds cluster$ mpiexec -np 4 clu...@10.243.200.102
--------------------------------------------------------------------------
mpiexec was unable to find the specified executable file, and therefore
did not launch the job.  This error was first reported for process
rank 0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpiexec command
      line parameter option (remember that mpiexec interprets the first
      unrecognized command line token as the executable).

Node:       FRLCLSTR00
Executable: clu...@10.243.200.102
--------------------------------------------------------------------------
4 total processes failed to start

Lukas A.

Jan 30, 2015, 6:52:02 AM
to fds...@googlegroups.com
Dave,

By "hostname" I mean the shell command of that name, typed literally, not the name of your host.

Best,
Lukas
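
The point of the test is that hostname is an ordinary program that prints the machine's name, so it can stand in for fds_mpi while the launcher itself is being checked. A minimal sketch, assuming Open MPI's mpiexec and its --host option; the function name and node02 are placeholders:

```shell
# mpi_hostname_test [HOST]
# Launches the ordinary `hostname` program through mpiexec.
# With no argument, four copies run on the local machine.
# With HOST, the program is started on that machine instead.
mpi_hostname_test() {
  if [ -n "$1" ]; then
    mpiexec --host "$1" hostname
  else
    mpiexec -np 4 hostname
  fi
}

# Example calls (node02 is a placeholder):
# mpi_hostname_test
# mpi_hostname_test node02
```

If the remote call prints the remote machine's name, the launcher can reach that machine and the remaining problems are likely specific to FDS or its input files.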

Dave Sheppard

Jan 30, 2015, 10:37:27 AM
to fds...@googlegroups.com
When I input the command:

mpiexec -np 4 hostname

I get the command prompt back with no messages.


Andrew Louie

Jan 30, 2015, 11:49:33 AM
to fds...@googlegroups.com
maybe try: mpiexec -np 4 -host clu...@10.243.200.102 fds_mpi models.fds

The -host command-line argument tells mpiexec which host it should run the program on.

Don't you need to have the MPI daemon running on your client machines?



--
-Andrew Louie :wq

Dave Sheppard

Jan 30, 2015, 12:45:01 PM
to fds...@googlegroups.com

I have tried several different ways. The first command below worked. The others didn't. My commands are the lines that start with the FRLCLSTR00 prompt (bold in the original post); everything else is the response from the computer. Thanks in advance for any help.


FRLCLSTR00:fds cluster$ mpiexec -np 4 -host FRLCLSTR00 fds_mpi gasfill.data

FRLCLSTR00:fds cluster$ mpiexec -np 4 FRLCLSTR00
--------------------------------------------------------------------------
mpiexec was unable to find the specified executable file, and therefore
did not launch the job.  This error was first reported for process
rank 0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpiexec command
      line parameter option (remember that mpiexec interprets the first
      unrecognized command line token as the executable).

Node:       FRLCLSTR00
Executable: FRLCLSTR00
--------------------------------------------------------------------------
4 total processes failed to start

FRLCLSTR00:fds cluster$ mpiexec -np 4 -host FRLCLSTR02 fds_mpi gasfill.data

ssh: Could not resolve hostname FRLCLSTR02: nodename nor servname provided, or not known
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------

Kevin

Jan 30, 2015, 12:48:24 PM
to fds...@googlegroups.com
Does this mean that you have successfully run 4 MPI processes on a single computer?

Dave Sheppard

Jan 30, 2015, 1:11:47 PM
to fds...@googlegroups.com
I currently have MPI and FDS loaded on three identical Apple Mac Pros.  I can run MPI FDS on each computer individually.  I cannot get an MPI run to start using remote computers.

Andrew Louie

Jan 30, 2015, 1:13:00 PM
to fds...@googlegroups.com
this may help: http://www.open-mpi.org/~jsquyres/www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems

Dave Sheppard

Jan 30, 2015, 4:25:35 PM
to fds...@googlegroups.com
I am one step closer.  I've gotten mpirun/fds_mpi to run on one remote machine, although it now hangs when I try to have more than one remote host.

The problem is that bash has an interactive mode and a non-interactive mode, and they read different profile files to set the paths. When you ssh to the remote computer, the shell is interactive and the profile file set up by the FDS installation is used. When mpirun starts processes on the remote computer, the shell is non-interactive and a different path is used, which I haven't figured out how to edit. Luckily, mpirun has a switch, '--prefix', which lets you manually specify the directory where the MPI files are installed.
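
The mismatch can be checked directly: a command passed to ssh runs in the same kind of non-interactive shell that mpirun relies on. A minimal sketch, assuming Open MPI (whose remote daemon is called orted); the function names and the values in the usage comments are placeholders:

```shell
# show_remote_env HOST
# Prints the PATH that a non-interactive ssh command sees on HOST and where,
# if anywhere, it finds Open MPI's orted daemon.
show_remote_env() {
  ssh "$1" 'echo "$PATH"; which orted'
}

# launch_with_prefix PREFIX HOSTFILE INPUT
# Starts fds_mpi on the hosts in HOSTFILE, telling mpirun that Open MPI is
# installed under PREFIX on the remote machines.
launch_with_prefix() {
  mpirun --prefix "$1" -np 4 --hostfile "$2" fds_mpi "$3"
}

# Example calls (all values are placeholders):
# show_remote_env node02
# launch_with_prefix /path/to/openmpi hosts.txt gasfill.data
```

With --prefix, mpirun adds PREFIX/bin to PATH and PREFIX/lib to LD_LIBRARY_PATH on the remote side before starting orted. The fds_mpi executable itself still has to be found on the remote PATH or be given by its full path.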

  

Susanne Kilian

Jan 31, 2015, 4:46:00 AM
to fds...@googlegroups.com

Hello Dave,


Although I also use FDS on a Mac Pro, I have never coupled multiple Macs together.

But on our Windows cluster we had similar problems. As already discussed above, it was necessary to explicitly specify a file listing the hosts with 'mpirun -hostfile name_of_hostfile'.


In addition, we also had to specify the working directory by adding '-wdir path_to_working_dir'. Did you already check this? As you wrote, the home directory structure on your individual cluster nodes is identical, so hopefully this should work. Or have you already been able to resolve the problem with the --prefix option?


As already mentioned in louiea's post, a very good reference for questions related to MPI is:

http://www.open-mpi.org/faq/?category=running


Best Susan
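
Combining the two options mentioned above might look like the following sketch. It assumes Open MPI's mpirun, which accepts -hostfile and -wdir; the function name and the values in the usage comment are placeholders:

```shell
# run_fds_on_hosts NPROCS HOSTFILE WORKDIR INPUT
# Runs fds_mpi with NPROCS processes on the machines listed in HOSTFILE,
# changing to WORKDIR on every machine before the program starts.
run_fds_on_hosts() {
  mpirun -np "$1" -hostfile "$2" -wdir "$3" fds_mpi "$4"
}

# Example call (placeholder values):
# run_fds_on_hosts 4 hosts.txt /path/to/case models.fds
```

WORKDIR has to exist on every machine in the hostfile, which is why an identical directory layout on all nodes matters.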


Dave Sheppard

Feb 1, 2015, 6:19:53 AM
to fds...@googlegroups.com
Thank you everyone.  I have FDS running on multiple remote Mac Pro computers.  Everyone's suggestions were greatly appreciated.

I will write up instructions on how I got it to work and ask the FDS developers to post them.     

