Sudden problems with MPICH2 and fds5_mpi

Barbro Maria Storm

Jan 21, 2011, 3:42:18 AM
to fds...@googlegroups.com
Hi all, I'm having some trouble running FDS with MPI under MPICH2. I
keep getting exit code 123, and I can't figure out what it means; I've
googled it to no avail. The input file runs fine with the latest
serial FDS5 (64-bit for Windows), but won't run with MPI. Has anyone
encountered the same thing, and do you know whether it's a problem
with MPICH2, FDS, or something else? (I'm only running the file
locally, on 3 of 8 cores.)


"C:\folder>"c:\Program Files\MPICH2\bin\mpiexec.exe" -n 3 "c:\Program
Files\FDS\FDS5\bin\fds5_mpi_win_64.exe" fds-file.fds
Process 2 of 2 is running on pc
Process 1 of 2 is running on pc
Process 0 of 2 is running on pc
Mesh 1 is assigned to Process 0
Mesh 2 is assigned to Process 1
Mesh 3 is assigned to Process 1
Mesh 4 is assigned to Process 2

Fire Dynamics Simulator

Compilation Date : Fri, 29 Oct 2010

Version: 5.5.3; MPI Enabled; OpenMP Disabled
SVN Revision No. : 7031

Job TITLE :
Job ID string : fds-file

Fatal error in PMPI_Gatherv: Internal MPI error!, error stack:
PMPI_Gatherv(376).....: MPI_Gatherv failed(sbuf=000000003E03F268, scount=1, MPI_
DOUBLE_PRECISION, rbuf=000000003E03F268, rcnts=000000003BCBB5B8, displs=00000000
3BCBB678, MPI_DOUBLE_PRECISION, root=0, MPI_COMM_WORLD) failed
MPIR_Gatherv_impl(189):
MPIR_Gatherv(102).....:
MPIR_Localcopy(349)...: memcpy arguments alias each other, dst=000000003E03F268
src=000000003E03F268 len=8

job aborted:
rank: node: exit code[: error message]
0: pc: 1: process 0 exited without calling finalize
1: pc: 123
2: pc: 123"

--
Barbro Storm

Jouni

Jan 21, 2011, 4:59:00 AM
to FDS and Smokeview Discussions
After installing the latest MPICH2 version on a new computer I
encountered something similar, though I'm not sure it was this exact
error. Another thread here suggested trying an earlier version of
MPICH2, which worked for me. Several earlier versions are available
for download on the MPICH2 website; I think I took the most recent
one before the latest update.

-Jouni

Barbro Maria Storm

Jan 21, 2011, 5:01:57 AM
to fds...@googlegroups.com

Thanks, problem solved by reverting to MPICH2 v1.2.


--
Barbro Storm

Paul Hart

Jan 25, 2011, 2:09:22 PM
to FDS and Smokeview Discussions
Great feedback. I ran into the same issue this week when trying to
implement MPICH2 1.3.1, the latest version of MPICH, and its new
default process manager, Hydra (on a Linux cluster). By default it
uses hydra as the process manager instead of mpd, which was the
default in previous versions. I suspect this may have something to do
with it.

ANL indicates "we recommend using the hydra process manager instead of
mpd. The mpd process manager has many problems, as well as an annoying
mpdboot step that is fragile and difficult to use correctly. The mpd
process manager is deprecated at this point, and most reported bugs in
it will not be fixed." My understanding is that hydra avoids some of
the networking issues/errors encountered when using mpd (one reason I
am trying to move to it). I plan to send ANL the error to get their
input.




Kevin

Jan 26, 2011, 11:22:55 AM
to FDS and Smokeview Discussions
Paul -- are mpd and hydra part of the MPICH distribution, or are they
generic names for routines that handle multiple jobs under Windows or
Linux? I'm not familiar with either.

Paul Hart

Jan 26, 2011, 1:38:43 PM
to FDS and Smokeview Discussions
Both are part of the MPICH distribution. Before MPICH2 1.3.1 the
default was mpd, so chances are you are still using mpd (if you use
the mpdboot command, you are using mpd). In versions before 1.3.1 the
user decides which one to use when installing MPICH: mpd is built by
default, and hydra is built if the appropriate configure switch is
used (a sketch follows below). I want to move to hydra to avoid
networking problems I have encountered.
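
For reference, a rough sketch of how the process-manager choice looks
at build and run time. The install prefix, host-file names, and FDS
binary name are illustrative assumptions, not prescriptive; check the
MPICH2 Installer's Guide for your exact version:

# Build MPICH2 with mpd instead of the 1.3.x default (hydra).
./configure --prefix=/opt/mpich2 --with-pm=mpd
make && make install

# mpd workflow: a daemon ring must be booted before any job runs.
mpdboot -n 4 -f mpd.hosts
mpiexec -n 4 ./fds5_mpi fds-file.fds
mpdallexit

# hydra workflow: no boot step; mpiexec launches the ranks directly.
mpiexec -n 4 -f hosts.txt ./fds5_mpi fds-file.fds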

I sent the error above to ANL; their response is below. It is beyond
me, but it may make sense to you.

I am going to re-install MPICH2-1.3.1 but with mpd instead of hydra to
see if I get the same error.



On Jan 26, 2011, at 7:44 AM CST, Paul Hart wrote:

> I have been using mpd as the process manager. I would like to change to hydra since mpd is being deprecated. I compiled MPICH2-1.3.1 and was able to run the cpi example program. I then attempted to run another program and receive the following error (ran in verbose mode for more info). I am able to run the same program using mpd.
>
> To my knowledge no one in the community that uses this program (Fire Dynamics Simulator, open-source CFD tailored to fire, produced by a community led by the National Institute of Standards and Technology) has attempted to use hydra. They are still running on mpd.
[...]
> Fatal error in PMPI_Gatherv: Internal MPI error!, error stack:
> PMPI_Gatherv(376).....: MPI_Gatherv failed(sbuf=0x27a6c40, scount=1, MPI_DOUBLE_PRECISION, rbuf=0x27a6c40, rcnts=0x25b9670, displs=0x25b96f0, MPI_DOUBLE_PRECISION, root=0, MPI_COMM_WORLD) failed
> MPIR_Gatherv_impl(189):
> MPIR_Gatherv(102).....:
> MPIR_Localcopy(346)...: memcpy arguments alias each other, dst=0x27a6c40 src=0x27a6c40 len=8
> APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)

It doesn't look like the problem has anything to do with hydra. Your
program is passing dst==src to MPI_Gatherv, but MPI does not permit
the send and recv buffers to alias each other. We usually try to
check for these sorts of things at a high level, but occasionally we
miss the upper level check and this lower level check in
MPIR_Localcopy triggers instead. This check was added sometime after
1.2, IIRC, so you hit it because you upgraded, not because of hydra.

The correct fix is to pass MPI_IN_PLACE as the value of sendbuf at the
root process.

I'll put an error check in MPI_Gatherv in order to make the error a
bit easier to understand.

-Dave
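
For anyone who wants to see what Dave's fix looks like in code, here
is a minimal, self-contained Fortran sketch (illustrative only, not
actual FDS source; all names are made up): the root rank passes
MPI_IN_PLACE as sendbuf so the send and receive buffers never alias.

program gatherv_in_place
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, i
  double precision :: my_val(1)
  double precision, allocatable :: rbuf(:)
  integer, allocatable :: rcnts(:), displs(:)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  ! One double per rank, gathered to rank 0.
  allocate(rbuf(nprocs), rcnts(nprocs), displs(nprocs))
  rcnts  = 1
  displs = (/ (i - 1, i = 1, nprocs) /)
  my_val(1) = dble(rank)

  if (rank == 0) then
     ! Root's contribution must already sit at its own displacement
     ! in rbuf; MPI_IN_PLACE tells MPI to leave that slot untouched
     ! instead of copying a buffer onto itself.
     rbuf(1) = my_val(1)
     call MPI_GATHERV(MPI_IN_PLACE, 1, MPI_DOUBLE_PRECISION, &
                      rbuf, rcnts, displs, MPI_DOUBLE_PRECISION, &
                      0, MPI_COMM_WORLD, ierr)
  else
     ! Non-root ranks send normally; their recv arguments are ignored.
     call MPI_GATHERV(my_val, 1, MPI_DOUBLE_PRECISION, &
                      rbuf, rcnts, displs, MPI_DOUBLE_PRECISION, &
                      0, MPI_COMM_WORLD, ierr)
  end if

  if (rank == 0) print *, 'gathered:', rbuf
  call MPI_FINALIZE(ierr)
end program gatherv_in_place

With MPI_IN_PLACE, MPIR_Localcopy is never asked to memcpy a buffer
onto itself, which is exactly the check that fires in the error stack
above.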

Kevin

Jan 26, 2011, 1:49:36 PM
to FDS and Smokeview Discussions
This looks like something I can fix. I am vaguely familiar with this
MPI_IN_PLACE stuff. It would affect a number of lines of FDS code, so
I will fix it in FDS 6. So the moral of this story is to stick with
the older version of MPICH until we release FDS 6.