Fatal error in PMPI_Waitall: See the MPI_ERROR field in MPI_Status for the error code


Tom Li

Mar 29, 2023, 9:22:40 AM
to deal.II User Group
Hi everyone,
I am using the matrix-free framework to solve a phase-field problem. The program compiles without problems, but it fails after it has been running for a long time. The error appears to be related to MPI. This is the error message:
99da5909a9d5d034ccf6f70a943ab87.jpg
The attached log contains the details of my deal.II configuration. Thanks a lot!
detailed.log

Wolfgang Bangerth

Mar 31, 2023, 12:44:06 PM
to dea...@googlegroups.com
On 3/29/23 07:02, Tom Li wrote:
> I am using matrix-free technology to solve a phase field problem. When I
> compile my program everything went well, but something went wrong when the
> program runs for a long time. It seems that this error has some relationship
> with MPI;

Tom:
It's impossible to say from just the error message. It could be that your job
ran out of time in the queue you submitted it to. It could be a hardware
failure. It could be a software bug.

The first step is to find out whether the problem is reproducible. If you run
the exact same job, does it error out in the exact same place?

Best
W.

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/


Tom Li

Apr 2, 2023, 12:01:24 AM
to deal.II User Group
Dear Wolfgang,
Thanks for your reply. I have run my code on the HPC cluster several times. It seems that whenever the application has been running for a long time, this error (fatal error in PMPI_Waitall: See the MPI_ERROR field in MPI_Status for the error code) occurs near "setup matrix-free (CPU/wall)" in the file dealii-main.cc, line 420. I think the MPI function MPI_BARRIER may help, but I am not sure how to use it in my code. The attached files show how I implement my PDEs for the phase-field model of solidification.
Another question: when I run my code in three dimensions, it seems to run much slower than in two dimensions. I changed nothing except the dimension. Is something wrong in my code? Thanks a lot!
Best wishes!
CMakeLists.txt
parameters.h
dealii-main.cc
phasefield.h
solutefield.h

Wolfgang Bangerth

Apr 2, 2023, 7:14:29 PM
to dea...@googlegroups.com
On 4/1/23 22:01, Tom Li wrote:
> Thanks for your reply. I have tried several times to run my code on the
> HPC. It seems that whenever an application runs for a long time, this error
> (fatal error in PMPI_Waitall: See the MPI_ERROR field in MPI_Status for the
> error code) will happen near "setup matrix-free (CPU/wall)" in the file
> dealii-main.cc line 420. I think the MPI function MPI_BARRIER may help. But I
> am confused about how to use this function in my code.

Like I said, it would be useful to know whether the program fails in the same
place every time.


> And this is the code showing how I implement my
> PDEs in phase field of solidification. Another question is that when I try to
> run my code in dimension 3, it seems that this code runs much slower than in
> dimension 2. I didn't change other things, just changed the dimension. Is
> something wrong in this code?

That makes perfect sense to me. If you think about using a program in 2d that
uses a 100x100 mesh, then that is not so much work to solve the resulting
matrix of size 10,000. If you just run it in 3d, the mesh is now 100x100x100
and the matrix has size 1,000,000. That's clearly going to take a lot longer.
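
The counting argument above can be written down as a small sketch. The function names `approximate_unknowns` and `growth_2d_to_3d` are hypothetical helpers introduced only for this illustration; they are not part of deal.II or of the attached code. The sketch assumes a scalar problem on a uniform mesh and uses the same rough count as above (n points per direction gives about n^dim unknowns); the exact number depends on the element degree and on how many solution components the program has.

```cpp
#include <cassert>
#include <cstdint>

// Rough number of unknowns for a scalar problem on a uniform mesh with
// n points per coordinate direction in dim space dimensions: n^dim.
// This is only an estimate; the true count depends on the finite element
// degree and on the number of solution components.
std::uint64_t approximate_unknowns(std::uint64_t n, unsigned int dim)
{
  assert(dim >= 1 && dim <= 3);
  std::uint64_t result = 1;
  for (unsigned int d = 0; d < dim; ++d)
    result *= n;
  return result;
}

// Factor by which the estimated unknown count grows when the same
// resolution is used in 3d instead of 2d. For n^dim this equals n.
std::uint64_t growth_2d_to_3d(std::uint64_t n)
{
  return approximate_unknowns(n, 3) / approximate_unknowns(n, 2);
}
```

With n = 100, `approximate_unknowns(100, 2)` is 10,000 and `approximate_unknowns(100, 3)` is 1,000,000, so `growth_2d_to_3d(100)` is 100. Even if the cost per unknown stayed the same, each step would then take roughly 100 times as long in 3d at the same resolution, and in practice the cost per unknown usually also goes up in 3d.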