Scheduler errors: mpiexec problem


蔡宜靜

Dec 12, 2021, 3:01:00 PM
to aiidausers
Hi, I'm Yi-Ching Tsai.
When I try to run a PwCalculation,
I get some error messages that I couldn't resolve myself,
including scheduler errors and log messages.
Here are the contents of the process report (PK=2364):

*** 2364 [PW test]: None
*** Scheduler output:

:: initializing oneAPI environment ...
   _aiidasubmit.sh: BASH_VERSION = 4.2.46(2)-release
:: advisor -- latest
:: ccl -- latest
:: clck -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: inspector -- latest
:: ipp -- latest
:: ippcp -- latest
:: ipp -- latest
:: itac -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vpl -- latest
:: vtune -- latest
:: oneAPI environment initialized ::


*** Scheduler errors:
[mpiexec@node0] wait_proxies_to_terminate (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:541): downstream from host node0 was killed by signal 11 (Segmentation fault)
[mpiexec@node0] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:2125): assert (pg->intel.exitcodes != NULL) failed
[mpiexec@node0] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:360): write error (Bad file descriptor)

*** 4 LOG MESSAGES:
+-> WARNING at 2021-12-12 12:04:08.397769+00:00
 | key 'symmetries' is not present in raw output dictionary
+-> ERROR at 2021-12-12 12:04:08.455445+00:00
 | ERROR_OUTPUT_STDOUT_INCOMPLETE
+-> ERROR at 2021-12-12 12:04:08.459551+00:00
 | Both the stdout and XML output files could not be read or parsed.
+-> WARNING at 2021-12-12 12:04:08.461403+00:00
 | output parser returned exit code<305>: Both the stdout and XML output files could not be read or parsed.


Thank you for reading this,
Have a good day!

Yi-Ching, Tsai
2021/12/12

Sebastiaan Huber

Dec 12, 2021, 3:13:02 PM
to aiida...@googlegroups.com
Dear Yi-Ching Tsai,

The problem seems to be with the executable that you are trying to run on the remote computer.
As you can see from the scheduler errors, a process on host node0 was killed by signal 11 (segmentation fault).
This probably has nothing to do with AiiDA.
Please first make sure that you can properly run the executable on the remote machine.
You can use the submit script that AiiDA generated as a starting point and launch it manually to debug what is going wrong.
You can get it by using `verdi calcjob inputcat <PK> _aiidasubmit.sh` where you replace `<PK>` with the pk of the calcjob.
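
As a rough illustration of reading the scheduler errors, here is a minimal Python sketch that scans scheduler stderr text for the "killed by signal" message quoted in the report above and translates the signal number into a name. The function name `find_signal_kills` and the regular expression are assumptions made for this example; they are not part of AiiDA, and the pattern only matches the wording of the Hydra message shown in this thread.

```python
import re
import signal

# Matches the Hydra message wording seen in the scheduler errors of this thread.
# Other MPI launchers may phrase this differently (assumption: this wording only).
KILLED = re.compile(r"downstream from host (\S+) was killed by signal (\d+)")


def find_signal_kills(stderr_text):
    """Return a list of (host, signal_number, signal_name) for each kill message."""
    kills = []
    for match in KILLED.finditer(stderr_text):
        host = match.group(1)
        number = int(match.group(2))
        try:
            name = signal.Signals(number).name
        except ValueError:
            # The number is not a signal known on the machine running this script.
            name = "unknown"
        kills.append((host, number, name))
    return kills


if __name__ == "__main__":
    sample = (
        "[mpiexec@node0] wait_proxies_to_terminate "
        "(../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:541): "
        "downstream from host node0 was killed by signal 11 (Segmentation fault)"
    )
    for host, number, name in find_signal_kills(sample):
        print(f"{host}: signal {number} ({name})")
```

A signal kill reported here means the launched program itself crashed, which is why running the generated submit script by hand on the remote machine is the next step.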

HTH,

SPH

蔡宜靜

Dec 13, 2021, 3:11:43 AM
to aiidausers
Dear all, 

I've solved this problem using the method that Sebastiaan suggested.
Thank you so much!

By inspecting the output of 'verdi calcjob inputcat <PK> _aiidasubmit.sh', I found that my mpirun command was not working.

So I tried two fixes:
1. Setting up the computer without the 'mpirun' option.
2. Setting up the computer with the correct 'mpirun' path.

With either one, AiiDA runs smoothly.
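
For anyone hitting the same issue, the check described above can be sketched in Python: find the MPI launcher named in the submit script and see whether it resolves on the machine. This is only a sketch under assumptions. The helper names (`find_launcher`, `launcher_resolves`), the launcher list, and the sample script with its `/hypothetical/...` paths are made up for illustration; the real script is whatever `verdi calcjob inputcat <PK> _aiidasubmit.sh` prints for your calcjob.

```python
import os
import shlex
import shutil

# Launcher names to look for; this list is an assumption for illustration.
LAUNCHERS = {"mpirun", "mpiexec", "srun"}


def find_launcher(script_text):
    """Return the first command whose basename is a known launcher, or None."""
    for line in script_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, the shebang, and scheduler directives
        try:
            tokens = shlex.split(line)
        except ValueError:
            continue  # skip lines that shlex cannot tokenize
        if tokens and os.path.basename(tokens[0]) in LAUNCHERS:
            return tokens[0]
    return None


def launcher_resolves(script_text):
    """Return (launcher, resolved_path); resolved_path is None if not found."""
    launcher = find_launcher(script_text)
    if launcher is None:
        return None, None
    return launcher, shutil.which(launcher)


if __name__ == "__main__":
    # Hypothetical submit script contents; paths are placeholders.
    sample = "\n".join([
        "#!/bin/bash",
        "source /hypothetical/setvars.sh",
        "'mpirun' '-np' '4' '/hypothetical/bin/pw.x' '-in' 'aiida.in' > 'aiida.out'",
    ])
    launcher, path = launcher_resolves(sample)
    print(f"launcher: {launcher}, resolved to: {path}")
```

The lookup only means something when it runs on the remote computer, in the same environment the job uses, because `shutil.which` searches the `PATH` of the process that calls it.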

Best regards,
Yi-Ching Tsai
2021/12/13
