Configuration: gcc-openmpi
Running on host n0035.savio2
Time: Wed Jan 3 12:42:19 PST 2018
Directory: /global/home/users/miguelr/python/test/test-fenics-2017.2.0
Using 30 processors across 2 nodes
Memory: 63500 MB per node
[n0035:16098] *** Process received signal ***
[n0035:16098] Signal: Segmentation fault (11)
[n0035:16098] Signal code: Address not mapped (1)
[n0035:16098] Failing at address: 0x3287190
[n0036:14278] *** Process received signal ***
[n0036:14278] Signal: Segmentation fault (11)
[n0036:14278] Signal code: Address not mapped (1)
[n0036:14278] Failing at address: 0x24ba180
[n0036:14278] [ 0] /usr/lib64/libpthread.so.0(+0xf5e0)[0x2b34712b45e0]
[n0036:14278] [ 1] /global/software/sl-7.x86_64/modules/langs/python/3.5/lib/python3.5/lib-dynload/_posixsubprocess.so(+0x2045)[0x2b347a575045]
[n0036:14278] [ 2] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyCFunction_Call+0xf9)[0x2b3470e715e9]
[n0036:14278] [ 3] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x8fb5)[0x2b3470ef8bd5]
[n0036:14278] [ 4] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(+0x144b49)[0x2b3470ef9b49]
[n0036:14278] [ 5] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x91d5)[0x2b3470ef8df5]
[n0036:14278] [ 6] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(+0x144b49)[0x2b3470ef9b49]
[n0036:14278] [ 7] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalCodeEx+0x48)[0x2b3470ef9cd8]
[n0036:14278] [ 8] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(+0x9a661)[0x2b3470e4f661]
[n0036:14278] [ 9] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyObject_Call+0x56)[0x2b3470e1c236]
[n0036:14278] [10] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(+0x8377c)[0x2b3470e3877c]
[n0036:14278] [11] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyObject_Call+0x56)[0x2b3470e1c236]
[n0036:14278] [12] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(+0xd84c3)[0x2b3470e8d4c3]
[n0036:14278] [13] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(+0xcedaf)[0x2b3470e83daf]
[n0036:14278] [14] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyObject_Call+0x56)[0x2b3470e1c236]
[n0036:14278] [15] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x66f4)[0x2b3470ef6314]
[n0036:14278] [16] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x9546)[0x2b3470ef9166]
[n0036:14278] [17] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x9546)[0x2b3470ef9166]
[n0036:14278] [18] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(+0x144b49)[0x2b3470ef9b49]
[n0036:14278] [19] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalCodeEx+0x48)[0x2b3470ef9cd8]
[n0036:14278] [20] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalCode+0x3b)[0x2b3470ef9d1b]
[n0036:14278] [21] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(+0x137dfe)[0x2b3470eecdfe]
[n0036:14278] [22] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyCFunction_Call+0xf9)[0x2b3470e715e9]
[n0036:14278] [23] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x6ba0)[0x2b3470ef67c0]
[n0036:14278] [24] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(+0x144b49)[0x2b3470ef9b49]
[n0036:14278] [25] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x91d5)[0x2b3470ef8df5]
[n0036:14278] [26] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x9546)[0x2b3470ef9166]
[n0036:14278] [27] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x9546)[0x2b3470ef9166]
[n0036:14278] [28] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(PyEval_EvalFrameEx+0x9546)[0x2b3470ef9166]
[n0036:14278] [29] /global/software/sl-7.x86_64/modules/langs/python/3.5/bin/../lib/libpython3.5m.so.1.0(+0x144b49)[0x2b3470ef9b49]
[n0036:14278] *** End of error message ***
[n0036:14301] *** Process received signal ***
[n0036:14301] Signal: Segmentation fault (11)
[n0036:14301] Signal code: Address not mapped (1)
[n0036:14301] Failing at address: 0x33e3500
[n0036:14302] *** Process received signal ***
[n0036:14302] Signal: Segmentation fault (11)
[n0036:14302] Signal code: Address not mapped (1)
[n0036:14302] Failing at address: 0x3fedc00
[n0036:14322] *** Process received signal ***
[n0036:14322] Signal: Segmentation fault (11)
[n0036:14322] Signal code: Address not mapped (1)
[n0036:14322] Failing at address: 0x35cd1d0
Process 0: Solving linear variational problem.
Process 9: Solving linear variational problem.
Process 13: Solving linear variational problem.
Process 7: Solving linear variational problem.
Process 12: Solving linear variational problem.
Process 14: Solving linear variational problem.
Process 1: Solving linear variational problem.
Process 11: Solving linear variational problem.
Process 3: Solving linear variational problem.
Process 5: Solving linear variational problem.
Process 2: Solving linear variational problem.
Process 10: Solving linear variational problem.
Process 6: Solving linear variational problem.
Process 4: Solving linear variational problem.
Process 8: Solving linear variational problem.
Process 16: Solving linear variational problem.
Process 20: Solving linear variational problem.
Process 26: Solving linear variational problem.
Process 19: Solving linear variational problem.
Process 21: Solving linear variational problem.
Process 24: Solving linear variational problem.
Process 25: Solving linear variational problem.
Process 28: Solving linear variational problem.
Process 27: Solving linear variational problem.
Process 22: Solving linear variational problem.
Process 17: Solving linear variational problem.
Process 15: Solving linear variational problem.
Process 18: Solving linear variational problem.
Process 23: Solving linear variational problem.
Process 29: Solving linear variational problem.
real 0m8.888s
user 1m19.635s
sys 0m12.147s
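In the log above, the segmentation-fault reports from different hosts and PIDs are interleaved with the normal output. A small helper like the hypothetical `summarize_signals` below can tally them. It is only a sketch: it assumes the crash banner format seen in this log (`[host:pid] Signal: ...`), and it counts each crashing PID once.

```python
import re
from collections import Counter

# Matches crash-banner lines of the form seen in this log, e.g.
#   "[n0036:14278] Signal: Segmentation fault (11)"
# "Signal code:" lines do not match because "Signal" must be followed by ": ".
SIGNAL_RE = re.compile(r"^\[(?P<host>[^:\]]+):(?P<pid>\d+)\] Signal: (?P<signal>.+)$")


def summarize_signals(log_text: str) -> Counter:
    """Count distinct crashing PIDs per (host, signal) in a job log."""
    seen = set()
    for line in log_text.splitlines():
        match = SIGNAL_RE.match(line.strip())
        if match:
            # A set ensures each (host, pid, signal) triple is counted once.
            seen.add((match.group("host"), match.group("pid"), match.group("signal")))
    return Counter((host, signal) for host, _pid, signal in seen)
```

Applied to the 30-process log above, this should report one segfaulting PID on n0035 (16098) and four on n0036 (14278, 14301, 14302, 14322), even though all 30 ranks still reached "Solving linear variational problem."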
The results appear to be identical in both cases. However, if I submit this job across 2 nodes with clean Instant and dijitso caches, it fails as shown below (25 processes):
Configuration: gcc-openmpi
Running on host n0035.savio2
Time: Wed Jan 3 12:53:20 PST 2018
Directory: /global/home/users/miguelr/python/test/test-fenics-2017.2.0
Using 25 processors across 2 nodes
Memory: 63500 MB per node
Calling FFC just-in-time (JIT) compiler, this may take some time.
------------------- Start compiler output ------------------------
[n0035:16463] *** Process received signal ***
[n0035:16463] Signal: Segmentation fault (11)
[n0035:16463] Signal code: Address not mapped (1)
[n0035:16463] Failing at address: 0x412e2d0
------------------- End compiler output ------------------------
Compilation failed! Sources, command, and errors have been written to: /global/home/users/miguelr/python/test/test-fenics-2017.2.0/jitfailure-ffc_element_5b6081f30aebcbadbc21c15812333d05758ea45f
Traceback (most recent call last):
File "test-linear_solver.py", line 11, in <module>
V = dlf.FunctionSpace(mesh, "CG", 2)
File "/global/home/groups/fc_biome/sl7/programs/gcc-programs/fenics-2017.2.0/lib/python3.5/site-packages/dolfin/functions/functionspace.py", line 199, in __init__
self._init_convenience(*args, **kwargs)
File "/global/home/groups/fc_biome/sl7/programs/gcc-programs/fenics-2017.2.0/lib/python3.5/site-packages/dolfin/functions/functionspace.py", line 249, in _init_convenience
constrained_domain=constrained_domain)
File "/global/home/groups/fc_biome/sl7/programs/gcc-programs/fenics-2017.2.0/lib/python3.5/site-packages/dolfin/functions/functionspace.py", line 218, in _init_from_ufl
dolfin_element, dolfin_dofmap = _compile_dolfin_element(element, mesh, constrained_domain=constrained_domain)
File "/global/home/groups/fc_biome/sl7/programs/gcc-programs/fenics-2017.2.0/lib/python3.5/site-packages/dolfin/functions/functionspace.py", line 82, in _compile_dolfin_element
ufc_element, ufc_dofmap = jit(element, mpi_comm=mesh.mpi_comm())
File "/global/home/groups/fc_biome/sl7/programs/gcc-programs/fenics-2017.2.0/lib/python3.5/site-packages/dolfin/compilemodules/jit.py", line 107, in mpi_jit
error_msg)
File "/global/home/groups/fc_biome/sl7/programs/gcc-programs/fenics-2017.2.0/lib/python3.5/site-packages/dolfin/cpp/common.py", line 2739, in dolfin_error
return _common.dolfin_error(location, task, reason)
RuntimeError:
*** -------------------------------------------------------------------------
*** DOLFIN encountered an error. If you are not able to resolve this issue
*** using the information listed below, you can ask for help at
***
***
*** Remember to include the error message listed below and, if possible,
*** include a *minimal* running example to reproduce the error.
***
*** -------------------------------------------------------------------------
*** Error: Unable to perform just-in-time compilation of form.
*** Reason: Compilation failed on root node..
*** Where: This error was encountered inside jit.py.
*** Process: 18
***
*** DOLFIN version: 2017.2.0
*** Git changeset: 0baf73825079a581e43ab1705370043040aa213d
*** -------------------------------------------------------------------------
[OMITTED REPEATED LINES]
--------------------------------------------------------------------------
mpirun noticed that process rank 21 with PID 14564 on node n0036.savio2 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
real 0m6.524s
user 0m45.460s
sys 0m7.269s
This run generates the directory jitfailure-ffc_element_5b6081f30aebcbadbc21c15812333d05758ea45f, which contains an error log. However, when I run recompile.sh from the command line, no errors occur.
I've attached the Python script, the SLURM job file used to submit the job, the output log files for all three cases, and the jitfailure* directory mentioned above.
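For context on the traceback: the error comes from `mpi_jit` in `dolfin/compilemodules/jit.py`, called with `mpi_comm=mesh.mpi_comm()`, and process 18 reports "Compilation failed on root node." rather than the compiler's own error. That is consistent with a compile-on-root pattern: rank 0 runs the compiler, the ranks then agree on a status, and if rank 0 failed every other rank raises a generic error instead of waiting forever. The sketch below simulates that control flow sequentially in one process. It is based on my reading of DOLFIN 2017.2 rather than copied from it, and `simulate_mpi_jit` and its parameters are hypothetical names.

```python
from typing import Callable, List


def simulate_mpi_jit(compile_fn: Callable[[], object], nranks: int) -> List[str]:
    """Sequentially simulate the message each of `nranks` ranks would end with."""
    # Step 1: only rank 0 runs the (possibly crashing) compiler.
    statuses = [0] * nranks  # 0 == ok, 1 == failed
    root_error = ""
    try:
        compile_fn()
    except Exception as exc:
        statuses[0] = 1
        root_error = str(exc)

    # Step 2: all ranks agree on a global status; in a real MPI run this
    # would be a collective max-reduction, which also acts as a barrier.
    global_status = max(statuses)

    # Step 3: on success, non-root ranks would load the now-cached module;
    # on failure, every rank raises, so none is left waiting (no deadlock).
    outcomes = []
    for rank in range(nranks):
        if global_status == 0:
            outcomes.append("ok")
        elif rank == 0:
            outcomes.append("error: " + root_error)
        else:
            outcomes.append("error: Compilation failed on root node.")
    return outcomes
```

With a compiler that raises, `simulate_mpi_jit(crashing_compiler, 25)[18]` returns "error: Compilation failed on root node.", which matches what process 18 printed in the 25-process run. Under this reading, only rank 0's output (the segfault reported inside the compiler-output section) points at the actual failure.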