Hi,
I got a segmentation fault when running raxmlHPC-MPI-SSE3 v8.1.3 on two of my alignments. The first one had 1407 taxa with 1590 patterns, and the second one had 1044 taxa with 1577 distinct patterns. I contacted the IT desk for the supercomputer, and they said they "could see nothing in the hardware logs that would indicate that this was caused by a hardware problem".
Both runs generated the requested number of trees, but were terminated before the best tree could be produced. I then had to generate the best tree manually afterward.
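For anyone in the same situation, the manual step can be sketched roughly as below: read the final likelihood of each search and keep the run with the highest (least negative) value. This is only a sketch under assumptions. It assumes each search leaves a per-run log whose lines are "<elapsed time> <log likelihood>" and whose last line holds the final score; the helper names `final_likelihood` and `best_run` and the file pattern in the usage note are hypothetical, so check them against the files your run actually wrote.

```python
def final_likelihood(log_text):
    """Return the likelihood on the last non-empty line of one per-run log."""
    lines = [ln for ln in log_text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty log")
    # Assumed layout (not verified for every RAxML build):
    # "<elapsed time> <log likelihood>" on each line.
    return float(lines[-1].split()[1])


def best_run(logs):
    """Given {run_name: log_text}, return (run_name, likelihood) of the best run.

    Log likelihoods are negative, so the best run is the one with the
    maximum value.
    """
    scores = {name: final_likelihood(text) for name, text in logs.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]
```

To use it, build the dictionary from the log files in the output directory, for example with `{p.name: p.read_text() for p in pathlib.Path(".").glob("RAxML_log.outfile.RUN.*")}` (the glob pattern is an assumption about the naming), then take the tree file that belongs to the returned run.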
The command for the run is as follows:
$ mpirun -n 128 /home/usr/bin/raxmlHPC-MPI-SSE3 -m GTRCATI -c 25 -e 0.1 -p 31415 -d -f d -N 200 -n outfile -s alignment.fas
The error for the 1st run:
[r103-n28:07641] *** Process received signal ***
[r103-n28:07641] Signal: Segmentation fault (11)
[r103-n28:07641] Signal code: Address not mapped (1)
[r103-n28:07641] Failing at address: 0x11
[r103-n28:07641] [ 0] /lib64/libpthread.so.0(+0xf710) [0x2b053f92c710]
[r103-n28:07641] [ 1] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x1f3) [0x2b053ddea1d3]
[r103-n28:07641] [ 2] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_int_memalign+0x69) [0x2b053dded9c9]
[r103-n28:07641] [ 3] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_memalign+0x6f) [0x2b053dded59f]
[r103-n28:07641] [ 4] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(mca_btl_openib_endpoint_connect_eager_rdma+0x4e) [0x2b053dc4662e]
[r103-n28:07641] [ 5] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(+0xdb62f) [0x2b053dc3d62f]
[r103-n28:07641] [ 6] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_progress+0xaa) [0x2b053de0b09a]
[r103-n28:07641] [ 7] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_request_default_wait+0xee) [0x2b053dbf789e]
[r103-n28:07641] [ 8] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_generic+0x3d3) [0x2b053dc66013]
[r103-n28:07641] [ 9] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_binomial+0xae) [0x2b053dc65c1e]
[r103-n28:07641] [10] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_dec_fixed+0xfd) [0x2b053dc5cead]
[r103-n28:07641] [11] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(mca_coll_sync_bcast+0x6f) [0x2b053dc6dbbf]
[r103-n28:07641] [12] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(MPI_Bcast+0x72) [0x2b053dc069a2]
[r103-n28:07641] [13] /home/usr/bin/raxmlHPC-MPI-SSE3(doInference+0x652) [0x43e302]
[r103-n28:07641] [14] /home/usr/bin/raxmlHPC-MPI-SSE3(main+0x15c6) [0x407276]
[r103-n28:07641] [15] /lib64/libc.so.6(__libc_start_main+0xfd) [0x2b053fb58d5d]
[r103-n28:07641] [16] /home/user/bin/raxmlHPC-MPI-SSE3() [0x405bc9]
[r103-n28:07641] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 123 with PID 7641 on node r103-n28 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
The error for the 2nd run:
[r110-n64:11147] *** Process received signal ***
[r110-n64:11147] Signal: Segmentation fault (11)
[r110-n64:11147] Signal code: (128)
[r110-n64:11147] Failing at address: (nil)
[r110-n64:11148] *** Process received signal ***
[r110-n64:11148] Signal: Segmentation fault (11)
[r110-n64:11148] Signal code: (128)
[r110-n64:11148] Failing at address: (nil)
[r110-n64:11148] [ 0] /lib64/libpthread.so.0(+0xf710) [0x2ad2213c7710]
[r110-n64:11147] [ 0] /lib64/libpthread.so.0(+0xf710) [0x2b6b28ba9710]
[r110-n64:11147] [ 1] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x7ed) [0x2b6b270677cd]
[r110-n64:11147] [ 2] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_int_memalign+0x69) [0x2b6b2706a9c9]
[r110-n64:11147] [ 3] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_memalign+0x6f) [0x2b6b2706a59f]
[r110-n64:11147] [ 4] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_generic+0x93) [0x2b6b26ee2cd3]
[r110-n64:11147] [ 5] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_binomial+0xae) [0x2b6b26ee2c1e]
[r110-n64:11147] [ 6] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_dec_fixed+0xfd) [0x2b6b26ed9ead]
[r110-n64:11147] [ 7] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(mca_coll_sync_bcast+0x6f) [0x2b6b26eeabbf]
[r110-n64:11147] [ 8] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(MPI_Bcast+0x72) [0x2b6b26e839a2]
[r110-n64:11147] [ 9] /home/usr/bin/raxmlHPC-MPI-SSE3(doInference+0x652) [0x43e302]
[r110-n64:11147] [10] /home/usr/bin/raxmlHPC-MPI-SSE3(main+0x15c6) [0x407276]
[r110-n64:11147] [11] /lib64/libc.so.6(__libc_start_main+0xfd) [0x2b6b28dd5d5d]
[r110-n64:11147] [12] /home/usr/bin/raxmlHPC-MPI-SSE3() [0x405bc9]
[r110-n64:11147] *** End of error message ***
[r110-n64:11148] [ 1] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x760) [0x2ad21f885740]
[r110-n64:11148] [ 2] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_int_memalign+0x69) [0x2ad21f8889c9]
[r110-n64:11148] [ 3] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_memalign+0x6f) [0x2ad21f88859f]
[r110-n64:11148] [ 4] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_generic+0x93) [0x2ad21f700cd3]
[r110-n64:11148] [ 5] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_binomial+0xae) [0x2ad21f700c1e]
[r110-n64:11148] [ 6] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_dec_fixed+0xfd) [0x2ad21f6f7ead]
[r110-n64:11148] [ 7] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(mca_coll_sync_bcast+0x6f) [0x2ad21f708bbf]
[r110-n64:11148] [ 8] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(MPI_Bcast+0x72) [0x2ad21f6a19a2]
[r110-n64:11148] [ 9] /home/usr/bin/raxmlHPC-MPI-SSE3(doInference+0x652) [0x43e302]
[r110-n64:11148] [10] /home/usr/bin/raxmlHPC-MPI-SSE3(main+0x15c6) [0x407276]
[r110-n64:11148] [11] /lib64/libc.so.6(__libc_start_main+0xfd) [0x2ad2215f3d5d]
[r110-n64:11148] [12] /home/usr/bin/raxmlHPC-MPI-SSE3() [0x405bc9]
[r110-n64:11148] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 116 with PID 11148 on node r110-n64 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
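In the second run the traces from PIDs 11147 and 11148 are interleaved, which makes them hard to read. A small sketch like the one below groups the lines by their "[host:pid]" prefix so each process's trace can be read on its own. The function name `group_by_process` is my own, and the regular expression only assumes the prefix format visible in the output above.

```python
import re
from collections import OrderedDict

# Matches the "[host:pid] message" prefix seen in the Open MPI output above.
PREFIX = re.compile(r"^\[(?P<host>[^:\]]+):(?P<pid>\d+)\] (?P<msg>.*)$")


def group_by_process(text):
    """Group backtrace lines by (host, pid), keeping first-seen order."""
    groups = OrderedDict()
    for line in text.splitlines():
        m = PREFIX.match(line)
        if m is None:
            continue  # separator rows and the mpirun summary have no prefix
        key = (m.group("host"), int(m.group("pid")))
        groups.setdefault(key, []).append(m.group("msg"))
    return groups
```

Converting the PID to an integer drops the leading zero (07641 becomes 7641), so the keys line up with the PID that mpirun reports in its summary line. Grouped this way, both processes in the second run show the same path as the first run: doInference+0x652 calls MPI_Bcast, and the fault is raised inside opal_memory_ptmalloc2_int_malloc.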