RAxML segmentation fault


Pamela

Jul 24, 2015, 12:00:30 PM
to raxml
Hi,
I received a segmentation fault when running raxmlHPC-MPI-SSE3 v8.1.3 on two of my alignments. The first had 1407 taxa with 1590 patterns, and the second had 1044 taxa with 1577 distinct patterns. I contacted the IT desk for the supercomputer, and they said they "could see nothing in the hardware logs that would indicate that this was caused by a hardware problem".

Both runs generated the requested number of trees but were terminated before the best tree could be produced, so I had to generate the best tree manually afterward.

The command for the run was as follows:
     $ mpirun -n 128 /home/usr/bin/raxmlHPC-MPI-SSE3 -m GTRCATI -c 25 -e 0.1 -p 31415 -d -f d -N 200 -n outfile -s alignment.fas

The error for the 1st run: 
[r103-n28:07641] *** Process received signal ***
[r103-n28:07641] Signal: Segmentation fault (11)
[r103-n28:07641] Signal code: Address not mapped (1)
[r103-n28:07641] Failing at address: 0x11
[r103-n28:07641] [ 0] /lib64/libpthread.so.0(+0xf710) [0x2b053f92c710]
[r103-n28:07641] [ 1] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x1f3) [0x2b053ddea1d3]
[r103-n28:07641] [ 2] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_int_memalign+0x69) [0x2b053dded9c9]
[r103-n28:07641] [ 3] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_memalign+0x6f) [0x2b053dded59f]
[r103-n28:07641] [ 4] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(mca_btl_openib_endpoint_connect_eager_rdma+0x4e) [0x2b053dc4662e]
[r103-n28:07641] [ 5] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(+0xdb62f) [0x2b053dc3d62f]
[r103-n28:07641] [ 6] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_progress+0xaa) [0x2b053de0b09a]
[r103-n28:07641] [ 7] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_request_default_wait+0xee) [0x2b053dbf789e]
[r103-n28:07641] [ 8] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_generic+0x3d3) [0x2b053dc66013]
[r103-n28:07641] [ 9] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_binomial+0xae) [0x2b053dc65c1e]
[r103-n28:07641] [10] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_dec_fixed+0xfd) [0x2b053dc5cead]
[r103-n28:07641] [11] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(mca_coll_sync_bcast+0x6f) [0x2b053dc6dbbf]
[r103-n28:07641] [12] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(MPI_Bcast+0x72) [0x2b053dc069a2]
[r103-n28:07641] [13] /home/usr/bin/raxmlHPC-MPI-SSE3(doInference+0x652) [0x43e302]
[r103-n28:07641] [14] /home/usr/bin/raxmlHPC-MPI-SSE3(main+0x15c6) [0x407276]
[r103-n28:07641] [15] /lib64/libc.so.6(__libc_start_main+0xfd) [0x2b053fb58d5d]
[r103-n28:07641] [16] /home/user/bin/raxmlHPC-MPI-SSE3() [0x405bc9]
[r103-n28:07641] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 123 with PID 7641 on node r103-n28 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------



The error for the 2nd run:
[r110-n64:11147] *** Process received signal ***
[r110-n64:11147] Signal: Segmentation fault (11)
[r110-n64:11147] Signal code:  (128)
[r110-n64:11147] Failing at address: (nil)
[r110-n64:11148] *** Process received signal ***
[r110-n64:11148] Signal: Segmentation fault (11)
[r110-n64:11148] Signal code:  (128)
[r110-n64:11148] Failing at address: (nil)
[r110-n64:11148] [ 0] /lib64/libpthread.so.0(+0xf710) [0x2ad2213c7710]
[r110-n64:11147] [ 0] /lib64/libpthread.so.0(+0xf710) [0x2b6b28ba9710]
[r110-n64:11147] [ 1] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x7ed) [0x2b6b270677cd]
[r110-n64:11147] [ 2] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_int_memalign+0x69) [0x2b6b2706a9c9]
[r110-n64:11147] [ 3] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_memalign+0x6f) [0x2b6b2706a59f]
[r110-n64:11147] [ 4] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_generic+0x93) [0x2b6b26ee2cd3]
[r110-n64:11147] [ 5] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_binomial+0xae) [0x2b6b26ee2c1e]
[r110-n64:11147] [ 6] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_dec_fixed+0xfd) [0x2b6b26ed9ead]
[r110-n64:11147] [ 7] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(mca_coll_sync_bcast+0x6f) [0x2b6b26eeabbf]
[r110-n64:11147] [ 8] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(MPI_Bcast+0x72) [0x2b6b26e839a2]
[r110-n64:11147] [ 9] /home/usr/bin/raxmlHPC-MPI-SSE3(doInference+0x652) [0x43e302]
[r110-n64:11147] [10] /home/usr/bin/raxmlHPC-MPI-SSE3(main+0x15c6) [0x407276]
[r110-n64:11147] [11] /lib64/libc.so.6(__libc_start_main+0xfd) [0x2b6b28dd5d5d]
[r110-n64:11147] [12] /home/usr/bin/raxmlHPC-MPI-SSE3() [0x405bc9]
[r110-n64:11147] *** End of error message ***
[r110-n64:11148] [ 1] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x760) [0x2ad21f885740]
[r110-n64:11148] [ 2] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_int_memalign+0x69) [0x2ad21f8889c9]
[r110-n64:11148] [ 3] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(opal_memory_ptmalloc2_memalign+0x6f) [0x2ad21f88859f]
[r110-n64:11148] [ 4] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_generic+0x93) [0x2ad21f700cd3]
[r110-n64:11148] [ 5] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_binomial+0xae) [0x2ad21f700c1e]
[r110-n64:11148] [ 6] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(ompi_coll_tuned_bcast_intra_dec_fixed+0xfd) [0x2ad21f6f7ead]
[r110-n64:11148] [ 7] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(mca_coll_sync_bcast+0x6f) [0x2ad21f708bbf]
[r110-n64:11148] [ 8] /software6/mpi/openmpi/1.6.5_intel/lib/libmpi.so.1(MPI_Bcast+0x72) [0x2ad21f6a19a2]
[r110-n64:11148] [ 9] /home/usr/bin/raxmlHPC-MPI-SSE3(doInference+0x652) [0x43e302]
[r110-n64:11148] [10] /home/usr/bin/raxmlHPC-MPI-SSE3(main+0x15c6) [0x407276]
[r110-n64:11148] [11] /lib64/libc.so.6(__libc_start_main+0xfd) [0x2ad2215f3d5d]
[r110-n64:11148] [12] /home/usr/bin/raxmlHPC-MPI-SSE3() [0x405bc9]
[r110-n64:11148] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 116 with PID 11148 on node r110-n64 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
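
Backtraces like the ones above are easier to compare once they are reduced to a count of frames per library or binary. The following is a minimal sketch, assuming the mpirun output was saved to a file (the name `crash.log` is hypothetical); `summarize_backtrace` is an illustrative helper, not part of RAxML or Open MPI.

```shell
#!/bin/sh
# Summarize an Open MPI crash backtrace: count frames per binary/library.
# Reads the log on stdin. Frame lines are assumed to look like:
#   [host:pid] [ 1] /path/to/lib.so(symbol+0x1f3) [0xaddr]
summarize_backtrace() {
    # Keep only numbered frame lines, strip everything before the path,
    # then cut at the first "(" to drop the symbol name and offset.
    grep -E '\] \[ *[0-9]+\] /' \
        | sed -E 's/^.*\] \[ *[0-9]+\] //; s/\(.*$//' \
        | sort | uniq -c | sort -rn
}

# Usage (crash.log is a hypothetical file name):
#   summarize_backtrace < crash.log
```

If several processes crash at once, as in the second run, their frames are counted together; filtering the log by the `[host:pid]` prefix first gives a per-process summary.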


William Gearty

Jul 24, 2015, 12:02:47 PM
to ra...@googlegroups.com
You should first try updating to the most recent version of RAxML (8.2.0):
https://github.com/stamatak/standard-RAxML/releases
Then let the list know if you are still having problems.
-Will

--
You received this message because you are subscribed to the Google Groups "raxml" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raxml+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




Alexandros Stamatakis

Jul 25, 2015, 9:42:21 AM
to ra...@googlegroups.com
Yes, please update to the latest RAxML version, 8.2.0; if the
problem persists, please get back to us,

alexis

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

Pamela

Jul 27, 2015, 8:25:34 PM
to raxml, alexandros...@gmail.com
Hi,
Here is the error message for v8.2.0. I attached the info file for your reference.

Command
module load compilers/gcc/4.8
module load apps/blcr/0.8.4
module load mpi/openmpi/1.6.5

# Command to run
mpirun -n 128 /home/usr/bin/raxmlHPC-MPI-SSE3 -m GTRCATI -c 25 -e 0.1 -p 31415 -d -f d -N 200 -n outfile -s alingment.phy


Error
[r105-n19:00579] *** Process received signal ***
[r105-n19:00579] Signal: Segmentation fault (11)
[r105-n19:00579] Signal code:  (128)
[r105-n19:00579] Failing at address: (nil)
[r105-n19:00579] [ 0] /lib64/libpthread.so.0(+0xf710) [0x2b64bbadc710]
[r105-n19:00579] [ 1] /software6/mpi/openmpi/1.6.5_gcc/lib/libmpi.so.1(+0x244efc) [0x2b64ba008efc]
[r105-n19:00579] [ 2] /software6/mpi/openmpi/1.6.5_gcc/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x103) [0x2b64ba00b3d3]
[r105-n19:00579] [ 3] /software6/mpi/openmpi/1.6.5_gcc/lib/libmpi.so.1(opal_memory_ptmalloc2_int_memalign+0x55) [0x2b64ba00dd95]
[r105-n19:00579] [ 4] /software6/mpi/openmpi/1.6.5_gcc/lib/libmpi.so.1(opal_memory_ptmalloc2_memalign+0xbc) [0x2b64ba00e1dc]
[r105-n19:00579] [ 5] /lib64/libc.so.6(+0x66ecb) [0x2b64bbd50ecb]
[r105-n19:00579] [ 6] /home/usr/bin/raxmlHPC-MPI-SSE3(printResult+0x1eb) [0x41b0bb]
[r105-n19:00579] [ 7] /home/usr/bin/raxmlHPC-MPI-SSE3(doInference+0x7c2) [0x4442b2]
[r105-n19:00579] [ 8] /home/usr/bin/raxmlHPC-MPI-SSE3(main+0x16ac) [0x4075bc]
[r105-n19:00579] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd) [0x2b64bbd08d5d]
[r105-n19:00579] [10] /home/usr/bin/raxmlHPC-MPI-SSE3() [0x405e29]
[r105-n19:00579] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 119 with PID 579 on node r105-n19 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------


Thanks.

Pamela
RAxML_info.trail_raxmlv8_raxml_usr
Message has been deleted

Alexandros Stamatakis

Jul 28, 2015, 5:10:43 AM
to raxml
great, even less work for me :-)

alexis

On 28.07.2015 03:32, Pamela wrote:
> Hi,
> I have solved the problem: I reduced the loaded modules to only the
> openmpi one. I don't really know why, but it worked.
>
> Thank you, and sorry again for the trouble :P.
>
> Pamela
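
The fix reported in the quoted message, loading only the openmpi module, could be written as a job-script fragment along these lines. This is a sketch under assumptions, not the poster's actual script: the function name `run_raxml_openmpi_only` is hypothetical, and the use of `module purge` to clear inherited modules is an assumption about the cluster's module system. The module name and binary path are taken from the thread.

```shell
#!/bin/bash
# Sketch of a job-script fragment that loads only the Open MPI module.
# Assumptions (not from the thread): the site's `module` command supports
# `purge`, and no other module is needed for the binary to start.
run_raxml_openmpi_only() {
    # Drop every module inherited from the submitting environment.
    module purge || return 1
    # Load only the MPI module named in the thread.
    module load mpi/openmpi/1.6.5 || return 1
    # Record what is actually loaded, for the job log.
    module list
    # Launch RAxML; all arguments are passed through unchanged.
    mpirun -n 128 /home/usr/bin/raxmlHPC-MPI-SSE3 "$@"
}

# Usage, with the options from the thread:
#   run_raxml_openmpi_only -m GTRCATI -c 25 -e 0.1 -p 31415 -d -f d \
#       -N 200 -n outfile -s alignment.phy
```

The thread does not establish why this worked. One observable detail is that the v8.1.3 backtraces resolve `libmpi.so.1` from `1.6.5_intel` while the v8.2.0 backtrace resolves it from `1.6.5_gcc`, so it may be worth running `ldd` on the binary inside the job to confirm which MPI library is picked up at run time.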