I am trying to run ExaBayes on a virtual machine in the cloud. The VM runs Ubuntu with 16 CPUs and 64 GB of RAM. My dataset consists of 157 tips (individuals), and the alignment is 180540 bp long with 26545 distinct alignment patterns. Parameter estimation is conducted for 2006 partitions (concatenated 90 bp Illumina reads).
When I execute a dry run everything seems to work fine. Below is the command used and the log produced during the dry run:
mpirun -np 16 ./exabayes -f Concat_60_1.phy -q DNAPartition_2006.txt -n test -s $RANDOM -c config.nex -R 2 -C 4 -S -d
.
.
.
Will execute 2 runs in parallel.
Will execute 4 chains in parallel.
initialized diagnostics file
initialized file ExaBayes_topologies.test.0
initialized file ExaBayes_parameters.test.0
initialized file ExaBayes_topologies.test.1
initialized file ExaBayes_parameters.test.1
Command line, input data and config file is okay. Exiting gracefully.
Command line, input data and config file is okay. Exiting gracefully.
Command line, input data and config file is okay. Exiting gracefully.
Command line, input data and config file is okay. Exiting gracefully.
Command line, input data and config file is okay. Exiting gracefully.
Command line, input data and config file is okay. Exiting gracefully.
Command line, input data and config file is okay. Exiting gracefully.
Command line, input data and config file is okay. Exiting gracefully.
Command line, input data and config file is okay. Exiting gracefully.
Command line, input data and config file is okay. Exiting gracefully.
Command line, input data and config file is okay. Exiting gracefully.
initial state:
================================================================
[run=0,heat=0,gen=0] Lnl: -4357936.99 LnPr: 2974.66 RNG(key={515655797,3572140441},ctr={0,0})
[run=0,heat=1,gen=0] Lnl: -4364657.69 LnPr: 2974.66 RNG(key={1548868209,620774698},ctr={0,0})
[run=0,heat=2,gen=0] Lnl: -4365141.33 LnPr: 2974.66 RNG(key={965252863,3735499560},ctr={0,0})
[run=0,heat=3,gen=0] Lnl: -4354019.28 LnPr: 2974.66 RNG(key={2090256741,1388714654},ctr={0,0})
================================================================
[run=1,heat=0,gen=0] Lnl: -4362631.50 LnPr: 2974.66 RNG(key={1591443081,2980482111},ctr={0,0})
[run=1,heat=1,gen=0] Lnl: -4376482.80 LnPr: 2974.66 RNG(key={1022808228,3130032066},ctr={0,0})
[run=1,heat=2,gen=0] Lnl: -4363703.96 LnPr: 2974.66 RNG(key={1155358002,4165128489},ctr={0,0})
[run=1,heat=3,gen=0] Lnl: -4367416.40 LnPr: 2974.66 RNG(key={2544924441,1578712497},ctr={0,0})
================================================================
load distribution (rank,coords,#numParts,#numPatterns,chainsPerRun):
[ 0 ] [0,0,0] 1004 13284 (0,0)
[ 1 ] [0,0,1] 1003 13283 (0,0)
[ 2 ] [0,1,0] 1004 13284 (0,1)
[ 3 ] [0,1,1] 1003 13283 (0,1)
[ 4 ] [0,2,0] 1004 13284 (0,2)
[ 5 ] [0,2,1] 1003 13283 (0,2)
[ 6 ] [0,3,0] 1004 13284 (0,3)
[ 7 ] [0,3,1] 1003 13283 (0,3)
[ 8 ] [1,0,0] 1004 13284 (1,0)
[ 9 ] [1,0,1] 1003 13283 (1,0)
[ 10 ] [1,1,0] 1004 13284 (1,1)
[ 11 ] [1,1,1] 1003 13283 (1,1)
[ 12 ] [1,2,0] 1004 13284 (1,2)
[ 13 ] [1,2,1] 1003 13283 (1,2)
[ 14 ] [1,3,0] 1004 13284 (1,3)
[ 15 ] [1,3,1] 1003 13283 (1,3)
Command line, input data and config file is okay. Exiting gracefully.
Command line, input data and config file is okay. Exiting gracefully.
Command line, input data and config file is okay. Exiting gracefully.
Command line, input data and config file is okay. Exiting gracefully.
Command line, input data and config file is okay. Exiting gracefully.
However, when I try to execute the actual run (without the -d option), after the initialisation of the files I get the following error:
initialized diagnostics file ExaBayes_diagnostics.test
initialized file ExaBayes_topologies.test.0
initialized file ExaBayes_parameters.test.0
initialized file ExaBayes_topologies.test.1
initialized file ExaBayes_parameters.test.1
[spy-raxml:29589] *** Process received signal ***
[spy-raxml:29589] Signal: Segmentation fault (11)
[spy-raxml:29589] Signal code: Address not mapped (1)
[spy-raxml:29589] Failing at address: (nil)
[spy-raxml:29589] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7f2a4a926340]
[spy-raxml:29589] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x92ad1) [0x7f2a4a5e2ad1]
[spy-raxml:29589] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x8d355) [0x7f2a4a5dd355]
[spy-raxml:29589] [ 3] ./exabayes() [0x56c8ff]
[spy-raxml:29589] [ 4] ./exabayes() [0x532704]
[spy-raxml:29589] [ 5] ./exabayes() [0x436ea8]
[spy-raxml:29589] [ 6] ./exabayes() [0x43fc7a]
[spy-raxml:29589] [ 7] ./exabayes() [0x441267]
[spy-raxml:29589] [ 8] ./exabayes() [0x44f5a2]
[spy-raxml:29589] [ 9] ./exabayes() [0x4114ba]
[spy-raxml:29589] [10] ./exabayes() [0x411708]
[spy-raxml:29589] [11] ./exabayes() [0x40f467]
[spy-raxml:29589] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f2a4a571ec5]
[spy-raxml:29589] [13] ./exabayes() [0x4107df]
[spy-raxml:29589] *** End of error message ***
[spy-raxml:29590] *** Process received signal ***
[spy-raxml:29590] Signal: Segmentation fault (11)
[spy-raxml:29590] Signal code: Address not mapped (1)
[spy-raxml:29590] Failing at address: (nil)
[spy-raxml:29590] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7fbfcb865340]
[spy-raxml:29590] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x92ad1) [0x7fbfcb521ad1]
[spy-raxml:29590] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x8d355) [0x7fbfcb51c355]
[spy-raxml:29590] [ 3] ./exabayes() [0x56c8ff]
[spy-raxml:29590] [ 4] ./exabayes() [0x532704]
[spy-raxml:29590] [ 5] ./exabayes() [0x436ea8]
[spy-raxml:29590] [ 6] ./exabayes() [0x43fc7a]
[spy-raxml:29590] [ 7] ./exabayes() [0x441267]
[spy-raxml:29590] [ 8] ./exabayes() [0x44f5a2]
[spy-raxml:29590] [ 9] ./exabayes() [0x4114ba]
[spy-raxml:29590] [10] ./exabayes() [0x411708]
[spy-raxml:29590] [11] ./exabayes() [0x40f467]
[spy-raxml:29590] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fbfcb4b0ec5]
[spy-raxml:29590] [13] ./exabayes() [0x4107df]
[spy-raxml:29590] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 14 with PID 29589 on node spy-raxml exited on signal 11 (Segmentation fault).
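To relate the failing rank to a run and chain, the coordinates in the dry-run load distribution table can be reproduced with a small helper. This is only a sketch: the function name `rank_coords` is hypothetical, and the mapping is inferred purely from the table in the dry-run log (16 ranks, -R 2, -C 4), not from the ExaBayes source, so it may not hold for other layouts.

```shell
# Hypothetical helper: map an MPI rank to [run,chain,sub-rank] coordinates.
# Assumption: ranks are assigned run-major, then chain, then sub-rank, which
# is what the dry-run "load distribution" table shows for -np 16 -R 2 -C 4.
# Assumption: np is a multiple of runs * chains (otherwise this divides by zero).
rank_coords() {
  local rank=$1 np=$2 runs=$3 chains=$4
  local per_run=$(( np / runs ))          # ranks serving one run
  local per_chain=$(( per_run / chains )) # ranks serving one chain of a run
  printf '[%d,%d,%d]\n' \
    $(( rank / per_run )) \
    $(( (rank % per_run) / per_chain )) \
    $(( rank % per_chain ))
}

# Rank reported by mpirun, with the layout used in the dry run
rank_coords 14 16 2 4
```

For the dry-run layout this prints [1,3,0], which matches the row for rank 14 in the table.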
The same error occurs regardless of the combination of parallel runs (-R) and parallel chains (-C), for example:
mpirun -np 16 ./exabayes -f Concat_60_1.phy -q DNAPartition_2006.txt -n test -s $RANDOM -c config.nex -R 2 -C 2 -S