intel/2013.1.117(49):ERROR:105: Unable to locate a modulefile for 'mkl'
[gcn-7-65.sdsc.edu:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 8. MPI process died?
[gcn-7-65.sdsc.edu:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[gcn-7-65.sdsc.edu:mpispawn_0][child_handler] MPI process (rank: 4, pid: 51514) terminated with signal 9 -> abort job
[gcn-7-65.sdsc.edu:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node gcn-7-65 aborted: MPI process error (1)
begin mrbayes;
set precision = 15;
set usebeagle=no;
set scientific=Yes;
set autoclose=yes;
set nowarnings=yes;
execute infile.nex;
lset coding = all;
lset Nst= 6;
lset Nucmodel= 4by4;
lset Rates= gamma;
lset Nbetacat= 5;
prset pinvarpr = uniform(0.0,1.0);
prset ratepr = fixed;
prset statefreqpr = dirichlet(1.0);
prset revmatpr = dirichlet(1.0,1.0,1.0,1.0,1.0,1.0);
prset ratecorrpr = uniform(-1.0,1.0);
prset covswitchpr = uniform(0.0,100.0);
prset Tratiopr = beta(1.0, 1.0);
prset brlenspr = unconstrained:exponential(10.0);
report Siterates=No;
report revmat=dirichlet;
mcmc append=yes ngen=12000000 nruns=2 nchains=4 temp=0.200 swapfreq=1 nswaps=1 samplefreq=1000 mcmcdiagn=Yes minpartfreq=0.1 allchains=Yes relburnin=Yes burnin=0 burninfrac=0.25 stoprule=Yes starttree=random stopval=0.01 Savebrlens=Yes nperts=0 Ordertaxa=Yes;
sump burnin=10 relburnin=Yes burninfrac=0.25 nruns=2 outputname=sumpoutput.out;
sumt burnin=10 relburnin=Yes burninfrac=0.25 nruns=2 ntrees=1 minpartfreq=0.05 contype=Halfcompat conformat=Figtree;
quit
end;
Task\ label=Parrot Bayesian Run 4.3 (nobeag)_2
Task\ ID=1115576
Tool=MRBAYES_321RESTARTBETA
created\ on=2017-02-26 15:30:26.0
JobHandle=NGBW-JOB-MRBAYES_321RESTARTBETA-174EF6FA5B1048B481E9357F87917FEF
resource=gordon
User\ ID=94385
User\ Name=kprovost
email=kpro...@amnh.org
Output=(ALL_FILES,*,UNKNOWN,UNKNOWN,UNKNOWN)
ChargeFactor=1.000000
cores=8
JOBID=2876537.gordon-fe2.local
This is just a guess, but the run is dying suddenly here, without an error message:
Setting default partition (does not divide up characters)
Setting model default
Seed (for generating default start values) = 539544633
Setting output file names to "infile.nex.run<i>.<p|t>"
Exiting data block
Reading mrbayes block
Setting Precision to 15
Setting usebeagle to no
Here is your command block (is this the same command block you used to create the first run?).
begin mrbayes;
set precision = 15;
set usebeagle=no;
set scientific=Yes;
I would try changing the order so that set scientific=Yes; comes before set precision = 15;
Or just strike set scientific=Yes; from the file, because “Yes” is the default.
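For reference, the reordering might look like the sketch below. It uses only commands from the posted block; whether the order actually matters here is still a guess, and the bracketed NEXUS comments are annotations, with the second one standing in for the rest of the block, which is left unchanged.

```
begin mrbayes;
[scientific moved ahead of precision]
set scientific=Yes;
set precision = 15;
set usebeagle=no;
[remaining commands unchanged from the posted block]
end;
```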
If that doesn't help, please let me know.
Mark
[gcn-18-46.sdsc.edu:mpirun_rsh][signal_processor] Caught signal 15, killing job
[gcn-18-46.sdsc.edu:mpispawn_0][report_error] connect() failed: Connection refused (111)
Task\ label=Parrot Bayesian Run 4.3 (nb TEST)
Task\ ID=1116477
Tool=MRBAYES_321RESTARTBETA
created\ on=2017-02-27 17:25:32.0
JobHandle=NGBW-JOB-MRBAYES_321RESTARTBETA-D99D9732D58F4B3698DE4FAF4168C099
resource=gordon
User\ ID=94385
User\ Name=kprovost
email=kpro...@amnh.org
Output=(ALL_FILES,*,UNKNOWN,UNKNOWN,UNKNOWN)
ChargeFactor=1.000000
cores=8
JOBID=2877395.gordon-fe2.local
=>> PBS: job killed: walltime 1837 exceeded limit 1800
kill -8786: No such process
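The job was killed because it ran past its maximum run time: the walltime reached 1837 seconds against a limit of 1800 seconds (30 minutes). So just clone that job, set a longer maximum run time, and resubmit; it should work fine.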
Let me know if you have further issues.
Best,
Mark
intel/2013.1.117(49):ERROR:105: Unable to locate a modulefile for 'mkl'
[gcn-18-31.sdsc.edu:mpispawn_0][child_handler] MPI process (rank: 1, pid: 79672) terminated with signal 9 -> abort job
[gcn-18-31.sdsc.edu:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 10. MPI process died?
[gcn-18-31.sdsc.edu:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[gcn-18-31.sdsc.edu:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node gcn-18-31 aborted: Error while reading a PMI socket (4)
Task\ label=Parrot Bayesian Run 4.3 (nb TEST)_2
Task\ ID=1119083
Tool=MRBAYES_321RESTARTBETA
created\ on=2017-03-02 15:26:23.0
JobHandle=NGBW-JOB-MRBAYES_321RESTARTBETA-36833136526942F88249647B9B33AABC
resource=gordon
User\ ID=94385
User\ Name=kprovost
email=kpro...@amnh.org
Output=(ALL_FILES,*,UNKNOWN,UNKNOWN,UNKNOWN)
ChargeFactor=1.000000
cores=8
JOBID=2880232.gordon-fe2.local
execute infile.nex;
sumt burnin=10 relburnin=Yes burninfrac=0.25 nruns=2 ntrees=1 minpartfreq=0.05 contype=Halfcompat conformat=Figtree;
quit
end;
I don't think the quit statement belongs there.
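If the quit line is indeed the problem (an assumption on my part, not something the logs confirm), the end of the block with it removed would look like this; the other two commands are copied unchanged from what you posted:

```
execute infile.nex;
sumt burnin=10 relburnin=Yes burninfrac=0.25 nruns=2 ntrees=1 minpartfreq=0.05 contype=Halfcompat conformat=Figtree;
end;
```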
Best,
Mark