I've been having problems with e2refine2d.py crashing, seemingly for more than one distinct reason. I see the problem with the linux64 source and linux64 bin distributions, as well as with the OSX bin distribution. I wondered whether my data files were at fault, but I just restarted with repicked particles and I still get errors. I've been screening parameters and subsets of the data in parallel, so this may again be a concurrency issue (related thread: "Concurrency problem with e2refine2d.py").
Below I list the errors for OSX first, then for linux64. I'm sticking to the bin distributions here for simplicity, but the source distribution does the same thing for me.
==============
OSX
==============
On OSX, I get the following errors printed to the terminal. The program continues to run after "lost sys.stderr", but dies when the traceback is printed. No core files are created. I also get complaints about the NUMPY interface, but some jobs complete anyway, and I'll be investigating that further:
COMMAND:
$ e2version.py
EMAN 2.12 (GITHUB: Mon Dec 5 12:19:33 2016)
Your EMAN2 is running on: Mac OS 10.11.6 x86_64
Your Python version is: 2.7.10
$ sh job_eman2-TEST-c2d-Array-locus-gen-j1.sh
# The script does the equivalent of a FOR loop running e2refine2d.py: 4 independent runs, no threading. This is mostly to recreate what I do on the cluster.
# command="e2refine2d.py --input=sets/set01.lst --ncls=64 --normproj --fastseed --iter=5 --nbasisfp=64 --naliref=64 --center=xform.center --simalign=rotate_translate_iterative:maxiter=5:maxshift=10 --simaligncmp=ccc --simralign=refine --simraligncmp=ccc --simcmp=ccc --classkeep=0.85 --classiter=5 --classalign=rotate_translate_iterative:maxiter=5:maxshift=10 --classaligncmp=ccc --classralign=refine --classraligncmp=ccc --classaverager=ctf.auto --classcmp=ccc --classrefsf -v 2 "
TERMINAL OUTPUT:
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
Traceback (most recent call last):
  File "/Applications/EMAN2-2016-12-05//bin/e2stacksort.py", line 328, in <module>
    main()
  File "/Applications/EMAN2-2016-12-05//bin/e2stacksort.py", line 96, in main
    a=EMData.read_images(args[0])
  File "/Applications/EMAN2-2016-12-05/lib/EMAN2db.py", line 467, in db_read_images
    return EMData.read_images_c(fsp,*parms)
RuntimeError: FileAccessException at /build/co/eman2.daily/libEM/hdfio2.cpp:510: error with 'r2d_05/allrefs_01.hdf': 'cannot access file 'r2d_05/allrefs_01.hdf'' caught
Traceback (most recent call last):
  File "/Applications/EMAN2-2016-12-05//bin/e2stacksort.py", line 328, in <module>
    main()
  File "/Applications/EMAN2-2016-12-05//bin/e2stacksort.py", line 96, in main
    a=EMData.read_images(args[0])
  File "/Applications/EMAN2-2016-12-05/lib/EMAN2db.py", line 467, in db_read_images
    return EMData.read_images_c(fsp,*parms)
RuntimeError: FileAccessException at /build/co/eman2.daily/libEM/hdfio2.cpp:510: error with 'r2d_06/allrefs_01.hdf': 'cannot access file 'r2d_06/allrefs_01.hdf'' caught
Traceback (most recent call last):
  File "/Applications/EMAN2-2016-12-05//bin/e2stacksort.py", line 328, in <module>
    main()
  File "/Applications/EMAN2-2016-12-05//bin/e2stacksort.py", line 96, in main
    a=EMData.read_images(args[0])
  File "/Applications/EMAN2-2016-12-05/lib/EMAN2db.py", line 467, in db_read_images
    return EMData.read_images_c(fsp,*parms)
RuntimeError: FileAccessException at /build/co/eman2.daily/libEM/hdfio2.cpp:510: error with 'r2d_07/allrefs_01.hdf': 'cannot access file 'r2d_07/allrefs_01.hdf'' caught
Traceback (most recent call last):
  File "/Applications/EMAN2-2016-12-05//bin/e2stacksort.py", line 328, in <module>
    main()
  File "/Applications/EMAN2-2016-12-05//bin/e2stacksort.py", line 96, in main
    a=EMData.read_images(args[0])
  File "/Applications/EMAN2-2016-12-05/lib/EMAN2db.py", line 467, in db_read_images
    return EMData.read_images_c(fsp,*parms)
RuntimeError: FileAccessException at /build/co/eman2.daily/libEM/hdfio2.cpp:510: error with 'r2d_08/allrefs_01.hdf': 'cannot access file 'r2d_08/allrefs_01.hdf'' caught
IN LOG FILE:
*************** e2classaverage.py --input=r2d_05/input_subset.hdf --classmx=r2d_05/classmx_00.hdf --output=r2d_05/classes_init.hdf --iter=8 --force --bootstrap --center=xform.center --align=rotate_translate_iterative:maxiter=5:maxshift=10:maxshift=117 --averager=ctf.auto --keep=0.850000 --cmp=ccc --aligncmp=ccc --normproc=normalize.edgemean --ralign=refine --raligncmp=ccc
Class averaging beginning
Class averaging complete
Using references from r2d_05/classes_init.hdf
*************** e2proc2d.py r2d_05/classes_init.hdf r2d_05/allrefs_01.hdf --inplace --calccont --process=filter.highpass.gauss:cutoff_pixels=5 --process=normalize.circlemean:radius=-5
Input file 'r2d_05/classes_init.hdf' does not exist.
*************** e2stacksort.py r2d_05/allrefs_01.hdf r2d_05/allrefs_01.hdf --simcmp=sqeuclidean:normto=1 --simalign=rotate_translate_tree:maxres=10 --useali --iterative
Beginning image sort/alignment
Error running:
e2stacksort.py r2d_05/allrefs_01.hdf r2d_05/allrefs_01.hdf --simcmp=sqeuclidean:normto=1 --simalign=rotate_translate_tree:maxres=10 --useali --iterative
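If these failures come from parallel runs racing for the same automatically numbered r2d_NN folder, one thing to try is naming each run's folder explicitly. The sketch below assumes e2refine2d.py accepts a --path option for this (worth confirming with e2refine2d.py --help on your build). The helper names, the log file names, and the set file names other than set01.lst are hypothetical. I have not verified that this avoids the crash.

```shell
# Options copied from the OSX command above (everything except --input).
COMMON_OPTS="--ncls=64 --normproj --fastseed --iter=5 --nbasisfp=64 --naliref=64
  --center=xform.center
  --simalign=rotate_translate_iterative:maxiter=5:maxshift=10 --simaligncmp=ccc
  --simralign=refine --simraligncmp=ccc --simcmp=ccc
  --classkeep=0.85 --classiter=5
  --classalign=rotate_translate_iterative:maxiter=5:maxshift=10 --classaligncmp=ccc
  --classralign=refine --classraligncmp=ccc
  --classaverager=ctf.auto --classcmp=ccc --classrefsf -v 2"

# Hypothetical helper: print the command for run number $1 with an explicit,
# unique output folder instead of relying on the automatic r2d_NN numbering.
# ASSUMPTION: e2refine2d.py supports --path; sets/set0N.lst names are illustrative.
refine_cmd() {
    printf 'e2refine2d.py --input=sets/set%02d.lst --path=r2d_%02d' "$1" "$1"
    # COMMON_OPTS is left unquoted on purpose so it splits into separate options.
    printf ' %s' $COMMON_OPTS
    printf '\n'
}

# Hypothetical launcher: start four independent runs in the background,
# one log file per run, then wait for all of them to finish.
launch_all() {
    for i in 1 2 3 4; do
        sh -c "$(refine_cmd "$i")" > "run_$i.log" 2>&1 &
    done
    wait
}
```

Calling refine_cmd 3 prints the full command for the third run, which is handy for checking the options before calling launch_all. Pick folder names that are not already in use in the project directory.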
===========
linux64
===========
Here the jobs run on a cluster, on distinct nodes. I run several independent jobs in parallel. Many of them die, so I rerun them until they finish, which takes time. Below is an example of the re-run history for one set of 18 runs, where ACTIVE_FILES is set to the runs that still need to be run.
#ACTIVE_FILES=( $( seq 1 1 18 ) )
#ACTIVE_FILES=( 2 3 5 7 8 9 13 15 18 )
#ACTIVE_FILES=( 2 3 5 7 9 15 18 )
#ACTIVE_FILES=( 7 )
#ACTIVE_FILES=( 2 )
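The manual re-run cycle above could be automated with a small retry wrapper. This is only a sketch: the function name retry and the attempt limit are hypothetical, and it works around the crashes rather than fixing them. A failed run may also leave partial output behind, which may need cleaning up before the next attempt.

```shell
# Hypothetical helper: re-run a command until it exits with status 0,
# giving up after max_attempts tries.
# Usage: retry MAX_ATTEMPTS command [args...]
retry() {
    local max_attempts=$1
    shift
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        echo "retrying (attempt $attempt of $max_attempts): $*" >&2
    done
    return 0
}
```

In the job script this would wrap the existing call, for example retry 5 e2refine2d.py followed by the usual options, so each array task retries on its own instead of being resubmitted by hand.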
COMMAND:
$ e2version.py
EMAN 2.12 (GITHUB: Sun Dec 4 12:37:39 2016)
Your EMAN2 is running on: Linux-3.10.0-327.36.1.el7.x86_64-x86_64-with-redhat-7.2-Maipo 3.10.0-327.36.1.el7.x86_64
Your Python version is: 2.7.3
# Note that this e2refine2d.py command differs from the OSX run: it adds --parallel=thread:8 and omits --classrefsf.
$ e2refine2d.py --input=sets/${SET_FILES[$SGE_TASK_ID]} --ncls=64 --normproj --fastseed --iter=5 --nbasisfp=64 --naliref=64 --parallel=thread:8 --center=xform.center --simalign=rotate_translate_iterative:maxiter=5:maxshift=10 --simaligncmp=ccc --simralign=refine --simraligncmp=ccc --simcmp=ccc --classkeep=0.85 --classiter=5 --classalign=rotate_translate_iterative:maxiter=5:maxshift=10 --classaligncmp=ccc --classralign=refine --classraligncmp=ccc --classaverager=ctf.auto --classcmp=ccc -v 2
ERRORS NAMED IN LOG FILE:
85 simmx tasks left in main loop
85/96
84 simmx tasks left in main loop
84 simmx tasks left in main loop
Error running:
e2simmx.py r2d_11/aliref_04.hdf sets/set07-split-011__ctf_flip_shrink2_radial46.lst r2d_11/simmx_04.hdf -f --saveali --cmp=ccc --align=rotate_translate_iterative:maxiter=5:maxshift=10 --aligncmp=ccc --verbose=1 --ralign=refine --raligncmp=ccc --parallel=thread:8
*************** e2classaverage.py --input=r2d_12/input_subset.hdf --classmx=r2d_12/classmx_00.hdf --output=r2d_12/classes_init.hdf --iter=8 --force --bootstrap --center=xform.center --align=rotate_translate_iterative:maxiter=5:maxshift=10:maxshift=37 --averager=ctf.auto --keep=0.850000 --cmp=ccc --aligncmp=ccc --normproc=normalize.edgemean --ralign=refine --raligncmp=ccc --parallel=thread:8
Class averaging beginning
Error running task : 0
Error running:
e2classaverage.py --input=r2d_12/input_subset.hdf --classmx=r2d_12/classmx_00.hdf --output=r2d_12/classes_init.hdf --iter=8 --force --bootstrap --center=xform.center --align=rotate_translate_iterative:maxiter=5:maxshift=10:maxshift=37 --averager=ctf.auto --keep=0.850000 --cmp=ccc --aligncmp=ccc --normproc=normalize.edgemean --ralign=refine --raligncmp=ccc --parallel=thread:8
OUTPUT FROM GDB LOOKING INTO CORE FILE(S) ("where" output):
$ gdb /hpcdata/lid3/eman2/EMAN2-2016-12-05/extlib/bin/python -c core.48092
#0 0x00002aaaabbcc2c0 in _int_free () from /lib64/libc.so.6
#1 0x00002aaaabbb9ff5 in fclose@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2 0x000000000042b9ae in close_the_file (f=0x2aaac050fa50) at Objects/fileobject.c:456
#3 0x000000000042ba36 in file_close (f=0x2aaaabf0a760 <main_arena>) at Objects/fileobject.c:663
#4 0x000000000049d75a in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4005
#5 PyEval_EvalFrameEx (f=0x1cf9c70, throwflag=<optimized out>) at Python/ceval.c:2666
#6 0x000000000049db3f in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4107
#7 PyEval_EvalFrameEx (f=0x88c2d0, throwflag=<optimized out>) at Python/ceval.c:2666
#8 0x000000000049db3f in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4107
#9 PyEval_EvalFrameEx (f=0x886aa0, throwflag=<optimized out>) at Python/ceval.c:2666
#10 0x000000000049edab in PyEval_EvalCodeEx (co=0x2aaaaab92b30, globals=<optimized out>, locals=<optimized out>, args=0x0, argcount=0,
kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3253
#11 0x000000000049ee22 in PyEval_EvalCode (co=0x2aaaabf0a760 <main_arena>, globals=0x80, locals=0xfffffffffffffdc0) at Python/ceval.c:667
#12 0x00000000004c0f81 in run_mod (arena=<optimized out>, flags=<optimized out>, locals=<optimized out>, globals=<optimized out>,
filename=<optimized out>, mod=<optimized out>) at Python/pythonrun.c:1353
#13 PyRun_FileExFlags (fp=0x8819f0, filename=0x7fffffff8698 "/hpcdata/lid3/eman2/EMAN2-2016-12-05/bin/e2classaverage.py",
start=<optimized out>, globals=0x7e8e90, locals=0x7e8e90, closeit=1, flags=0x7fffffff7d30) at Python/pythonrun.c:1339
#14 0x00000000004c1238 in PyRun_SimpleFileExFlags (fp=<optimized out>,
filename=0x7fffffff8698 "/hpcdata/lid3/eman2/EMAN2-2016-12-05/bin/e2classaverage.py", closeit=1, flags=0x7fffffff7d30)
at Python/pythonrun.c:943
#15 0x0000000000414bbd in Py_Main (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:639
#16 0x00002aaaabb70b15 in __libc_start_main () from /lib64/libc.so.6
#17 0x0000000000413d69 in _start ()
$ gdb /hpcdata/lid3/eman2/EMAN2-2016-12-05/extlib/bin/python -c core.73931
#0 0x00002aaaabb845f7 in raise () from /lib64/libc.so.6
#1 0x00002aaaabb85ce8 in abort () from /lib64/libc.so.6
#2 0x00002aaab873bc72 in gsl_error () from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/../extlib/lib/libgsl.so.0
#3 0x00002aaab87b31d1 in nmsimplex_set () from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/../extlib/lib/libgsl.so.0
#4 0x00002aaab78d9591 in EMAN::RefineAligner::align(EMAN::EMData*, EMAN::EMData*, std::string const&, EMAN::Dict const&) const ()
from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/libEM2.so
#5 0x00002aaab770fd80 in EMAN::EMData::align(std::string const&, EMAN::EMData*, EMAN::Dict const&, std::string const&, EMAN::Dict const&) ()
from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/libEM2.so
#6 0x00002aaab71b0f84 in EMData_align_wrapper5(EMAN::EMData&, std::string const&, EMAN::EMData*, EMAN::Dict const&, std::string const&, EMAN::Dict const&) () from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/libpyEMData2.so
#7 0x00002aaab72399b1 in boost::python::detail::caller_arity<6u>::impl<EMAN::EMData* (*)(EMAN::EMData&, std::string const&, EMAN::EMData*, EMAN::Dict const&, std::string const&, EMAN::Dict const&), boost::python::return_value_policy<boost::python::manage_new_object, boost::python::default_call_policies>, boost::mpl::vector7<EMAN::EMData*, EMAN::EMData&, std::string const&, EMAN::EMData*, EMAN::Dict const&, std::string const&, EMAN::Dict const&> >::operator()(_object*, _object*) () from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/libpyEMData2.so
#8 0x00002aaab7e0ee47 in boost::python::objects::function::call(_object*, _object*) const ()
from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/../extlib/lib/libboost_python.so.1.52.0
#9 0x00002aaab7e0f1f8 in boost::detail::function::void_function_ref_invoker0<boost::python::objects::(anonymous namespace)::bind_return, void>::invoke(boost::detail::function::function_buffer&) () from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/../extlib/lib/libboost_python.so.1.52.0
#10 0x00002aaab7e16ba0 in boost::python::handle_exception_impl(boost::function0<void>) ()
from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/../extlib/lib/libboost_python.so.1.52.0
#11 0x00002aaab7e0bdcf in function_call () from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/../extlib/lib/libboost_python.so.1.52.0
#12 0x00000000004189cd in PyObject_Call (func=0x939a30, arg=0x2aaabe35a808, kw=0x0) at Objects/abstract.c:2529
#13 0x000000000049818d in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4239
#14 PyEval_EvalFrameEx (f=0x1c88b80, throwflag=<optimized out>) at Python/ceval.c:2666
#15 0x000000000049db3f in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4107
#16 PyEval_EvalFrameEx (f=0x1c17f00, throwflag=<optimized out>) at Python/ceval.c:2666
#17 0x000000000049edab in PyEval_EvalCodeEx (co=0x2aaac0ea1c30, globals=<optimized out>, locals=<optimized out>, args=0x12, argcount=17,
kws=0x1c19850, kwcount=0, defs=0x2aaac0eac068, defcount=17, closure=0x0) at Python/ceval.c:3253
#18 0x000000000049c732 in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4117
#19 PyEval_EvalFrameEx (f=0x1c195f0, throwflag=<optimized out>) at Python/ceval.c:2666
#20 0x000000000049edab in PyEval_EvalCodeEx (co=0x2aaac0ea19b0, globals=<optimized out>, locals=<optimized out>, args=0x1c177a8, argcount=2,
kws=0x1c177b8, kwcount=0, defs=0x2aaac0ea2928, defcount=1, closure=0x0) at Python/ceval.c:3253
#21 0x000000000049c732 in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4117
#22 PyEval_EvalFrameEx (f=0x1c175e0, throwflag=<optimized out>) at Python/ceval.c:2666
#23 0x000000000049db3f in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4107
#24 PyEval_EvalFrameEx (f=0x1b22000, throwflag=<optimized out>) at Python/ceval.c:2666
#25 0x000000000049db3f in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4107
#26 PyEval_EvalFrameEx (f=0x899520, throwflag=<optimized out>) at Python/ceval.c:2666
#27 0x000000000049edab in PyEval_EvalCodeEx (co=0x2aaaaaba2330, globals=<optimized out>, locals=<optimized out>, args=0x0, argcount=0,
kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3253
#28 0x000000000049ee22 in PyEval_EvalCode (co=0x11820, globals=0x11820, locals=0x6) at Python/ceval.c:667
#29 0x00000000004c0f81 in run_mod (arena=<optimized out>, flags=<optimized out>, locals=<optimized out>, globals=<optimized out>,
filename=<optimized out>, mod=<optimized out>) at Python/pythonrun.c:1353
#30 PyRun_FileExFlags (fp=0x8819f0, filename=0x7fffffff87ae "/hpcdata/lid3/eman2/EMAN2-2016-12-05/bin/e2parallel.py", start=<optimized out>,
globals=0x7e8e90, locals=0x7e8e90, closeit=1, flags=0x7fffffff7eb0) at Python/pythonrun.c:1339
#31 0x00000000004c1238 in PyRun_SimpleFileExFlags (fp=<optimized out>,
filename=0x7fffffff87ae "/hpcdata/lid3/eman2/EMAN2-2016-12-05/bin/e2parallel.py", closeit=1, flags=0x7fffffff7eb0)
at Python/pythonrun.c:943
#32 0x0000000000414bbd in Py_Main (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:639
#33 0x00002aaaabb70b15 in __libc_start_main () from /lib64/libc.so.6
#34 0x0000000000413d69 in _start ()
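For anyone wanting the same "where" output from many core files at once, gdb can be run non-interactively with -batch and -ex. The sketch below is one way to do that. The function name dump_backtraces, the PYTHON_BIN and GDB variables, and the .bt.txt output naming are hypothetical; the default interpreter path is the one used in the gdb commands above.

```shell
# Hypothetical helper: write a "where" backtrace next to every core.* file
# found in a directory (default: the current directory).
dump_backtraces() {
    local dir=${1:-.}
    # Interpreter that produced the cores; override with PYTHON_BIN if needed.
    local python_bin=${PYTHON_BIN:-/hpcdata/lid3/eman2/EMAN2-2016-12-05/extlib/bin/python}
    local core
    for core in "$dir"/core.*; do
        [ -f "$core" ] || continue                 # the glob matched nothing
        case $core in *.bt.txt) continue ;; esac   # skip output from earlier calls
        # -batch exits after running the commands given with -ex.
        "${GDB:-gdb}" -batch -ex where "$python_bin" "$core" > "$core.bt.txt" 2>&1
    done
}
```

Running dump_backtraces in the job directory would leave files such as core.48092.bt.txt that can be attached to a report like this one.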
Thanks,
John