e2refine2d.py crashes


John Gallagher

Dec 8, 2016, 12:26:08 PM
to EMAN2
I've had problems with e2refine2d.py crashing, seemingly for more than one distinct reason.  I've seen the problem with the linux64 source and linux64 bin distributions, as well as the OSX bin.  I wondered if it's my data files, but I just restarted with repicked particles and I still get errors.  I've been screening parameters and subsets of the data in parallel, and it's possible this may again be a concurrency issue (related thread: "Concurrency problem with e2refine2d.py").

Below I list the errors for OSX first, then for linux64.  I'm sticking to the bin distributions here for simplicity, but the source distribution does the same thing for me.

==============
OSX
==============

In OSX, I get the following errors printed to the terminal.  The program continues to run after "lost sys.stderr", but dies when the traceback is printed.  No core files are created.  I also get complaints about the NumPy interface, but some jobs complete anyway, and I'll be investigating that further:

COMMAND:
$ e2version.py
EMAN 2.12 (GITHUB: Mon Dec  5 12:19:33 2016)
Your EMAN2 is running on: Mac OS 10.11.6 x86_64
Your Python version is: 2.7.10

$ sh job_eman2-TEST-c2d-Array-locus-gen-j1.sh
# The script does the equivalent of a FOR loop running e2refine2d.py: 4 independent runs, no threading.  This mostly recreates what I do on the cluster.
# command="e2refine2d.py --input=sets/set01.lst  --ncls=64 --normproj --fastseed --iter=5 --nbasisfp=64 --naliref=64 --center=xform.center --simalign=rotate_translate_iterative:maxiter=5:maxshift=10 --simaligncmp=ccc --simralign=refine --simraligncmp=ccc --simcmp=ccc --classkeep=0.85 --classiter=5 --classalign=rotate_translate_iterative:maxiter=5:maxshift=10 --classaligncmp=ccc --classralign=refine --classraligncmp=ccc --classaverager=ctf.auto --classcmp=ccc --classrefsf -v 2 "
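The script amounts to a serial loop like the sketch below. The per-run set names and output directories are illustrative guesses, not the real script, and the commands are only echoed here (a dry run):

```shell
#!/bin/sh
# Sketch of job_eman2-TEST-c2d-Array-locus-gen-j1.sh: four independent,
# serial e2refine2d.py runs, no threading.  Set/output names are
# assumptions; replace echo with eval "$cmd" to actually run.
for i in 05 06 07 08; do
    cmd="e2refine2d.py --input=sets/set${i}.lst --path=r2d_${i} --ncls=64 --normproj --fastseed --iter=5 --nbasisfp=64 --naliref=64 -v 2"
    echo "$cmd"
done
```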

TERMINAL OUTPUT:
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
(the three lines above repeat eight times in total)
Traceback (most recent call last):
  File "/Applications/EMAN2-2016-12-05//bin/e2stacksort.py", line 328, in <module>
    main()
  File "/Applications/EMAN2-2016-12-05//bin/e2stacksort.py", line 96, in main
    a=EMData.read_images(args[0])
  File "/Applications/EMAN2-2016-12-05/lib/EMAN2db.py", line 467, in db_read_images
    return EMData.read_images_c(fsp,*parms)
RuntimeError: FileAccessException at /build/co/eman2.daily/libEM/hdfio2.cpp:510: error with 'r2d_05/allrefs_01.hdf': 'cannot access file 'r2d_05/allrefs_01.hdf'' caught

(the same traceback then repeats for r2d_06/allrefs_01.hdf, r2d_07/allrefs_01.hdf, and r2d_08/allrefs_01.hdf)


IN LOG FILE:
*************** e2classaverage.py --input=r2d_05/input_subset.hdf --classmx=r2d_05/classmx_00.hdf --output=r2d_05/classes_init.hdf --iter=8 --force --bootstrap --center=xform.center --align=rotate_translate_iterative:maxiter=5:maxshift=10:maxshift=117 --averager=ctf.auto  --keep=0.850000 --cmp=ccc --aligncmp=ccc --normproc=normalize.edgemean --ralign=refine --raligncmp=ccc
Class averaging beginning
Class averaging complete
Using references from  r2d_05/classes_init.hdf
*************** e2proc2d.py r2d_05/classes_init.hdf r2d_05/allrefs_01.hdf --inplace --calccont --process=filter.highpass.gauss:cutoff_pixels=5 --process=normalize.circlemean:radius=-5
Input file 'r2d_05/classes_init.hdf' does not exist.
*************** e2stacksort.py r2d_05/allrefs_01.hdf r2d_05/allrefs_01.hdf --simcmp=sqeuclidean:normto=1 --simalign=rotate_translate_tree:maxres=10 --useali --iterative
Beginning image sort/alignment
Error running:
e2stacksort.py r2d_05/allrefs_01.hdf r2d_05/allrefs_01.hdf --simcmp=sqeuclidean:normto=1 --simalign=rotate_translate_tree:maxres=10 --useali --iterative





===========
linux64
===========

Here the jobs are run on a cluster, on distinct nodes.  I run several independent jobs in parallel.  Many of them die, so I rerun them until they don't; this takes time.  Below is an example history of re-runs for a set of 18 runs, where ACTIVE_FILES gets set to the runs that still need to be run.
#ACTIVE_FILES=$( seq 1 1 18 )
#ACTIVE_FILES=( 2 3 5 7 8 9 13 15 18 )
#ACTIVE_FILES=( 2 3 5 7 9 15 18 )
#ACTIVE_FILES=( 7 )
#ACTIVE_FILES=( 2 )
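The rerun selection could be automated by rebuilding ACTIVE_FILES from whichever runs lack a final output file. This is a sketch: the marker file name (classes_04.hdf for a 5-iteration run) and the run-number-to-directory mapping are assumptions, since e2refine2d.py's actual output naming may differ.

```shell
# Collect only the runs that still need a (re)run, judged by the
# presence of a final class-average file (name is an assumption).
ACTIVE_FILES=()
for i in $(seq 1 18); do
    dir=$(printf 'r2d_%02d' "$i")
    [ -e "$dir/classes_04.hdf" ] || ACTIVE_FILES+=("$i")
done
echo "runs still needing a rerun: ${ACTIVE_FILES[*]}"
```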


COMMAND:
$ e2version.py
EMAN 2.12 (GITHUB: Sun Dec  4 12:37:39 2016)
Your EMAN2 is running on: Linux-3.10.0-327.36.1.el7.x86_64-x86_64-with-redhat-7.2-Maipo 3.10.0-327.36.1.el7.x86_64
Your Python version is: 2.7.3

# Note the e2refine2d.py command is different from the OSX run.
$  e2refine2d.py --input=sets/${SET_FILES[$SGE_TASK_ID]} --ncls=64 --normproj --fastseed --iter=5 --nbasisfp=64 --naliref=64 --parallel=thread:8 --center=xform.center --simalign=rotate_translate_iterative:maxiter=5:maxshift=10 --simaligncmp=ccc --simralign=refine --simraligncmp=ccc --simcmp=ccc --classkeep=0.85 --classiter=5 --classalign=rotate_translate_iterative:maxiter=5:maxshift=10 --classaligncmp=ccc --classralign=refine --classraligncmp=ccc --classaverager=ctf.auto --classcmp=ccc -v 2


ERRORS NAMED IN LOG FILE:
85 simmx tasks left in main loop   ^M85/96
84 simmx tasks left in main loop   ^M84 simmx tasks left in main loop   ^MError running:
e2simmx.py r2d_11/aliref_04.hdf sets/set07-split-011__ctf_flip_shrink2_radial46.lst r2d_11/simmx_04.hdf -f --saveali --cmp=ccc --align=rotate_translate_iterative:maxiter=5:maxshift=10 --aligncmp=ccc --verbose=1  --ralign=refine --raligncmp=ccc --parallel=thread:8


*************** e2classaverage.py --input=r2d_12/input_subset.hdf --classmx=r2d_12/classmx_00.hdf --output=r2d_12/classes_init.hdf --iter=8 --force --bootstrap --center=xform.center --align=rotate_translate_iterative:maxiter=5:maxshift=10:maxshift=37 --averager=ctf.auto  --keep=0.850000 --cmp=ccc --aligncmp=ccc --normproc=normalize.edgemean --ralign=refine --raligncmp=ccc --parallel=thread:8
Class averaging beginning
Error running task :  0
Error running:
e2classaverage.py --input=r2d_12/input_subset.hdf --classmx=r2d_12/classmx_00.hdf --output=r2d_12/classes_init.hdf --iter=8 --force --bootstrap --center=xform.center --align=rotate_translate_iterative:maxiter=5:maxshift=10:maxshift=37 --averager=ctf.auto  --keep=0.850000 --cmp=ccc --aligncmp=ccc --normproc=normalize.edgemean --ralign=refine --raligncmp=ccc --parallel=thread:8



OUTPUT FROM GDB LOOKING INTO CORE FILE(S) ("where" output):
$ gdb /hpcdata/lid3/eman2/EMAN2-2016-12-05/extlib/bin/python -c core.48092
#0  0x00002aaaabbcc2c0 in _int_free () from /lib64/libc.so.6
#1  0x00002aaaabbb9ff5 in fclose@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2  0x000000000042b9ae in close_the_file (f=0x2aaac050fa50) at Objects/fileobject.c:456
#3  0x000000000042ba36 in file_close (f=0x2aaaabf0a760 <main_arena>) at Objects/fileobject.c:663
#4  0x000000000049d75a in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4005
#5  PyEval_EvalFrameEx (f=0x1cf9c70, throwflag=<optimized out>) at Python/ceval.c:2666
#6  0x000000000049db3f in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4107
#7  PyEval_EvalFrameEx (f=0x88c2d0, throwflag=<optimized out>) at Python/ceval.c:2666
#8  0x000000000049db3f in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4107
#9  PyEval_EvalFrameEx (f=0x886aa0, throwflag=<optimized out>) at Python/ceval.c:2666
#10 0x000000000049edab in PyEval_EvalCodeEx (co=0x2aaaaab92b30, globals=<optimized out>, locals=<optimized out>, args=0x0, argcount=0,
    kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3253
#11 0x000000000049ee22 in PyEval_EvalCode (co=0x2aaaabf0a760 <main_arena>, globals=0x80, locals=0xfffffffffffffdc0) at Python/ceval.c:667
#12 0x00000000004c0f81 in run_mod (arena=<optimized out>, flags=<optimized out>, locals=<optimized out>, globals=<optimized out>,
    filename=<optimized out>, mod=<optimized out>) at Python/pythonrun.c:1353
#13 PyRun_FileExFlags (fp=0x8819f0, filename=0x7fffffff8698 "/hpcdata/lid3/eman2/EMAN2-2016-12-05/bin/e2classaverage.py",
    start=<optimized out>, globals=0x7e8e90, locals=0x7e8e90, closeit=1, flags=0x7fffffff7d30) at Python/pythonrun.c:1339
#14 0x00000000004c1238 in PyRun_SimpleFileExFlags (fp=<optimized out>,
    filename=0x7fffffff8698 "/hpcdata/lid3/eman2/EMAN2-2016-12-05/bin/e2classaverage.py", closeit=1, flags=0x7fffffff7d30)
    at Python/pythonrun.c:943
#15 0x0000000000414bbd in Py_Main (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:639
#16 0x00002aaaabb70b15 in __libc_start_main () from /lib64/libc.so.6
#17 0x0000000000413d69 in _start ()



$ gdb /hpcdata/lid3/eman2/EMAN2-2016-12-05/extlib/bin/python -c core.73931
#0  0x00002aaaabb845f7 in raise () from /lib64/libc.so.6
#1  0x00002aaaabb85ce8 in abort () from /lib64/libc.so.6
#2  0x00002aaab873bc72 in gsl_error () from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/../extlib/lib/libgsl.so.0
#3  0x00002aaab87b31d1 in nmsimplex_set () from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/../extlib/lib/libgsl.so.0
#4  0x00002aaab78d9591 in EMAN::RefineAligner::align(EMAN::EMData*, EMAN::EMData*, std::string const&, EMAN::Dict const&) const ()
   from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/libEM2.so
#5  0x00002aaab770fd80 in EMAN::EMData::align(std::string const&, EMAN::EMData*, EMAN::Dict const&, std::string const&, EMAN::Dict const&) ()
   from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/libEM2.so
#6  0x00002aaab71b0f84 in EMData_align_wrapper5(EMAN::EMData&, std::string const&, EMAN::EMData*, EMAN::Dict const&, std::string const&, EMAN::Dict const&) () from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/libpyEMData2.so
#7  0x00002aaab72399b1 in boost::python::detail::caller_arity<6u>::impl<EMAN::EMData* (*)(EMAN::EMData&, std::string const&, EMAN::EMData*, EMAN::Dict const&, std::string const&, EMAN::Dict const&), boost::python::return_value_policy<boost::python::manage_new_object, boost::python::default_call_policies>, boost::mpl::vector7<EMAN::EMData*, EMAN::EMData&, std::string const&, EMAN::EMData*, EMAN::Dict const&, std::string const&, EMAN::Dict const&> >::operator()(_object*, _object*) () from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/libpyEMData2.so
#8  0x00002aaab7e0ee47 in boost::python::objects::function::call(_object*, _object*) const ()
   from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/../extlib/lib/libboost_python.so.1.52.0
#9  0x00002aaab7e0f1f8 in boost::detail::function::void_function_ref_invoker0<boost::python::objects::(anonymous namespace)::bind_return, void>::invoke(boost::detail::function::function_buffer&) () from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/../extlib/lib/libboost_python.so.1.52.0
#10 0x00002aaab7e16ba0 in boost::python::handle_exception_impl(boost::function0<void>) ()
   from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/../extlib/lib/libboost_python.so.1.52.0
#11 0x00002aaab7e0bdcf in function_call () from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/../extlib/lib/libboost_python.so.1.52.0
#12 0x00000000004189cd in PyObject_Call (func=0x939a30, arg=0x2aaabe35a808, kw=0x0) at Objects/abstract.c:2529
#13 0x000000000049818d in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4239
#14 PyEval_EvalFrameEx (f=0x1c88b80, throwflag=<optimized out>) at Python/ceval.c:2666
#15 0x000000000049db3f in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4107
#16 PyEval_EvalFrameEx (f=0x1c17f00, throwflag=<optimized out>) at Python/ceval.c:2666
#17 0x000000000049edab in PyEval_EvalCodeEx (co=0x2aaac0ea1c30, globals=<optimized out>, locals=<optimized out>, args=0x12, argcount=17,
    kws=0x1c19850, kwcount=0, defs=0x2aaac0eac068, defcount=17, closure=0x0) at Python/ceval.c:3253
#18 0x000000000049c732 in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4117
#19 PyEval_EvalFrameEx (f=0x1c195f0, throwflag=<optimized out>) at Python/ceval.c:2666
#20 0x000000000049edab in PyEval_EvalCodeEx (co=0x2aaac0ea19b0, globals=<optimized out>, locals=<optimized out>, args=0x1c177a8, argcount=2,
    kws=0x1c177b8, kwcount=0, defs=0x2aaac0ea2928, defcount=1, closure=0x0) at Python/ceval.c:3253
#21 0x000000000049c732 in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4117
#22 PyEval_EvalFrameEx (f=0x1c175e0, throwflag=<optimized out>) at Python/ceval.c:2666
#23 0x000000000049db3f in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4107
#24 PyEval_EvalFrameEx (f=0x1b22000, throwflag=<optimized out>) at Python/ceval.c:2666
#25 0x000000000049db3f in call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4107
#26 PyEval_EvalFrameEx (f=0x899520, throwflag=<optimized out>) at Python/ceval.c:2666
#27 0x000000000049edab in PyEval_EvalCodeEx (co=0x2aaaaaba2330, globals=<optimized out>, locals=<optimized out>, args=0x0, argcount=0,
    kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3253
#28 0x000000000049ee22 in PyEval_EvalCode (co=0x11820, globals=0x11820, locals=0x6) at Python/ceval.c:667
#29 0x00000000004c0f81 in run_mod (arena=<optimized out>, flags=<optimized out>, locals=<optimized out>, globals=<optimized out>,
    filename=<optimized out>, mod=<optimized out>) at Python/pythonrun.c:1353
#30 PyRun_FileExFlags (fp=0x8819f0, filename=0x7fffffff87ae "/hpcdata/lid3/eman2/EMAN2-2016-12-05/bin/e2parallel.py", start=<optimized out>,
    globals=0x7e8e90, locals=0x7e8e90, closeit=1, flags=0x7fffffff7eb0) at Python/pythonrun.c:1339
#31 0x00000000004c1238 in PyRun_SimpleFileExFlags (fp=<optimized out>,
    filename=0x7fffffff87ae "/hpcdata/lid3/eman2/EMAN2-2016-12-05/bin/e2parallel.py", closeit=1, flags=0x7fffffff7eb0)
    at Python/pythonrun.c:943
#32 0x0000000000414bbd in Py_Main (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:639
#33 0x00002aaaabb70b15 in __libc_start_main () from /lib64/libc.so.6
#34 0x0000000000413d69 in _start ()
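Collecting these backtraces is easier with batch-mode gdb, which avoids an interactive session per core file. A sketch, shown as a dry run that prints the commands rather than executing them:

```shell
# Print one batch-mode gdb invocation per core file; drop the echo to
# execute.  The python path is the one from the sessions above.
py=/hpcdata/lid3/eman2/EMAN2-2016-12-05/extlib/bin/python
for core in core.48092 core.73931; do
    echo gdb -batch -ex where "$py" -c "$core"
done
```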




Thanks,
John

Steve Ludtke

Dec 11, 2016, 12:43:22 AM
to em...@googlegroups.com
Hi John. Sorry for the slow reply, just got back in town. 

On the mac, the error seems to be happening because it isn't detecting the failure in the previous step (class-averaging), where you are also seeing a failure on the cluster.  Thanks for providing the detailed stack traces. It looks like the refine aligner is failing on some of your particles in a serious way. The options you are using for e2refine2d are pretty unusual, and very different from the recommended defaults. I'm a little curious what motivated them:

- nbasisfp=64 is a very large value. Normally the number of PCA vectors is more like 5-10.
- naliref=64 is also unusually large. While this probably isn't harmful beyond making the refinement take much longer, it is a bit odd.
- The fact that nbasisfp is as large as ncls is very, very unusual. In a sense it largely defeats the purpose of doing PCA.
- simalign and classalign are normally rotate_translate_flip or (very recently) rotate_translate_tree. rotate_translate_iterative is not nearly as well tested.
- A classaverager of ctf.auto is ok, though most people prefer the appearance of ctf.weight.
- While you certainly can use simralign and classralign, in most cases this doesn't do anything very useful in this type of class-averaging. This is also the algorithm that is crashing on you, so if you turn it off the problem may go away.

The fact that the refine aligner is crashing usually indicates a pretty significant issue with your data. One possibility is that your data set isn't properly inverted, i.e. you have dark particles on a lighter background instead of the other way around. The other possibility is that you just have a lot of bad particles.  If you send me a screenshot of a few of your particles (you don't have to post it here), I may be able to offer a more detailed suggestion.


----------------------------------------------------------------------------
Steven Ludtke, Ph.D.
Professor, Dept. of Biochemistry and Mol. Biol.                Those who do
Co-Director National Center For Macromolecular Imaging            ARE
Baylor College of Medicine                                     The converse
slu...@bcm.edu  -or-  ste...@alumni.caltech.edu               also applies
http://ncmi.bcm.edu/~stevel


John Gallagher

Dec 11, 2016, 12:17:30 PM
to EMAN2
My motivation for the parameters I've used is to try to separate particles from non-particles as well as possible, with less concern for the quality of individual class averages or the runtime.  But these parameters are just where I left off when screening parameters, before I switched to figuring out how to get the program to run to completion every time.

Thank you for the comments on the various program parameters; my understanding of the tradeoffs when choosing parameters was limited to "e2refine2d.py --help" and clicking the "?" in the GUI, the latter of which is where I just learned I could supply parameters to the aligner.

Speaking of parameters to the aligner, I noticed that the option I supplied on the command line, "--simalign=rotate_translate_iterative:maxiter=5:maxshift=10", gets translated into the following internal command, with maxshift suspiciously duplicated: "e2classaverage.py ... --align=rotate_translate_iterative:maxiter=5:maxshift=10:maxshift=37"
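A toy sketch of parsing such a "name:key=val:key=val" option string with last-one-wins semantics, which would make the duplicate harmless. This is an assumption about how EMAN2 resolves repeated keys, not a reading of its source:

```shell
# Split the aligner option string on ':'; the first field is the
# aligner name, the rest are key=value parameters.  If a key repeats,
# the later value overwrites the earlier one here (last-one-wins).
opt="rotate_translate_iterative:maxiter=5:maxshift=10:maxshift=37"
IFS=: read -ra parts <<< "$opt"
declare -A params
for kv in "${parts[@]:1}"; do
    params[${kv%%=*}]=${kv#*=}
done
echo "aligner=${parts[0]} maxshift=${params[maxshift]}"
# prints: aligner=rotate_translate_iterative maxshift=37
```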

How about the core dump at "fclose"?  I googled potential causes, but since fclose is in glibc, it looked like I would need a much better understanding of the code than I have even to start thinking about that.

To add to my previous observations: on linux64, the STDERR output includes the following lines, perhaps suggesting why the stack trace doesn't show more EMAN functions:

close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
(the three lines above repeat three times in total)




Thank you,

John


John Gallagher

Dec 11, 2016, 1:35:16 PM
to EMAN2
I admit I didn't fully understand your earlier suggestion, but I removed the "refine" options for classralign and simralign, and I still get core files implicating fclose.


e2refine2d.py --input=sets/${SET_FILES[$SGE_TASK_ID]} --ncls=64 --fastseed --iter=5 --nbasisfp=64 --naliref=64 --parallel=thread:12 --center=xform.center --simalign=rotate_translate_flip_iterative --simaligncmp=ccc --simraligncmp=dot --simcmp=ccc --classkeep=0.85 --classiter=5 --classalign=rotate_translate_flip_iterative --classaligncmp=ccc --classraligncmp=ccc --classaverager=ctf.weight --classcmp=ccc --classnormproc=normalize.edgemean -v 2 


$  gdb /hpcdata/lid3/eman2/EMAN2-2016-12-05/extlib/bin/python -c core.111817 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
...
Reading symbols from /hpcdata/lid3/eman2/EMAN2-2016-12-05/extlib/bin/python2.7...done.
...
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/hpcdata/lid3/eman2/EMAN2-2016-12-05/extlib/bin/python /hpcdata/lid3/eman2/EMAN'.
Program terminated with signal 11, Segmentation fault.
...
(gdb) where
#0  0x00002aaaabbcc2c0 in _int_free () from /lib64/libc.so.6
#1  0x00002aaaabbb9ff5 in fclose@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2  0x000000000042b9ae in close_the_file (f=0x2aaac050fa50) at Objects/fileobject.c:456
#3  0x000000000042ba36 in file_close (f=0x2aaaabf0a760 <main_arena>) at Objects/fileobject.c:663
#4  0x000000000049d75a in call_function (oparg=<optimized out>, pp_stack=<optimized out>)
    at Python/ceval.c:4005
#5  PyEval_EvalFrameEx (f=0x1d3cf70, throwflag=<optimized out>) at Python/ceval.c:2666
#6  0x000000000049db3f in call_function (oparg=<optimized out>, pp_stack=<optimized out>)
    at Python/ceval.c:4107
#7  PyEval_EvalFrameEx (f=0x88c2d0, throwflag=<optimized out>) at Python/ceval.c:2666
#8  0x000000000049db3f in call_function (oparg=<optimized out>, pp_stack=<optimized out>)
    at Python/ceval.c:4107
#9  PyEval_EvalFrameEx (f=0x886aa0, throwflag=<optimized out>) at Python/ceval.c:2666
#10 0x000000000049edab in PyEval_EvalCodeEx (co=0x2aaaaab92b30, globals=<optimized out>, 
    locals=<optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at Python/ceval.c:3253
#11 0x000000000049ee22 in PyEval_EvalCode (co=0x2aaaabf0a760 <main_arena>, globals=0x80, 
    locals=0xfffffffffffffdb0) at Python/ceval.c:667
#12 0x00000000004c0f81 in run_mod (arena=<optimized out>, flags=<optimized out>, locals=<optimized out>, 
    globals=<optimized out>, filename=<optimized out>, mod=<optimized out>) at Python/pythonrun.c:1353
#13 PyRun_FileExFlags (fp=0x8819f0, 
    filename=0x7fffffff86b9 "/hpcdata/lid3/eman2/EMAN2-2016-12-05/bin/e2classaverage.py", 
    start=<optimized out>, globals=0x7e8e90, locals=0x7e8e90, closeit=1, flags=0x7fffffff7d60)
    at Python/pythonrun.c:1339
#14 0x00000000004c1238 in PyRun_SimpleFileExFlags (fp=<optimized out>, 
    filename=0x7fffffff86b9 "/hpcdata/lid3/eman2/EMAN2-2016-12-05/bin/e2classaverage.py", closeit=1, 
    flags=0x7fffffff7d60) at Python/pythonrun.c:943
#15 0x0000000000414bbd in Py_Main (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:639
#16 0x00002aaaabb70b15 in __libc_start_main () from /lib64/libc.so.6
#17 0x0000000000413d69 in _start ()


I'm not sure which log file corresponds to which core file, but they all die at e2classaverage.py (log output below):

Error running:
e2classaverage.py --input=sets/set07-split-001__ctf_flip_shrink2_radial46.lst --classmx=r2d_01/classmx_03.hdf --output=r2d_01/classes_03.hdf --force --center xform.center --iter=5  --align=rotate_translate_flip_iterative:maxshift=37 --averager=ctf.weight  --keep=0.850000 --cmp=ccc --aligncmp=ccc --normproc=normalize.edgemean --parallel=thread:12

John Gallagher

Dec 11, 2016, 2:05:41 PM
to EMAN2
And to follow up on trying "rotate_translate_flip" instead of "rotate_translate_flip_iterative": I still get core files implicating fclose, from either e2classaverage.py or e2simmx.py.

COMMAND:
e2refine2d.py --input=sets/${SET_FILES[$SGE_TASK_ID]} --ncls=64 --fastseed --iter=5 --nbasisfp=5 --naliref=16 --parallel=thread:12 --center=xform.center --simalign=rotate_translate_flip --simaligncmp=ccc --simraligncmp=dot --simcmp=ccc --classkeep=0.85 --classiter=5 --classalign=rotate_translate_flip --classaligncmp=ccc --classraligncmp=ccc --classaverager=ctf.weight --classcmp=ccc --classnormproc=normalize.edgemean -v 2 

OUTPUT:
Error running:
e2simmx.py r2d_11/aliref_01.hdf sets/set07-split-011__ctf_flip_shrink2_radial46.lst r2d_11/simmx_01.hdf -f --saveali --cmp=ccc --align=rotate_translate_flip --aligncmp=ccc --verbose=1  --parallel=thread:12

OR:
Error running:
e2classaverage.py --input=sets/set07-split-001__ctf_flip_shrink2_radial46.lst --classmx=r2d_01/classmx_03.hdf --output=r2d_01/classes_03.hdf --force --center xform.center --iter=5  --align=rotate_translate_flip:maxshift=37 --averager=ctf.weight  --keep=0.850000 --cmp=ccc --aligncmp=ccc --normproc=normalize.edgemean --parallel=thread:12

Steve Ludtke

Dec 12, 2016, 2:40:45 AM
to em...@googlegroups.com
Hi John,
removing the refine alignment was designed to deal with:

#2  0x00002aaab873bc72 in gsl_error () from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/../extlib/lib/libgsl.so.0
#3  0x00002aaab87b31d1 in nmsimplex_set () from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/../extlib/lib/libgsl.so.0
#4  0x00002aaab78d9591 in EMAN::RefineAligner::align(EMAN::EMData*, EMAN::EMData*, std::string const&, EMAN::Dict const&) const ()
   from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/libEM2.so
#5  0x00002aaab770fd80 in EMAN::EMData::align(std::string const&, EMAN::EMData*, EMAN::Dict const&, std::string const&, EMAN::Dict const&) ()
   from /hpcdata/lid3/eman2/EMAN2-2016-12-05/lib/libEM2.so

from the cluster.  My hope was that the same thing was happening in other places but showing up at a later point. Clearly, if it's still crashing, something else is going on. I don't see any problems with the particles themselves that you sent separately. More ideas/questions:

- Is the box size for the particles a "good" number?
- Are you running this on a shared filesystem of some sort (clearly you are on the cluster, but I assume you're using MPI there...)?  
- On the Mac is it running on an internal drive using the Mac filesystem?

Separate from the odd file-close crash, the new method using e2evalrefine.py described in the current tutorial does a very good job of eliminating bad particles without so much subjective work.

It is still possible, however, to do a better job of identifying bad classes with e2refine2d. The default parameters are selected to make good class-averages in many different orientations for building initial models, not for separating out bad particles; these are two very different tasks. The biggest effect on this is the --normproj option. If specified, it tries to ignore differences in contrast among particles, one of the biggest differences for "bad" particles, so turning it off can make a big difference if this is your goal.  However, note that the classification done in e2refine2d is designed to find particles in the same orientation and ignore other differences. It is these other differences which are most important in pulling out outliers (bad particles), so while it can remove things like bad ice and other forms of contamination, it isn't very good at finding generally low-contrast particles.

John Gallagher

Dec 12, 2016, 7:00:38 PM
to EMAN2
I seem to have found cause and effect for the crashes in my linux binary version of e2refine2d.py.

In short: other libraries loaded in the environment were doing something bad to libc.  I checked the version of libc (implicated by the core file) by running /lib64/libc.so.6 directly, and it prints its version.  But after loading the libraries necessary for EMAN into my environment, running /lib64/libc.so.6 segfaults (not good).  If I leave out the libraries that supply graphics (i.e. libGLU.so.1 and a lot of stuff that comes with it), then /lib64/libc.so.6 no longer crashes when printing its version, and e2refine2d.py no longer crashes at fclose in /lib64/libc.so.6.  So I conclude those libraries external to EMAN were the problem, and we're still sorting out a fix.
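The check itself is simple, since glibc's shared object is directly executable and prints a version banner, making it a cheap canary for a poisoned library environment. A sketch; the extlib path is the EMAN2 install path from the gdb sessions above, and the exact libc path varies by system:

```shell
# Run libc directly as a canary, first with a clean environment, then
# with the EMAN2 library directory prepended to the search path.
libc=/lib64/libc.so.6
if [ -x "$libc" ]; then
    "$libc" >/dev/null 2>&1 && echo "baseline libc: ok" || echo "baseline libc: crashed"
    LD_LIBRARY_PATH=/hpcdata/lid3/eman2/EMAN2-2016-12-05/extlib/lib \
        "$libc" >/dev/null 2>&1 && echo "with EMAN2 extlib: ok" || echo "with EMAN2 extlib: crashed"
else
    echo "no $libc here; adjust the path for this system"
fi
```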

I still experience problems with e2refine2d.py in OSX failing to create files in the r2d_XX folders, but that's not necessarily a priority for me personally.

For completeness, here are answers to the questions posed above:

> - Is the box size for the particles a "good" number?
Yes.  These are 112, and full size particles are 224.

> - Are you running this on a shared filesystem of some sort (clearly you are on the cluster, but I assume you're using MPI there...)? 
The filesystem is gpfs.  I'm not using MPI in this case, just threading.  

> - On the Mac is it running on an internal drive using the Mac filesystem?
Yes.  OSX 10.11.6.  It's mostly vanilla.  I have MacPorts also, but I don't load it in my path by default, or when using EMAN2.

> Separate from the odd file close crash, the new method using e2evalrefine.py described in the current tutorial does a very good job at eliminating bad particles without so much subjective work.  

Sounds interesting about e2evalrefine.py.  I wasn't able to spot that tutorial from the EMAN wiki tutorial page; where can I find it?

> The default parameters are selected to make good class-averages in many different orientations for making initial models, not for separating out bad particles. These are two very different tasks.

Agreed; my hope is to find parameters where bad particles are similar enough to each other that they cluster.  Ironically, having more bad particles could be a good thing in that regard.  I love e2evalparticles.py for evaluating the 2D classification results, since it makes it so quick to drill down into individual classes and see how heterogeneous the particles are.


Thank you,
John


John F

Dec 12, 2016, 8:09:20 PM
to em...@googlegroups.com
On running:

e2refine2d.py --input=goodparticle_pred.mrc --iter=6 --naliref=7 --path=r2d_01 --ncls=20 -parallel=thread:32 

I get error:

Error - the number of rows (6152) in the classification matrix image r2d_01/classmx_00.hdf does not match the number of images (1) in goodparticle_pred.mrc

My stack is 6,152 particles of size 64x64.  Are the images, or the image stack, too small?


Steve Ludtke

Dec 12, 2016, 11:29:55 PM
to em...@googlegroups.com
Hi John. MRC stacks will be treated as volumes unless you use the .mrcs file extension.
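In practice that can be as simple as the sketch below. The symlink is a shortcut to avoid duplicating a large file; whether every code path follows symlinks is an assumption, so a plain copy (or an e2proc2d.py conversion) is the cautious fallback.

```shell
# Give the stack an .mrcs extension so it is read as a 2D stack rather
# than a single 3D volume; a symlink avoids copying the data.
ln -sf goodparticle_pred.mrc goodparticle_pred.mrcs
echo "rerun with: e2refine2d.py --input=goodparticle_pred.mrcs --iter=6 --naliref=7 --path=r2d_01 --ncls=20 --parallel=thread:32"
```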

Steve Ludtke

Dec 12, 2016, 11:57:18 PM
to em...@googlegroups.com
On Dec 12, 2016, at 4:00 PM, John Gallagher <johnrober...@gmail.com> wrote:

> I seem to have found cause and effect with the crashes in my linux binary version of the e2refine2d.py.
>
> In short: other libraries loaded in the environment were doing something bad for libc.  I checked the version of libc (implicated by the core file) by running /lib64/libc.so.6, and it prints the version.  But after loading libraries necessary for EMAN into my environment, running /lib64/libc.so.6 segment faults (not good).  If I leave out the libraries that supply graphics (i.e. libGLU.so.1, and a lot of stuff that comes with it), then /lib64/libc.so.6 no longer crashes when printing the version, and e2refine2d.py no longer crashes at fclose in /lib64/libc.so.6.  So I conclude those libraries external to EMAN were a problem, and we're still sorting out fixing that.
Huh, interesting observation. We use Kubuntu primarily in-house and haven't had any problems with any version through 16.10, so a bit of a mystery. If compiling from source solves it, then I probably wouldn't worry about digging much deeper. Source compiles are also much better optimized, so really no downside other than the pain of doing the compile.
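[Editorial sketch for anyone debugging a similar library conflict: the snippet below is illustrative only (not part of EMAN2) and Linux-only. It lists the shared objects actually mapped into the current Python process, which makes it easy to spot an unexpected libGLU or a second copy of libc pulled in by the environment.]

```python
# Sketch: list shared objects mapped into the current process by reading
# /proc/self/maps (Linux only; this file does not exist on OSX, in which
# case an empty list is returned).
def loaded_shared_objects():
    libs = set()
    try:
        with open("/proc/self/maps") as f:
            for line in f:
                path = line.split()[-1]   # last field is the mapped pathname
                if ".so" in path:
                    libs.add(path)
    except FileNotFoundError:
        pass  # non-Linux platform
    return sorted(libs)
```

Running this inside a Python session started from the EMAN2 environment, versus a clean shell, shows exactly which extra .so files the environment introduces.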


I still experience problems with e2refine2d.py in OSX failing to create files in the r2d_XX folders, but that's not necessarily a priority for me personally.
Ok, I'm not sure what to say. Lots of OSX users (myself included) and I don't have this issue. May be tricky to figure out.


For completeness, here are answers to the questions posed above:

> - Is the box size for the particles a "good" number?
Yes.  These are 112, and full-size particles are 224.
ok


> - Are you running this on a shared filesystem of some sort (clearly you are on the cluster, but I assume you're using MPI there...)? 
The filesystem is gpfs.  I'm not using MPI in this case, just threading.  
ok. As long as file-locking is implemented, it should be fine. 
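[Editorial sketch of what "file-locking" means here, illustrative only and assuming POSIX advisory locks (which GPFS supports): each concurrent writer takes an exclusive lock before touching a shared file.]

```python
import fcntl

# Sketch: append to a shared file under an exclusive advisory lock.
# On a cluster filesystem this only protects concurrent writers if the
# filesystem honors POSIX locks (GPFS does; some NFS setups do not).
def locked_append(path, data):
    with open(path, "ab") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # blocks until the lock is held
        try:
            f.write(data)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```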


> - On the Mac is it running on an internal drive using the Mac filesystem?
Yes.  OSX 10.11.6.  It's mostly vanilla.  I have MacPorts also, but I don't load it in my path by default, or when using EMAN2.
ok


> Separate from the odd file close crash, the new method using e2evalrefine.py described in the current tutorial does a very good job at eliminating bad particles without so much subjective work.  

Sounds interesting about e2evalrefine.py.  I wasn't able to spot that tutorial on the EMAN wiki tutorial page; where can I find it?
It's in the current version of the normal single particle reconstruction tutorial. That gets updated at least once a year, and there are sometimes some fairly major changes. We've really refined the process substantially over the last 12 months.


> The default parameters are selected to make good class-averages in many different orientations for making initial models, not for separating out bad particles. These are two very different tasks.

Agreed, my hope is to find parameters where bad particles are similar enough to each other that they cluster.  Ironically, having more bad particles could be a good thing in that regard.  I love e2evalparticles.py to evaluate the 2D classification results, since it makes it so quick to drill down into individual classes and see how heterogeneous the particles are.

Yes, I like that too. The only issue is that, by definition, bad particles aren't similar to anything, even each other, so the only way to cluster them is by finding some sort of metaparameter which they all have in common. I.e., the classification methods used to find good particles in specific orientations aren't really designed to look for the sorts of features which could be used to classify bad particles. I'm not saying we couldn't find some parameters like this, just that the ones we have now probably aren't optimal.

John F

unread,
Dec 16, 2016, 12:49:08 AM
to em...@googlegroups.com
Hi Steve

I renamed the .mrc file to .mrcs, but it still crashed. Then I ran e2proc2d.py to convert goodparticle_pred.mrcs to goodparticle_pred.hdf. e2refine2d.py ran as expected on the hdf file. 

Thanks John

On Mon, Dec 12, 2016 at 8:29 PM, Steve Ludtke <slud...@gmail.com> wrote:
Hi John. MRC stacks will be treated as volumes unless you use the .mrcs file extension.
On Dec 12, 2016, at 5:09 PM, John F <jffl...@gmail.com> wrote:

On running:

e2refine2d.py --input=goodparticle_pred.mrc --iter=6 --naliref=7 --path=r2d_01 --ncls=20 --parallel=thread:32 

I get error:

Error - the number of rows (6152) in the classification matrix image r2d_01/classmx_00.hdf does not match the number of images (1) in goodparticle_pred.mrc

My stack contains 6,152 particles, each 64x64. Are the images or the image stack too small?



--
--
----------------------------------------------------------------------------------------------
You received this message because you are subscribed to the Google
Groups "EMAN2" group.
To post to this group, send email to em...@googlegroups.com
To unsubscribe from this group, send email to eman2+unsubscribe@googlegroups.com

For more options, visit this group at
http://groups.google.com/group/eman2

---
You received this message because you are subscribed to the Google Groups "EMAN2" group.
To unsubscribe from this group and stop receiving emails from it, send an email to eman2+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.


Steve Ludtke

unread,
Dec 16, 2016, 10:19:30 AM
to em...@googlegroups.com
Hi John, 
Yes, it is better to use HDF, but in this case I think MRCS should have solved the problem (unless it's looking for something funny in the header).  I assume you're using a current snapshot version?  Was the error the same after changing to .mrcs?


----------------------------------------------------------------------------
Steven Ludtke, Ph.D.
Professor, Dept of Biochemistry and Mol. Biol.         (www.bcm.edu/biochem)
Co-Director National Center For Macromolecular Imaging        (ncmi.bcm.edu)
Co-Director CIBR Center                          (www.bcm.edu/research/cibr)
Baylor College of Medicine                             




