Strange behavior of EMAN2Ctf


Thorsten Wagner

Feb 18, 2019, 8:20:47 AM
to eman2-de...@googlegroups.com

Dear list,


I've found some strange behavior when working with EMAN2Ctf.


If I try to load CTF parameters from a dictionary inside a for loop, it fails with the following exception:


Traceback (most recent call last):
  File "reproduce.py", line 17, in <module>
    for value in range(3):
IndexError: list index out of range


This is my EMAN2 Version:

EMAN 2.21a final (GITHUB: 2018-02-14 08:53 - commit: None )
Your EMAN2 is running on: Linux-4.15.0-43-generic-x86_64-with-debian-buster-sid 4.15.0-43-generic
Your Python version is: 2.7.13



Here is the script to reproduce the problem:

import EMAN2

current_ctf = EMAN2.EMAN2Ctf()

# Runs
for value in range(1):
    ctf_parameter = EMAN2.EMAN2Ctf(current_ctf)
    # Do some stuff with ctf


# Runs
current_ctf_dict = current_ctf.to_dict()
ctf_parameter = EMAN2.EMAN2Ctf()
ctf_parameter.from_dict(current_ctf_dict)

# Fails
for value in range(1):
    ctf_parameter = EMAN2.EMAN2Ctf()
    ctf_parameter.from_dict(current_ctf_dict)

    # Do some stuff with ctf


Best,

Thorsten

_____________________________
Dr. Thorsten Wagner
Max-Planck-Institute of Molecular Physiology
Structural Biochemistry
Otto-Hahn Strasse 11
D-44227 Dortmund
Phone +49-(0)231-133-2357

Thorsten Wagner

Feb 18, 2019, 8:27:38 AM
to eman2-de...@googlegroups.com

It's getting even stranger.


If I use a numpy array instead of range in my loop, it works:

import EMAN2
import numpy as np

current_ctf = EMAN2.EMAN2Ctf()

# Runs
for value in range(1):
    ctf_parameter = EMAN2.EMAN2Ctf(current_ctf)
    # Do some stuff with ctf

# Runs
current_ctf_dict = current_ctf.to_dict()
ctf_parameter = EMAN2.EMAN2Ctf()
ctf_parameter.from_dict(current_ctf_dict)

# Runs
for value in np.array(range(1)):
    ctf_parameter = EMAN2.EMAN2Ctf()
    ctf_parameter.from_dict(current_ctf_dict)

# Fails
for value in range(1):
    ctf_parameter = EMAN2.EMAN2Ctf()
    ctf_parameter.from_dict(current_ctf_dict)


--
You received this message because you are subscribed to the Google Groups "EMAN2 and SPARX Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to eman2-develope...@googlegroups.com.
To post to this group, send email to eman2-de...@googlegroups.com.
Visit this group at https://groups.google.com/group/eman2-developers.
For more options, visit https://groups.google.com/d/optout.

Steve Ludtke

Feb 18, 2019, 11:22:49 AM
to eman2-de...@googlegroups.com
Hi Thorsten,
This seems to be a problem Boost is having with initializing a C++ object from an empty Python list. Two of the elements in the dictionary form of the EMAN2Ctf object are lists of floats (background and snr). When you initialize a new object, these lists are empty, so in Python they aren't empty lists of floats; they are empty lists with no defined element type. When you reinitialize a CTF object from these empty lists, it seems that Boost has some internal bug related to this (or there is something in our code that I don't see). I agree the resulting failure case is bizarre, but this version of the code does not crash:

import EMAN2

current_ctf = EMAN2.EMAN2Ctf()

# Runs
for value in range(1):
    ctf_parameter = EMAN2.EMAN2Ctf(current_ctf)
    # Do some stuff with ctf


# Runs
current_ctf_dict = current_ctf.to_dict()
ctf_parameter = EMAN2.EMAN2Ctf()
current_ctf_dict["background"]=[1.,2.]
current_ctf_dict["snr"]=[1.,2.]
#print current_ctf_dict
ctf_parameter.from_dict(current_ctf_dict)

# Runs now that both lists are non-empty
for i in range(1):
    ctf_parameter = EMAN2.EMAN2Ctf()
    ctf_parameter.from_dict(current_ctf_dict)

    # Do some stuff with ctf


--------------------------------------------------------------------------------------
Steven Ludtke, Ph.D. <slu...@bcm.edu>                      Baylor College of Medicine 
Charles C. Bell Jr., Professor of Structural Biology
Dept. of Biochemistry and Molecular Biology                      (www.bcm.edu/biochem)
Academic Director, CryoEM Core                                        (cryoem.bcm.edu)
Co-Director CIBR Center                                    (www.bcm.edu/research/cibr)


Thorsten Wagner

Feb 18, 2019, 12:06:58 PM
to eman2-de...@googlegroups.com
Thanks a lot for your insights!

But do you have an idea why the error depends on whether I loop over a NumPy array (it works) or over a range (it fails)? The loop body does not even depend on the loop value...

Best
Thorsten

Steve Ludtke

Feb 18, 2019, 2:40:59 PM
to eman2-de...@googlegroups.com
I believe the attempt to convert the empty list into a C++ float array is corrupting the state of the Python interpreter in some fashion. I don't think this is an issue we can easily resolve, as it may well be a problem in Boost itself. The workaround is to avoid using EMAN2Ctf objects in this specific way without assigning at least one value to the SNR and background arrays. I suspect this is tied to the one very specific operation in which you repopulate an EMAN2Ctf object from a dictionary containing these empty lists.
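
The workaround described above could be wrapped in a small helper. This is only a sketch: `pad_empty_float_lists` and the filler value of 0.0 are hypothetical choices, not part of EMAN2, and whether a placeholder entry is acceptable depends on how `background` and `snr` are used afterwards. A plain dictionary stands in for the result of `to_dict()` so that the snippet runs without EMAN2.

```python
import copy


def pad_empty_float_lists(ctf_dict, keys=("background", "snr"), filler=0.0):
    """Return a copy of ctf_dict in which empty lists under `keys` hold one float.

    Hypothetical helper, not part of EMAN2. The filler value is an assumption.
    """
    padded = copy.deepcopy(ctf_dict)
    for key in keys:
        if key in padded and len(padded[key]) == 0:
            padded[key] = [float(filler)]
    return padded


# Stand-in for EMAN2Ctf().to_dict(); only the two list entries matter here,
# the scalar entry is a made-up example value.
example = {"defocus": 1.5, "background": [], "snr": []}
safe = pad_empty_float_lists(example)
print(safe)
```

With EMAN2 available, the padded dictionary would then be handed to `ctf_parameter.from_dict(...)` in place of the raw output of `current_ctf.to_dict()`.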

In general this is a pretty strange operation. The sequence ->

Make EMAN2Ctf
EMAN2Ctf -> dict
manipulate dict
dict -> EMAN2Ctf

is a very strange and inefficient construct. Why aren't you just manipulating the EMAN2Ctf object directly instead of going through this dictionary intermediate? This approach makes very little sense.



Paul Penczek

Feb 18, 2019, 2:46:42 PM
to eman2-de...@googlegroups.com
Hi,

He might have been inspired by older SPARX code, which predates the CTF object. I never really bothered to convert everything to use the object.

____________________
Pawel A. Penczek, Ph.D.
Professor
Structural Biology Imaging Center, co-Director
The University of Texas
phone: 713-500-5416
fax: 713-500-0652
https://med.uth.edu/bmb/faculty/pawel-a-penczek/

Markus Stabrin

Feb 20, 2019, 6:10:14 PM
to eman2-de...@googlegroups.com
Hello everybody,

We encountered some segmentation faults during our 3D refinement and tracked down the problem, so I can provide a simple code example that is quite close to the real-life case:

from EMAN2 import *

a = EMData(30000, 80000, 1)
b = EMData(30000, 1)

for i in range(80000):
    print(i)
    a.insert_clip(b, (0, i))

The problem is that, on multiple machines, the code crashes at loop iteration 71583 with a segmentation fault.

The system memory usage is constant at 10 GB and not close to the maximum.

Any ideas why this could happen?
Is there some sort of internal memory allocation problem with large EMData objects?

Thank you in advance!
Best,
Markus

Markus Stabrin

Feb 20, 2019, 6:17:21 PM
to "eman2-developers@googlegroups.com"
Hello everybody,

As a quick follow-up: the loop is not necessary; every index greater than or equal to 71583 fails.

from EMAN2 import *

a = EMData(30000, 80000, 1)
b = EMData(30000, 1)

a.insert_clip(b, (0, 71583))

Best,
Markus
________________________________________
From: Markus Stabrin
Sent: Thursday, February 21, 2019 00:10
To: eman2-de...@googlegroups.com
Subject: Memory issue with large EMData objects?

Paul Penczek

Feb 20, 2019, 6:17:27 PM
to eman2-de...@googlegroups.com
Hi

Are you sure you got the number of zeroes right?

Regards,
Pawel

Markus Stabrin

Feb 20, 2019, 6:24:11 PM
to eman2-de...@googlegroups.com
Hello Pawel,

Yes.
The real-life case has a shape of (29768, 158134).
That did not fit into my laptop's memory, and since it fails at about 71000 anyway, I reduced the second dimension to 80000 (roughly half of the value we encountered) for this test case.
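
For a sense of scale, the memory needed for these images can be estimated as follows. The sketch assumes 4 bytes per pixel (single-precision floats), which is an assumption and not stated in this thread; `image_size_gb` is a hypothetical helper name.

```python
BYTES_PER_PIXEL = 4  # assumption: single-precision float pixels


def image_size_gb(nx, ny, nz=1):
    """Approximate in-memory size of an nx x ny x nz image in GB (10**9 bytes)."""
    return nx * ny * nz * BYTES_PER_PIXEL / 1e9


real_case = image_size_gb(29768, 158134)
test_case = image_size_gb(30000, 80000)
print("real case: %.1f GB, reduced test case: %.1f GB" % (real_case, test_case))
```

Under that assumption, the reduced test case comes out at about 9.6 GB, in line with the roughly 10 GB of memory use reported for it, while the real-life case needs about 18.8 GB.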

Best,
Markus

Paul Penczek

Feb 20, 2019, 6:29:16 PM
to eman2-de...@googlegroups.com
Most likely an index somewhere at the C level is a short integer. You can easily look it up and change it to an unsigned long integer.

Regards,
Pawel

Steve Ludtke

Feb 20, 2019, 6:41:30 PM
to eman2-de...@googlegroups.com
It appears that in the current code, the product xsize * ysize must be less than 2**31 (about 2.1 billion). While objects larger than 8 GB are supported, at the moment they are only supported if some portion of that size is in Z.

The issue is in emdata.h line 843:

    int nx, ny, nz, nxy;
    size_t nxyz;

You can see that nxyz (nx*ny*nz) is a 64-bit integer, but nxy (nx*ny) is only a 32-bit signed int. It might be that moving nxy to the next line and cleaning up a couple of nxy = ... lines would solve the problem.
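
The limit ties in with the crash index reported earlier in the thread. The following sketch assumes that the pixel offset row * nx ends up in a signed 32-bit integer such as nxy; that is an interpretation of the explanation above, not a confirmed trace through the EMAN2 source. The helper names are hypothetical.

```python
import ctypes

INT32_MAX = 2**31 - 1


def first_overflowing_row(nx):
    """Smallest row index whose offset row * nx exceeds a signed 32-bit int."""
    return INT32_MAX // nx + 1


def as_int32(value):
    """Wrap a Python integer the way a signed 32-bit C int would."""
    return ctypes.c_int32(value).value


nx = 30000
row = first_overflowing_row(nx)
print(row)                        # first row whose offset overflows
print(as_int32((row - 1) * nx))   # last offset that still fits
print(as_int32(row * nx))         # wrapped offset, now negative
```

For nx = 30000 this gives row 71583, the same index at which the insert_clip call was reported to fail.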
