Creating a 2D image stack in Python

1,803 views
Skip to first unread message

Mario J. Borgnia

unread,
Feb 20, 2013, 2:58:41 PM2/20/13
to em...@googlegroups.com
Hi Steve,
I am trying to generate a EMData object that contains several slices. I
followed the instructions in:

http://blake.bcm.edu/emanwiki/EMAN2MakingVolumes

and wrote the attached program. The files generated are interpreted by
e2display as 2D (one_by_one.hdf) and 3D (all_at_once.hdf) respectively.
Is there a better/proper way of doing this?

Thanks!!

CODE:

#!/usr/local/EMAN2_daily_20130122/Python/bin/python
from EMAN2 import *
import sys
def main():
tmp=EMData()
e=EMData()
for image_file in sys.argv[1:]:
print "Este %d" % e.get_zsize()
if e.get_xsize() == 0:
e.read_image(image_file)
e.append_image("one_by_one.hdf",EMUtil.ImageType.IMAGE_HDF,False)
else:
e.set_size(e.get_xsize(),e.get_ysize(),e.get_zsize()+1)
tmp.read_image(image_file)
tmp.append_image("one_by_one.hdf",EMUtil.ImageType.IMAGE_HDF,False)
e.insert_clip(tmp,(0,0,e.get_zsize()-1))



display(e)
e.write_image("all_at_once.hdf",0,EMUtil.ImageType.IMAGE_HDF,False)



if __name__=="__main__":
main()



example.py

Ludtke Steven

unread,
Feb 20, 2013, 4:26:17 PM2/20/13
to em...@googlegroups.com
Hi Mario. There are a few little problems with your script below. Can you explain exactly what you're trying to do (I don't know what sorts of input files you're using) ? I can probably offer some better ideas...
> --
> --
> ----------------------------------------------------------------------------------------------
> You received this message because you are subscribed to the Google
> Groups "EMAN2" group.
> To post to this group, send email to em...@googlegroups.com
> To unsubscribe from this group, send email to eman2+un...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/eman2
>
> --- You received this message because you are subscribed to the Google Groups "EMAN2" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to eman2+un...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
> <example.py>

Mario J. Borgnia

unread,
Feb 20, 2013, 4:43:24 PM2/20/13
to em...@googlegroups.com
I am reading in 2D mrc files and writing out stacks of them. I could do
this using e2proc2d.py but I may want to process the images in between
and would like to learn how to do it in memory to save I/O operations.

So, basically the task is to create a single 2D stack from an arbitrary
number of individual images. Leaving the option of processing each 2D
section in memory.

Steven Ludtke

unread,
Feb 20, 2013, 11:33:46 PM2/20/13
to em...@googlegroups.com
Hi Mario. With 'proper' stack files (files which distinguish between sets of 2-D images and 3-D volumes, and have per-image headers, ie - every single format other than MRC), you can write to arbitrary positions in the file. However, when stacking 2-D particles into a 3-D volume, the volume needs to have the correct dimensions before you insert.

>>>> #!/usr/local/EMAN2_daily_20130122/Python/bin/python
>>>> from EMAN2 import *
>>>> import sys
>>>> def main():
>>>> tmp=EMData()
>>>> e=EMData()
>>>> for image_file in sys.argv[1:]:
>>>> print "Este %d" % e.get_zsize()
>>>> if e.get_xsize() == 0:
>>>> e.read_image(image_file)
>>>> e.append_image("one_by_one.hdf",EMUtil.ImageType.IMAGE_HDF,False)
>>>> else:
>>>> e.set_size(e.get_xsize(),e.get_ysize(),e.get_zsize()+1)
>>>> tmp.read_image(image_file)
>>>> tmp.append_image("one_by_one.hdf",EMUtil.ImageType.IMAGE_HDF,False)
>>>> e.insert_clip(tmp,(0,0,e.get_zsize()-1))
>>>>
>>>>
>>>>
>>>> display(e)
>>>> e.write_image("all_at_once.hdf",0,EMUtil.ImageType.IMAGE_HDF,False)
>>>>
>>>>
>>>>
>>>> if __name__=="__main__":
>>>> main()


This is a bit overcomplicated. Let's start with your code, and make it simpler and create a volume stack in RAM:

#!/usr/local/EMAN2_daily_20130122/Python/bin/python
from EMAN2 import *
import sys
def main():
# get the size of the first input (assume the rest are the same size)
hdr=EMData(sys.argv[1:])
nx=hdr["nx"]
ny=hdr["ny"]

# create an empty output volume
outvol=EMData(nx,ny,len(sys.argv)-1)

# loop over the input files (assume each file has 1 image)
for z,image_file in enumerate(sys.argv[1:]):
im=EMData(image_file)
#im.process_inplace(....) # if you want to do something to the image
outvol.insert_clip(im,(0,0,z))

display(outvol)
outvol.write("output.mrc")

if __name__=="__main__":
main()

==============
Now, you could do the same thing by writing to the output file slice by slice. This would appear to avoid the problem of keeping the entire stack in memory while you construct it, however, it doesn't entirely. The trick is that the MRC output volume must be big enough to contain the full stack before you write to it. It will not get bigger dynamically as you insert slices. In essence, this means you have to create an empty volume of the appropriate size in RAM then write it to disk at the beginning. We should have a better solution for this, but frankly until very recently (tomography in IMOD), we have avoided MRC 'stack' files like the plague. They are poorly standardized (no good way to detect a 'stack' vs a 'volume', non-standard 'extended headers' abound, ...), don't have ANY per-image header data, etc. We support them, like we support everything else, but don't like them much.

Note that for any other EMAN supported cryoEM format this would become, the much simpler:

#!/usr/local/EMAN2_daily_20130122/Python/bin/python
from EMAN2 import *
import sys
def main():

for image_file in sys.argv[1:]:
im=EMData(image_file)
#im.process_inplace(....) # if you want to do something to the image
im.write_image("outfile.spi",-1) # -1 appends


if __name__=="__main__":
main()


----------------------------------------------------------------------------
Steven Ludtke, Ph.D.
Professor, Dept of Biochemistry and Mol. Biol. (www.bcm.edu/biochem)
Co-Director National Center For Macromolecular Imaging (ncmi.bcm.edu)
Co-Director CIBR Center (www.bcm.edu/research/cibr)
Baylor College of Medicine
slu...@bcm.edu

Mario J. Borgnia

unread,
Feb 21, 2013, 11:33:38 AM2/21/13
to em...@googlegroups.com
Steve,
Thank you for the detailed response!

I still do not understand if there is a difference between a EMData
object holding a stack of 2D images and one holding a 3D volume.
In my example, I am writing two files in HDF format, the
"one_by_one.hdf" by appending each image as it is read and processed and
the "all_at_once.hdf", by writing the whole EMData object at once at the
end of the of the process. As I write this message, I guess I am
begining to understand that the problem is not the difference in the
'objects' while in memory but in the way the stack is written into the
HDF file. The way I am writing it at the end is creating a 'stack' of 3D
images. Is there a flag in the 'write_image' method to indicate that I
am dumping a stack of 2D images rather than writing one element of a
stack of 3D objects.

Finally, is there a better place for this type of discussion? Maybe the
wiki?


THANKS AGAIN!!!


Mario


But, wait, I get it as I write...

Ludtke Steven

unread,
Feb 21, 2013, 11:58:24 AM2/21/13
to em...@googlegroups.com
Hi Mario. The mailing list is the right place for the discussion, I'd say.

EMData objects are intrinsically 3-D, but the size of one or more of the dimensions can be 1 pixel, effectively making it a 2-D or a 1-D object. Some methods and functions work only on objects that have a particular dimensionality, but fundamentally the EMData object is not different for the different numbers of dimensions.

The ONLY file format where it makes sense to stack 2-D particles into a 3-D stack is MRC, unless you are trying to do some sort of weird pseudo-3d visualization or something. The normal way of writing to multiple-image files is
im.write_image("file.hdf",n)

where n is the number of the image in the file (starting with 0). If n==-1, it will append to the end of the file. Each image in such files (HDF, BDB, IMAGIC, SPIDER) has its own header associated with it, and is truly an independent image.

Since MRC doesn't differentiate (in any standard way) between 3-D volumes and 2-D stacks, we cannot use this mechanism for reading and writing to MRC stacks. When EMAN sees a 3-D MRC file, it treats it as a 3-D volume, in all cases. From EMAN2's perspective the MRC format can contain only a single image, and that image can be 1-D, 2-D or 3-D. Stack files look like volumes. To extract a single image, you read a 2-D Region() from the 3-D image.

im.read_image("file.mrc",0,0,Region(0,0,10,256,256,1))

will read a 256x256 2-D image from the 3-D volume in file.mrc at Z=10 (the 11th particle in the stack). The Region specifies (x0,y0,z0,xsize,ysize,zsize). The first 0 in read_image is specifying the first image in the MRC file (there can only be 1 anyway), and the second 0 says not to read only the header (ie- read the image AND the header).

Grigory Sharov

unread,
Dec 26, 2014, 11:12:56 AM12/26/14
to eman2
Hello Steve,

I'm writing a similar python script and also have some problems. I have 2 mrc files: first is single 2D image, second is a stack of  seven 2D images (treated as volume in EMAN). I want to do some processing on both files and append first 2D image to the second stack (in last position). Input files should not be changed, final output should be preferably in mrcs format.

Here is a draft I came up with:

#!/home/sharov/soft/EMAN2.1/extlib/bin/python

from EMAN2 import *
import sys
def main():
   # get the size
   exp=EMData("single.mrc")
   inputstack=EMData("stack7.mrc")
   nx=exp["nx"]
   ny=exp["ny"]
   # create an empty output mrc stack for 8 images
   outvol=EMData(nx,ny,8)
   # process first 2D input image
   exp.process_inplace("threshold.clampminmax.nsigma",{"nsigma":4})
   exp.process_inplace("normalize.edgemean")
   exp.mult(-1)
   # process input stack as 3D volume
   inputstack.process_inplace("threshold.clampminmax.nsigma",{"nsigma":4})
   inputstack.process_inplace("normalize.edgemean")
   inputstack.mult(-1)
   #outvol.insert_clip(im,(0,0,0-6))
   
   # insert input 2D image as a last slice
   outvol.insert_clip(exp,(0,0,7))
   # write output stack
   outvol.write_image("outputstack.mrcs")
if __name__=="__main__":
   main()

The problem is I don't know how to insert processed input stack into output file, which already exists in memory. Insert_clip obviously doesn't accept range of values (see the commented line)..

Could you please explain how I should do this?

Best regards,
Grigory Sharov

Institute of Genetics and Molecular and Cellular Biology
Department of Structural Biology and Genomics
1, rue Laurent Fries
67404 Illkirch, France
tel. 03 69 48 51 00 
e-mail: sha...@igbmc.fr

Paul Penczek

unread,
Dec 26, 2014, 11:18:57 AM12/26/14
to em...@googlegroups.com
While I should probably not talk for Steve, all that you should be able to accomplish with e2proc2d.py, no need for Python coding.

Regards,
Pawel
(Typing while walking, hence weird substitutions)
For more options, visit https://groups.google.com/d/optout.

Grigory Sharov

unread,
Dec 26, 2014, 11:23:42 AM12/26/14
to eman2
Dear Paul,

I've already done this with e2proc2d, but I have thousands of files and stacks to process, so I'd like to avoid unnecessary I/O operations by doing everything in RAM and writing out only output files.

Best regards,
Grigory Sharov

Institute of Genetics and Molecular and Cellular Biology
Department of Structural Biology and Genomics
1, rue Laurent Fries
67404 Illkirch, France
tel. 03 69 48 51 00 
e-mail: sha...@igbmc.fr


Steven Ludtke

unread,
Dec 26, 2014, 7:11:12 PM12/26/14
to em...@googlegroups.com
Hi Grigory,
What you are trying to do would, indeed, be a bit inefficient with e2proc2d. What you have is a very good effort, but I think you may need a little conceptual clarification in EMAN2 conventions. This is in no way a criticism, as there is probably no good way for you to get this knowledge without doing exactly what you've done, ie - make a first attempt, and let someone in my group make suggestions.

Unlike the !$##@$ MRC format, EMAN2 (and every other file format) distinguishes between stacks of 2-D images and Volumes. Internally this is also true. If you read a stack of 2-D images as a 3-D volume, then apply a Processor, the image processing operation will occur in a 3-D not a 2-D context. There are some unusual cases where this may be desirable, but normally is isn't what you're after. Normally in EMAN2, a set of 2-D images would be stored as independent EMData objects in a Python List or Tuple, and you would not append an image to a stack by inserting data into a Volume. 3-D volumes are not a natural way to handle sets of 2-D images.

This leads to the ambiguity in what you are trying to accomplish, and why I think you may not be doing what you intend:
exp.process_inplace("threshold.clampminmax.nsigma",{"nsigma":4})
will apply the threshold operation to the stack of images considered as if they were a single volume. That is, it will determine the sigma for the entire stack, not each image individually. This:
   inputstack.process_inplace("normalize.edgemean")
will be even worse, as it will treat the first and last images in the stack as part of the 'edge' of a 3-D volume and try to use them to partially establish the mean value. I can't imagine this is what you had in mind.

Anyway, we strongly encourage anyone using 2.1release or newer to use the .mrcs extension for their stack files, as the extension will tell EMAN2.1 that you intend to interpret this image file in a 2-D not a 3-D context. So, let me take a stab at rewriting the core of your script, and see if this helps clarify. Note that this will not interact identically with your data due to the comments above and related issues:

# You could use this, and if you need to do something with all of the images at once, it may make sense, but in your case,
# it makes more sense to read one image at a time
   inputstack=EMData.read_images("stack7.mrcs") # will read all images as a Python List of EMData objects

#### actual new code

   exp=EMData("single.mrc")
   n=EMUtil.get_image_count("stack7.mrcs") # you could just say n=7 if you are sure...

   nx=exp["nx"]
   ny=exp["ny"]

   for i in xrange(n):
      img=EMData("stack7.mrcs",i)                     # reads image number i from the stack (first image is 0)
      img.process_inplace("threshold.clampminmax.nsigma",{"nsigma":4})
      img.process_inplace("normalize.edgemean")
      img.mult(-1)
      img.write_image("outputstack.mrcs",i)

   # process first 2D input image
   exp.process_inplace("threshold.clampminmax.nsigma",{"nsigma":4})
   exp.process_inplace("normalize.edgemean")
   exp.mult(-1)

   exp.write_image("outputstack.mrcs",-1)    # if you use image number -1 in write_image, it appends   

Please feel free to ask followup questions...


For more options, visit https://groups.google.com/d/optout.

Grigory Sharov

unread,
Dec 27, 2014, 1:15:54 PM12/27/14
to eman2
Hi Steve,

thanks a lot for such a nice explanation. Given the fact that I don't know anything about python, it's very helpful!

Though, I'm still a little confused: in the beginning of this topic, in the script you wrote for Mario, you create an empty volume in RAM and filling it with 2D slices. In my case, you propose to append 2D images to existing file. I thought I was doing similar thing - putting 2D slices wherever they come from into mrc volume (in RAM) and then writing out the output..

Anyway, my concern is the time of script execution and decreasing of I/O operations. Do you think that processing one by one would be faster than processing the full Python List of EMData objects?

Best regards,
Grigory Sharov

Institute of Genetics and Molecular and Cellular Biology
Department of Structural Biology and Genomics
1, rue Laurent Fries
67404 Illkirch, France
tel. 03 69 48 51 00 
e-mail: sha...@igbmc.fr


Steven Ludtke

unread,
Dec 27, 2014, 4:26:59 PM12/27/14
to em...@googlegroups.com
Modern operating systems have extensive caching capabilities built-in, so multiple accesses to the same file over a relatively short period of time will not entail much of a speed penalty over opening the file once and reading the whole thing. The case where you might see a speed difference is where you are comparing:

read images
process #1
write images
read images
process #2
write images

vs 
read images
process #1
process #2
write images

If your operations can be combined into a single e2proc2d command, then there is little reason to write your own script if speed is the question, eg:

e2proc2d.py stack7.mrcs output.mrcs --process threshold.clampminmax.nsigma:nsigma=4 --process normalize.edgemean --mult -1 
e2proc2d.py ref.mrcs output.mrcs --process threshold.clampminmax.nsigma:nsigma=4 --process normalize.edgemean --mult -1

would be nearly as efficient as what I wrote. There is a little overhead in the initialization of e2proc2d, which would probably be the biggest difference in a case like this.  The other reason to script things in python might be if you want to run the program on many different files and automate the naming of the files in some way. While that could also be done with shell scripting and e2proc2d, writing it in Python would save a lot of initialization time, and in a situation like this, would probably run noticeably faster, unless the individual images are very large.
Reply all
Reply to author
Forward
0 new messages