Issue 173 in mdanalysis: cannot pickle universe object

mdana...@googlecode.com

Mar 27, 2014, 5:18:59 PM
to mdnalys...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 173 by charlesz...@gmail.com: cannot pickle universe object
http://code.google.com/p/mdanalysis/issues/detail?id=173

What steps will reproduce the problem?
1. Create a Universe object from a PSF topology file and a DCD trajectory file:
u = MDAnalysis.Universe(psf, dcd)
2. Try to serialize this object with Python's pickle module:
u_blob = pickle.dumps(u)

What is the expected output? What do you see instead?
The expected output is a byte string, but instead I got the error
message 'TypeError: 'AtomGroup' object is not callable'.

What version of the product are you using? On what operating system?

MDAnalysis 0.8.0
Python 2.7.6 / Python 2.6.2
Arch Linux / CentOS

Please provide any additional information below.



--
You received this message because this project is configured to send all
issue notifications to this address.
You may adjust your notification preferences at:
https://code.google.com/hosting/settings

mdana...@googlecode.com

Mar 27, 2014, 6:14:57 PM
to mdnalys...@googlegroups.com

Comment #1 on issue 173 by tyler.j...@gmail.com: cannot pickle universe
object
http://code.google.com/p/mdanalysis/issues/detail?id=173

I've certainly brought this kind of thing up in the past, but I think the
fact that universe objects are closely tied to open files makes it
difficult to allow proper serialization via pickle. This also makes it
challenging to pass universe objects between cores in a parallel workflow.
I generally end up adjusting my workflow to produce something more
tractable, like a NumPy array of coordinates, and pickle that for storage
and/or interprocess communication.
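A minimal sketch of that workaround, using only the standard library. The hypothetical `ToyTrajectory` class keeps an open file handle, the way a Universe's trajectory reader does, so pickling the object fails, while a plain list of coordinates read from it pickles fine. `ToyTrajectory`, `try_pickle`, `demo`, and the one-frame-per-line text format are illustrative assumptions, not MDAnalysis API; a NumPy coordinate array would pickle just like the list does here.

```python
import os
import pickle
import tempfile


class ToyTrajectory:
    """Hypothetical stand-in for a Universe: it keeps an open file handle."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path)  # open file object, as a trajectory reader would hold

    def coordinates(self):
        """Read every frame: one line of whitespace-separated floats per frame."""
        self._fh.seek(0)
        return [[float(x) for x in line.split()] for line in self._fh]

    def close(self):
        self._fh.close()


def try_pickle(obj):
    """Return the pickled bytes, or None if the object cannot be pickled."""
    try:
        return pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return None


def demo():
    """Pickle the handle-holding object, then only its extracted coordinates."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("0.0 1.0 2.0\n3.0 4.0 5.0\n")  # two frames
    traj = ToyTrajectory(tmp.name)
    try:
        whole = try_pickle(traj)     # None: the open file cannot be pickled
        coords = traj.coordinates()
        blob = try_pickle(coords)    # works: nested lists of floats
        return whole, coords, pickle.loads(blob)
    finally:
        traj.close()
        os.remove(tmp.name)
```

`demo()` returns `None` for the whole object and an identical copy of the coordinates after the round trip.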

mdana...@googlecode.com

Mar 27, 2014, 6:16:29 PM
to mdnalys...@googlegroups.com

Comment #2 on issue 173 by orbeckst: cannot pickle universe object
http://code.google.com/p/mdanalysis/issues/detail?id=173

The way Universes are built at the moment makes it impossible to pickle
them as they contain trajectory reader objects which in turn contain open
file descriptors.

This won't change any time soon unless someone comes up with a smart way to
do this. Therefore I am closing this with 'WontFix' - but feel free to
start a discussion on the developer mailing list or in the comments to this
issue. If a sensible approach and consensus emerges we will reopen.

Oliver

mdana...@googlecode.com

Mar 27, 2014, 6:17:29 PM
to mdnalys...@googlegroups.com
Updates:
Status: WontFix

Comment #3 on issue 173 by orbeckst: cannot pickle universe object
http://code.google.com/p/mdanalysis/issues/detail?id=173

(No comment was entered for this change.)

mdana...@googlecode.com

Mar 27, 2014, 6:29:20 PM
to mdnalys...@googlegroups.com

Comment #4 on issue 173 by manuel.n...@gmail.com: cannot pickle universe
object
http://code.google.com/p/mdanalysis/issues/detail?id=173

Just to clarify the cryptic error message:

pickle checks whether an object has a __getstate__ method and, if it does,
executes it. This is done by a sort of duck typing: the object's
__getstate__ attribute is looked up and assigned to a variable, which is
then called.
If there is no __getstate__, the lookup raises an AttributeError, in which
case pickle's default behavior ensues.

The thing with some MDAnalysis objects is that they manage their own
attributes and never raise an AttributeError. In particular, if you try
AtomGroup.__getstate__ you get a SelectionError, which pickle does not
handle; if you do the same with Segment.__getstate__ you get no error (!!)
and an empty AtomGroup is returned instead. It is this last case that
causes the 'TypeError: 'AtomGroup' object is not callable' when pickle
tries to call what it got for Segment.__getstate__.

This is a dangerous side-effect of the syntactic sugar for selection
shortcuts. A lot of things can go silently ignored.
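The failure mode described above can be reproduced without MDAnalysis. The sketch below is a simplified, hypothetical mirror of the Python 2 lookup, not pickle's actual source: probe for a `__getstate__` hook, call it if found, and fall back to the instance `__dict__` on `AttributeError`. In Python 2, `object` had no default `__getstate__`, so the probe fell through to a class's `__getattr__`. Since Python 3.11 `object` does provide a default `__getstate__`, so the helper consults `__getattr__` explicitly to reproduce the old behavior. `SelectionError`, `EmptyGroup`, and the three record classes are illustrative stand-ins, not the MDAnalysis classes.

```python
class SelectionError(Exception):
    """Stand-in for a library error that is *not* an AttributeError."""


class EmptyGroup:
    """Stand-in for an empty AtomGroup: a plain, non-callable object."""

    def __len__(self):
        return 0


class PlainRecord:
    """Well-behaved class: unknown attributes raise AttributeError."""

    def __init__(self):
        self.name = "plain"


class RaisingGroup(PlainRecord):
    """Like AtomGroup in the thread: unknown attributes raise SelectionError."""

    def __getattr__(self, attr):
        raise SelectionError(f"no selection named {attr!r}")


class PermissiveSegment(PlainRecord):
    """Like Segment in the thread: unknown attributes return an empty group."""

    def __getattr__(self, attr):
        return EmptyGroup()


def legacy_getstate(obj):
    """Simplified mirror of the Python 2 logic: call the __getstate__ hook if
    the lookup succeeds, otherwise fall back to the instance __dict__."""
    cls = type(obj)
    # A __getstate__ defined on the class hierarchy (ignoring object itself).
    if any("__getstate__" in vars(klass) for klass in cls.__mro__[:-1]):
        getstate = obj.__getstate__
    else:
        try:
            # In Python 2 the failed lookup ended up in __getattr__, if any;
            # cls.__getattr__ itself raises AttributeError when undefined.
            getstate = cls.__getattr__(obj, "__getstate__")
        except AttributeError:
            return dict(obj.__dict__)  # default: pickle the instance dictionary
    return getstate()  # TypeError if the "hook" is not callable
```

`legacy_getstate(PlainRecord())` falls back to `{"name": "plain"}`, `RaisingGroup` leaks its `SelectionError`, and `PermissiveSegment` ends in `TypeError: 'EmptyGroup' object is not callable`, the same shape as the error in the original report.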

On the topic of parallelization, I'll soon post some code I have geared
specifically toward parallelizing trajectory reads. The serialization of
MDAnalysis objects only becomes a problem if they are to be passed back and
forth between workers. If the code avoids that (and takes care of renewing
file descriptors) multiprocessing works fine.
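A minimal sketch of that pattern, under stated assumptions: workers receive only picklable task descriptions (a file name and a frame range) and open their own file handle, so no reader object ever crosses a process boundary. The one-frame-per-line text format and the names `split_frames`, `frame_sums`, `run_parallel`, and `demo_sequential` are hypothetical. With MDAnalysis, each worker would presumably build its own Universe from the file names instead.

```python
import os
import pickle
import tempfile
from multiprocessing import Pool


def split_frames(n_frames, n_chunks):
    """Split frames 0..n_frames-1 into contiguous (start, stop) chunks."""
    step = max(1, -(-n_frames // n_chunks))  # ceiling division, at least 1
    return [(i, min(i + step, n_frames)) for i in range(0, n_frames, step)]


def frame_sums(task):
    """Worker: open the file itself and sum each frame in [start, stop)."""
    path, start, stop = task
    sums = []
    with open(path) as fh:  # a fresh file descriptor owned by this worker
        for i, line in enumerate(fh):
            if i >= stop:
                break
            if i >= start:
                sums.append(sum(float(x) for x in line.split()))
    return sums


def run_parallel(path, n_frames, n_workers=2):
    """Send only picklable (path, start, stop) tuples to a process pool."""
    tasks = [(path, a, b) for a, b in split_frames(n_frames, n_workers)]
    with Pool(n_workers) as pool:
        chunks = pool.map(frame_sums, tasks)
    return [s for chunk in chunks for s in chunk]


def demo_sequential():
    """Round-trip the tasks through pickle, as Pool.map does, and run the
    worker in this process; returns the per-frame sums."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("1 2\n3 4\n5 6\n")  # three frames, two values each
    try:
        tasks = [(tmp.name, a, b) for a, b in split_frames(3, 2)]
        restored = pickle.loads(pickle.dumps(tasks))  # plain tuples pickle fine
        return [s for task in restored for s in frame_sums(task)]
    finally:
        os.remove(tmp.name)
```

In a script, call `run_parallel(path, n_frames)` under an `if __name__ == "__main__":` guard, which the spawn start method of `multiprocessing` requires.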

mdana...@googlecode.com

Mar 27, 2014, 7:18:37 PM
to mdnalys...@googlegroups.com

Comment #5 on issue 173 by charlesz...@gmail.com: cannot pickle universe
object
http://code.google.com/p/mdanalysis/issues/detail?id=173

Is it possible to extract all the fields of the universe object and
serialize them to a byte stream? When we want to regenerate the object, we
could deserialize all the fields and construct the object again.

mdana...@googlecode.com

Mar 27, 2014, 7:35:01 PM
to mdnalys...@googlegroups.com

Comment #6 on issue 173 by manuel.n...@gmail.com: cannot pickle universe
object
http://code.google.com/p/mdanalysis/issues/detail?id=173

That is pickle's default approach. MDAnalysis universes are unpicklable in
this way due to, among other things, open file descriptors and unpicklable
function objects (SWIG stuff).

This can all be managed instead of leaving pickle to do its default
process. Look into the __getstate__ and __setstate__ functions. There
probably is a way of coercing the serialization of enough information for
__setstate__ to recreate the universe. As far as I went it looked messy
and, in my case, not worth the trouble.

mdana...@googlecode.com

Mar 27, 2014, 7:42:33 PM
to mdnalys...@googlegroups.com

Comment #7 on issue 173 by orbeckst: cannot pickle universe object
http://code.google.com/p/mdanalysis/issues/detail?id=173

For __getstate__():
* all the arguments of Universe()
* current frame in the trajectory (often that will just be frame 1)
* optional: index of the XTC/TRR reader

For __setstate__():
1) build the Universe again
2) optional: recreate the XTC/TRR reader index
3) go to the saved frame
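A minimal sketch of this recipe on a hypothetical, Universe-like class. `ToyUniverse`, its text trajectory format (one frame of whitespace-separated floats per line), and the placeholder topology name are assumptions, not MDAnalysis API. `__getstate__` keeps only the constructor arguments and the current frame. `__setstate__` re-runs `__init__`, which reopens the file and rebuilds a byte-offset frame index (playing the role of step 2 instead of saving the index), and then seeks back to the saved frame.

```python
import os
import pickle
import tempfile


class ToyUniverse:
    """Hypothetical Universe-like object: a topology name plus a text
    'trajectory' file with one frame (whitespace-separated floats) per line."""

    def __init__(self, topology, trajectory):
        self.topology = topology           # only stored; not read in this sketch
        self.trajectory = trajectory
        self._fh = open(trajectory, "rb")  # open file descriptor: unpicklable
        self._offsets = []                 # byte offset of each frame
        while True:
            pos = self._fh.tell()
            if not self._fh.readline():
                break
            self._offsets.append(pos)
        self.goto(0)

    def goto(self, frame):
        """Seek to `frame` (0-based) and parse its coordinates."""
        self._fh.seek(self._offsets[frame])
        self.coords = [float(x) for x in self._fh.readline().decode().split()]
        self.frame = frame

    def close(self):
        self._fh.close()

    def __getstate__(self):
        # Only the constructor arguments and the current frame are saved.
        return {"args": (self.topology, self.trajectory), "frame": self.frame}

    def __setstate__(self, state):
        self.__init__(*state["args"])  # steps 1 and 2: reopen file, rebuild index
        self.goto(state["frame"])      # step 3: go back to the saved frame


def demo():
    """Pickle a ToyUniverse sitting on its last frame and load it back."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("0 0 0\n1 1 1\n2 2 2\n")  # three frames
    u = ToyUniverse("system.psf", tmp.name)  # "system.psf" is a placeholder name
    try:
        u.goto(2)
        restored = pickle.loads(pickle.dumps(u))
        try:
            return restored.frame, restored.coords, restored.topology
        finally:
            restored.close()
    finally:
        u.close()
        os.remove(tmp.name)
```

The pickled bytes contain only two strings and an integer; the file descriptor is recreated on load.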

mdana...@googlecode.com

Apr 15, 2014, 4:29:48 AM
to mdnalys...@googlegroups.com
Updates:
Status: Accepted
Labels: -Type-Defect Type-Enhancement

Comment #9 on issue 173 by orbeckst: cannot pickle universe object
http://code.google.com/p/mdanalysis/issues/detail?id=173

Maybe it would be worthwhile making pickling work along the lines of
recreating a copy of the universe using the constructor information and
information from the TrajectoryReader (which would need its own
__getstate__/__setstate__) -- see comments in this issue for more details.

I am reopening this as an enhancement; anyone interested can grab the ticket.

mdana...@googlecode.com

Apr 15, 2014, 2:54:37 PM
to mdnalys...@googlegroups.com

Comment #10 on issue 173 by charlesz...@gmail.com: cannot pickle universe
object
http://code.google.com/p/mdanalysis/issues/detail?id=173

There is a _dcd_c_ptr attribute inside the DCDReader object. It is a C
object defined in dcd.c, so if we want to serialize the DCDReader object, we
have to serialize this C pointer as well. But it seems that pickle cannot
dump this kind of PyObject.

mdana...@googlecode.com

Apr 15, 2014, 3:08:17 PM
to mdnalys...@googlegroups.com

Comment #11 on issue 173 by orbeckst: cannot pickle universe object
http://code.google.com/p/mdanalysis/issues/detail?id=173

I don't think that we should serialize a Reader wholesale; instead we should
provide a way to re-instantiate it. In this way you only serialize state
information such as

- filename
- number of atoms
- current frame
- ... all other parameters set in __init__()

and then essentially create a brand new Reader with this information.
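One way to express "re-instantiate instead of serializing" in Python is `__reduce__`, which tells pickle to call the constructor again with the saved arguments and then apply a small state (here, the current frame) through `__setstate__`. A minimal sketch with a hypothetical `ToyReader` (not the MDAnalysis `DCDReader`): its open text file stands in for the unpicklable `_dcd_c_ptr` and is never serialized. The file format (one frame per line, three floats per atom) is an assumption for illustration.

```python
import os
import pickle
import tempfile


class ToyReader:
    """Hypothetical sequential reader: one frame per line, 3 floats per atom."""

    def __init__(self, filename, n_atoms):
        self.filename = filename
        self.n_atoms = n_atoms
        self._handle = open(filename)  # stands in for the unpicklable C pointer
        self.frame = -1
        self.positions = None
        self.goto(0)

    def goto(self, frame):
        """Rewind and read forward to `frame` (0-based)."""
        self._handle.seek(0)
        for i, line in enumerate(self._handle):
            if i == frame:
                values = [float(v) for v in line.split()]
                if len(values) != 3 * self.n_atoms:
                    raise ValueError(f"frame {frame}: expected {3 * self.n_atoms} values")
                self.positions = [tuple(values[j:j + 3]) for j in range(0, len(values), 3)]
                self.frame = frame
                return
        raise IndexError(f"frame {frame} out of range")

    def close(self):
        self._handle.close()

    def __reduce__(self):
        # Rebuild from the __init__ arguments, then restore the current frame.
        return (self.__class__, (self.filename, self.n_atoms), {"frame": self.frame})

    def __setstate__(self, state):
        self.goto(state["frame"])


def demo():
    """Pickle a reader on frame 1; unpickling builds a brand-new reader."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("0 0 0 1 1 1\n2 2 2 3 3 3\n")  # two frames, two atoms
    reader = ToyReader(tmp.name, n_atoms=2)
    reader.goto(1)
    try:
        clone = pickle.loads(pickle.dumps(reader))
        try:
            return clone.frame, clone.positions, clone._handle is not reader._handle
        finally:
            clone.close()
    finally:
        reader.close()
        os.remove(tmp.name)
```

Compared with a `__getstate__`/`__setstate__` pair, `__reduce__` makes the "call the constructor again" step explicit, so any C-level setup done in `__init__` (such as reading the DCD header) simply runs again on load.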

mdana...@googlecode.com

Apr 15, 2014, 3:54:10 PM
to mdnalys...@googlegroups.com

Comment #12 on issue 173 by charlesz...@gmail.com: cannot pickle universe
object
http://code.google.com/p/mdanalysis/issues/detail?id=173

If we want to initialize a reader object, we must call _read_dcd_header()
inside the initialization function. It is implemented in C in dcd.c, and it
sets a new attribute named __dcd_c_ptr, which is a PyObject, on the
reader object.

This attribute is used by functions like MDAnalysis.analysis.align.rmsd(A,
B) and most other functions. So I think that if we want to serialize
DCDReader, we will have to serialize this attribute, or we can just
serialize the coordinate and topology files for regenerating the universe
object later.