request for discussion: changing parts of AtomGroup API

53 views
Skip to first unread message

Oliver Beckstein

unread,
Jul 19, 2013, 3:47:33 PM7/19/13
to mdnalys...@googlegroups.com
Hi,

I'd like to propose some fairly severe changes to AtomGroup and are asking for a discussion amongst developers and then later (if the proposal survives the developer list) on the user list.

PROPOSAL

1) AtomGroup methods/properties should return NumPy arrays when these can be of simple numeric or string types. 

E.g. ag.resids() would return an array([1,2,3], dtype="int") and ag.resnames() would return something like array(['MET ,'ARG', 'THR'], dtype="|S3"). 

This would make it easier to use numpy-style slicing and generally fits the theme of integrating MD trajectories tightly with numpy. 

Groups of specific objects such as a bunch of residues are returned as a ResidueGroup or segments as a SegmentGroup (I added a patch to that effect the other day.)

It is unlikely to break code because arrays pretty much behave like lists.


2) Turn methods that mainly report on topology properties into managed attributes.

For example:

ag.resids()    --->   ag.resids

In code:
@property
def resids(self):
....

I suggest the following methods to be converted to managed attributes:


    bfactors  (already attribute but scheduled to be a method in 0.8)
    charges
    indices
    masses
    names
    radii
    resids
    resnames
    resnums
    segids
    
    velocities
    coordinates     # could leave as a method because we already have ag.positions — or make a clean break and also change

This change is motivated by a mail exhange we had a while back, 'Discussion to make properties and methods consistent' https://groups.google.com/d/msg/mdnalysis-devel/HQowGZqTOrQ/cVfLmS9Fjz4J . One reason why this discussion stalled was because we didn't have a list of proposed changes.

Turning the above methods into (managed) attributes would have the following 

Advantages:

1) Make the interface more consistent and probably easier to learn for new comers. 

2) Easier to write concise code, especially once these attributes return numpy arrays: instead of ugly ag.resids()[2:10:2] you write straightforward ag.resids[2:10:2].  

We could later add the ability to set these managed attributes via property.setter and then we could phase out set_xxxx() methods, thus decreasing the amount of methods/attributes exposed by AtomGroup and so tighten the API.

Disadvantage:

1) Will break existing scripts. Pretty much all of them, at least looking at mine, which use ag.resids() and friends liberally.

(Could be mitigated by a simple script that changes occurences of method calls to attribute access.)


I think we can implement (1) without problems. 

For changes as big as (2) we need to do at least a new minor release (in line with our release policy https://code.google.com/p/mdanalysis/wiki/PreparingReleases#Release_policy_and_release_numbering ). Because we're currently working towards 0.8 I am proposing these changes now.

- What do you think of these changes?
- Would you modify these changes? Add other methods, remove some?
- What would "your users" think of these changes?

Thanks for your time!
Oliver


--
Oliver Beckstein * orbe...@gmx.net
skype: orbeckst  * orbe...@gmail.com

Oliver Beckstein

unread,
Aug 1, 2013, 3:10:01 AM8/1/13
to mdnalys...@googlegroups.com
Hi,

after not having heard anything to the contrary I implemented the proposed item (1): everything that you get from AtomGroup is now either a numpy.ndarray or some MDAnalysis object such as a ResidueGroup (see cd9a044d4e49).

If no-one has any objections I'll send the proposal for (2) to the user list in the next few days to hear what people there have to say.

Cheers,
Oliver

Richard Gowers

unread,
Apr 17, 2015, 7:15:57 AM4/17/15
to mdnalys...@googlegroups.com
Hey Oliver,

I was thinking about bringing this up myself, then found this.

Moving everything on the above list (and anything new that's appeared since the list was created) is a good idea.

I think the line needs to be, if something does a calculation (centerOfMass) then it's a method, if it's just fetching data into a np array then it's just a property?  .. centerOfMass() is a bad example as it has kwargs though.

I also think all the set_resid etc should be replaced by property setters. So

ag.resids  # returns a np array of ids

ag
.resids = 1
ag
.resids = [1, 2, 3, 4, 5]



Re: breaking code, the whole point of being 0.something is we're allowed to change things, but it would make sense to try and get all these API changes in a single release.  (Ie. lets make 0.11 the huge API overhaul release?)

So if we're going to break API in places, it might be worth discussing camelCase vs. using_underscores.  I think there's equal amounts of both throughout the library, but it really needs to be one or the other.  I don't really have too strong an opinion on which, but maybe single_underscore reads better?

Richard

Oliver Beckstein

unread,
Apr 17, 2015, 10:24:10 AM4/17/15
to mdnalys...@googlegroups.com
Hi Richard,

You raise lots of good points. Can you open an issue and compile the proposed changes in this issue. (And assign to yourself.)

I like underscores better than camels (the latter bit me once... Literally, it's not a hidden snide remark at Perl ;-)). 

I also think that now is a good time to discuss what to break if something needs to be broken. 

Oliver

--
Oliver Beckstein
skype: orbeckst

--
You received this message because you are subscribed to the Google Groups "MDnalysis-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mdnalysis-dev...@googlegroups.com.
To post to this group, send email to mdnalys...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mdnalysis-devel/7f4d56a3-aa43-48ca-a8f7-59346e10fe36%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Manuel Nuno Melo

unread,
Apr 17, 2015, 1:03:40 PM4/17/15
to mdnalys...@googlegroups.com
A small remark, since you also brought up Oliver's idea of a resname numpy array:
Different formats may have longer resnames. How will we deal with it? Set the array to the largest format resname length? What about when a newer format with longer length is added?

Manel

Richard Gowers

unread,
Apr 19, 2015, 7:06:38 AM4/19/15
to mdnalys...@googlegroups.com
We're not forcing a type on the resnames, numpy automatically sets the array to have the same size as the largest string.  Then stuff like GROWriter just truncates if necessary.

    def resnames(self):
        """Returns a list of residue names.

        .. versionchanged:: 0.8
           Returns a :class:`numpy.ndarray`
        """
        return numpy.array([r.name for r in self.residues])


In [1]: import numpy as np

In [2]: names = ['aaa', 'vvvv', 'gfkhaskhqegkehge']

In [3]: a = np.array([n for n in names])

In [4]: a
Out[4]:
array(['aaa', 'vvvv', 'gfkhaskhqegkehge'],
      dtype='|S16')
Reply all
Reply to author
Forward
0 new messages