New topology model performance improvements


David Dotson

Dec 21, 2015, 3:31:54 PM
to mdnalys...@googlegroups.com
Hey all,

Richard and I spent a good deal of time during his visit to Arizona working away at issue 363, and we are almost finished with it. The result is an entirely new, array-based topology model for MDAnalysis, which gives performance improvements in both speed and memory usage, improves API consistency, and avoids problems of staleness previously encountered when working with residues and segments.
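To give a rough idea of what "array-based" means here, below is a minimal sketch. It is not the actual issue-363 code, and the names `atom_names`, `atom_resindex` and `resnames` are invented for illustration. Per-atom and per-residue attributes live in flat NumPy arrays, and an integer index array maps each atom to its residue, so residue-level data is looked up from the arrays on demand instead of being cached on objects that can go stale.

```python
import numpy as np

# Hypothetical toy topology: 6 atoms in 3 residues.
atom_names = np.array(["N", "CA", "N", "CA", "N", "CA"])
atom_resindex = np.array([0, 0, 1, 1, 2, 2])   # atom -> residue index
resnames = np.array(["ALA", "GLY", "SER"])     # one entry per residue

# Residue name of every atom via fancy indexing; nothing is cached.
atom_resnames = resnames[atom_resindex]

# Renaming a residue is a single array write, seen by all later lookups.
resnames[1] = "VAL"
updated = resnames[atom_resindex]

# Atoms belonging to residue 1, as a boolean mask over the atom arrays.
in_res1 = atom_resindex == 1
print(atom_names[in_res1], updated[in_res1])
```

Because every lookup goes through the index array, there is no per-residue object holding its own copy of the name that could fall out of sync.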

Work is still ongoing on the issue-363 branch, but to begin discussion of this major change we wanted to present one of the main points of motivation for it: performance for large systems. A smattering of performance comparisons can be viewed in this notebook.

Feel free to reply to the list with thoughts on this. We're working on a thorough written description + diagram for how the new topology system works, but hopefully it won't be long before we can push this out.

Cheers!

David

Richard Gowers

Dec 21, 2015, 5:58:41 PM
to MDnalysis-devel
Just to add, most features in all formats should work by now, so you can install the issue-363 branch and try it out for yourselves. With the topology system being more adaptable (we don't have to use only the standard attributes such as type, name, resname, etc.), some things might be named differently. If people could fire up a Universe with their favourite format and check that all the data in an AtomGroup is where they expect it, that would be great.
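For anyone wanting a quick way to do that check, here is a small sketch. The helper `report_attributes` and its default attribute list are hypothetical, not part of MDAnalysis, and the names actually exposed on the issue-363 branch may differ, which is exactly what the loop is meant to reveal. It relies only on `getattr`, so it runs here against a stand-in object.

```python
from types import SimpleNamespace


def report_attributes(group, attrs=("names", "types", "resnames", "resids", "segids")):
    """Return {attribute name: first few values, or None if the attribute is missing}."""
    report = {}
    for attr in attrs:
        values = getattr(group, attr, None)
        report[attr] = None if values is None else list(values[:5])
    return report


# Stand-in object so the sketch runs without a structure file;
# in practice you would pass the atoms of your own Universe instead.
fake_group = SimpleNamespace(names=["N", "CA", "C"], resids=[1, 1, 1])
for attr, values in report_attributes(fake_group).items():
    print(attr, "MISSING" if values is None else values)
```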

Max Linke

Jan 5, 2016, 5:18:45 PM
to MDnalysis-devel
Not sure where you want issues posted, but selections do not work right now on the 'issue-363' branch:

```
>>> sel = u.select_atoms('segid B and resid 1-100 and name CA')
>>> sel.n_atoms 
248
```

This is not correct. The selection should only contain 100 atoms.

Richard Gowers

Jan 5, 2016, 5:21:03 PM
to MDnalysis-devel
Is it because it's not a set being returned? I.e., does the set contain 100 elements?
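One way to test that hypothesis, sketched with a plain Python list standing in for the atom indices a selection returns (the index values are invented, not taken from Max's system): compare the raw length with the number of unique indices.

```python
# Invented indices: a selection result that contains repeated atoms.
selected_indices = [10, 11, 12, 10, 11, 12, 13]

n_returned = len(selected_indices)
n_unique = len(set(selected_indices))

# If these differ, the selection is returning duplicates rather than a set.
has_duplicates = n_returned != n_unique
print(n_returned, n_unique, has_duplicates)
```

With a real AtomGroup the same comparison could be made on its atom indices, assuming the branch exposes them as a sequence.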

Thanks for checking this. Funnily enough, all the tests were passing!

Max Linke

Jan 5, 2016, 5:24:27 PM
to MDnalysis-devel
Oh, don't worry. I'll try to run some more tests as things get fixed.

Richard Gowers

Jan 5, 2016, 6:32:58 PM
to MDnalysis-devel
What system is that example from?

So I **think** that all the test_atomselections tests pass, so if you want to add in some tests that fail in a PR onto -363 then I'll fix selections again :D

Max Linke

Jan 6, 2016, 3:11:48 AM
to mdnalys...@googlegroups.com


On 01/06/2016 12:32 AM, Richard Gowers wrote:
> What system is that example from?

Mostly creating selections to use in another program.

> So I **think** that all the test_atomselections tests pass, so if you
> want to add in some tests that fail in a PR onto -363 then I'll fix
> selections again :D

Well, the one I posted yesterday fails. It should also fail for other
PDBs. I'll see if I can reproduce it with our current test PDBs.


So I found another issue with the new branch:

>>> u = mda.Universe(<multi segment pdb>)
>>> u.segments
<SegmentGroup with 5 Segments>

This is a HUGE deviation from the old behavior, where I was shown the
segment names as well. I actually liked being able to see which segments
were defined in the topology.


There is also an issue that the new Topology parser in that branch
doesn't find all the segments in a PDB I have. Since the PDB is from a
colleague I'll check if he is OK giving it to you.

Richard Gowers

Jan 6, 2016, 6:49:16 PM
to MDnalysis-devel
Hey Max

I just pushed a lot of work on selections, so pull again and see if I've fixed things.

WRT repr strings, I think that's just David and me not having done it yet. I do want the repr strings to display every resname only when len < 10 (think numpy array repr).
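As a rough sketch of that repr idea (the class `NamedGroup`, its `max_shown` parameter and the output format are all hypothetical, not what the branch implements): show every name for short groups and abbreviate long ones.

```python
class NamedGroup:
    """Hypothetical stand-in for a group of residues or segments."""

    def __init__(self, kind, names, max_shown=10):
        self.kind = kind
        self.names = list(names)
        self.max_shown = max_shown

    def __repr__(self):
        n = len(self.names)
        if n < self.max_shown:
            shown = ", ".join(self.names)
        else:
            # Abbreviate long groups, similar in spirit to numpy's array repr.
            shown = ", ".join(self.names[:3] + ["..."] + self.names[-3:])
        return "<{} with {} members: [{}]>".format(self.kind, n, shown)


print(repr(NamedGroup("SegmentGroup", ["A", "B", "C", "D", "E"])))
```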

PDB parsing: I might have made a mistake there. I don't use PDB myself, so I might have got a column wrong.
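For reference when checking columns, here is a minimal sketch of slicing an ATOM record by fixed position. It is not the parser on the branch, and `parse_atom_line` is a made-up helper. The slices follow the PDB fixed-width layout (atom name in columns 13-16, resName 18-20, chainID 22, resSeq 23-26) plus the legacy segID field in columns 73-76 that many MD tools still write; the example record is synthetic.

```python
def parse_atom_line(line):
    """Slice a fixed-width PDB ATOM/HETATM record (0-based Python slices)."""
    return {
        "name": line[12:16].strip(),      # columns 13-16
        "resname": line[17:20].strip(),   # columns 18-20
        "chainid": line[21:22].strip(),   # column 22
        "resid": int(line[22:26]),        # columns 23-26
        "segid": line[72:76].strip(),     # columns 73-76 (legacy segID field)
    }


# Build a synthetic record piece by piece so the column positions are explicit.
example = (
    "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}".format("ATOM", 1, " CA", "", "ALA", "B", 42, "")
    + " " * 3
    + "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(1.0, 2.0, 3.0, 1.0, 0.0)
    + " " * 6
    + "{:<4}{:>2}".format("SEGB", "C")
)
record = parse_atom_line(example)
print(record)
```

Reading the chain ID (column 22) where the segID (columns 73-76) was intended, or the reverse, is one way the number of segments found could change between parsers.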

Max Linke

Jan 7, 2016, 4:55:04 PM
to mdnalys...@googlegroups.com
I attached a PDB with which I can reproduce the problems.
The PDB contains about 19 segments, which are recognized correctly by
the current version.

On 01/07/2016 12:49 AM, Richard Gowers wrote:
> Hey Max
>
> I just pushed a lot of work on selections, so pull again and see if I've
> fixed things.

Nope, I still see the problem. On the attached PDB, try:

select_atoms('segid B and resid 1-100')

This gives me a selection with 248 atoms.
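A way to cross-check a count like this independently of the selection parser is to build the same condition from per-atom arrays with boolean masks. The sketch below uses invented data (two segments, residues 1-150, two atoms per residue), not the attached PDB, and the array names `resids` and `segids` are assumptions about how the data might be laid out.

```python
import numpy as np

# Invented per-atom arrays: 2 segments x 150 residues x 2 atoms = 600 atoms.
resids = np.tile(np.repeat(np.arange(1, 151), 2), 2)
segids = np.repeat(np.array(["A", "B"]), 300)

# 'segid B and resid 1-100' as an element-wise AND of boolean masks.
mask = (segids == "B") & (resids >= 1) & (resids <= 100)
expected_indices = np.flatnonzero(mask)
print(expected_indices.size)
```

If the parser's result is larger than the mask-based count, the extra atoms can be listed by comparing the two index arrays, which helps tell a selection bug from a topology-parsing bug.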

> WRT repr strings, I think that's just David and I not having done it
> yet. I do want to make the repr strings only display every resname for
> len < 10, (think numpy array repr).

Well, I would be OK with that if I could access the resname at all. There
only seem to be numbers right now.

broken.pdb