Thanks for the feedback, Oliver!
On the first topic of refactoring, I will try to submit the changes as separately as possible so that the PRs can be judged individually.
On Johannes' unwrapping efforts, I did ask in issue 2376 but got no reply. His work seems to go in a direction orthogonal to my caching ideas, so both should be considered, as we may be able to get more than one type of speedup. I also put forward an idea of quickly pre-checking which molecules are broken, which already gave large gains, but I don't know whether Johannes implemented anything along those lines too.
On the topic of Universe-based caching, I'd love to present this briefly! Here's a summary:
Problem: Fragment computation is slow.
Solution: We do it once and cache the result.
Problem (also faced by Johannes): Caches are at the AtomGroup level. If the topology changes (e.g. bonds added/deleted), how can AtomGroups be notified that those caches need to be invalidated? (Ergo, there is currently only limited caching of fragments).
Solution: We cache fragment information in a per-AtomGroup cache specific for fragments, under the Universe object. We then make all topology-modifying operations invalidate said cache.
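A minimal sketch of what that could look like (all names here, such as `_fragment_cache` and `add_bonds`, are hypothetical stand-ins, not the actual MDAnalysis API):

```python
class Universe:
    def __init__(self):
        # per-AtomGroup fragment results, keyed by the group's atom indices
        self._fragment_cache = {}

    def add_bonds(self, bonds):
        # ... modify the bond table ...
        # any topology-modifying operation invalidates the fragment cache
        self._fragment_cache.clear()


class AtomGroup:
    def __init__(self, universe, indices):
        self.universe = universe
        self.indices = tuple(indices)

    @property
    def fragments(self):
        cache = self.universe._fragment_cache
        if self.indices not in cache:
            # expensive connected-components search happens only on a miss
            cache[self.indices] = self._compute_fragments()
        return cache[self.indices]

    def _compute_fragments(self):
        return ["frag"]  # stand-in for the real fragment computation
```

The point being that the AtomGroup never has to be "notified": it always reads through the Universe-held cache, and topology edits simply empty it.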
Alternative - Cache versioning: Instead of caching under Universe, we keep some sort of topology version number at the Universe level, incremented every time the topology changes. Cache management at the AtomGroup level then records the version its cache was computed against, and triggers a recalculation whenever a newer version is found.
The idea extends to caching of any type of compound group (Segments, Residues, etc.). In that case, we want specific topology changes to invalidate only the respective caches (a residue reassignment will affect residue caches, but not fragment ones). Caching under Universe must then distinguish between fragment-related, residue-related, and segment-related caches, etc. Likewise for cache versioning, which must keep independent version numbers (or hashes) for different parts of the topology.
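A sketch of the per-compound versioning variant, again with entirely hypothetical names (one counter per part of the topology, so a residue reassignment leaves fragment caches untouched):

```python
class Universe:
    def __init__(self):
        # one independent version counter per part of the topology
        self._topology_versions = {"bonds": 0, "residues": 0, "segments": 0}

    def add_bonds(self, bonds):
        # ... modify the bond table ...
        self._topology_versions["bonds"] += 1  # affects fragment caches only

    def reassign_residues(self, mapping):
        # ... modify the residue assignment ...
        self._topology_versions["residues"] += 1  # leaves fragments valid


class AtomGroup:
    def __init__(self, universe):
        self.universe = universe
        self._cache = {}           # cached results, e.g. {"fragments": [...]}
        self._cache_versions = {}  # topology version each entry was built at

    def _cached(self, name, depends_on, compute):
        current = self.universe._topology_versions[depends_on]
        if self._cache_versions.get(name) != current:
            self._cache[name] = compute()
            self._cache_versions[name] = current
        return self._cache[name]

    @property
    def fragments(self):
        # fragments depend on bonds, so only bond edits force a recompute
        return self._cached("fragments", "bonds", lambda: ["frag"])
```

The `depends_on` key is where the "specific topology changes invalidate only the respective caches" part lives: each cached quantity declares which counter it is tied to.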
Finally, as to funding, so far the work seems manageable without. I'd say we keep that in mind if this turns into a deeper rabbit hole?
What do you guys think?
Cheers,
Manel