PLUMED GROMACS corruption error

Jonathan Shearer

Jul 8, 2019, 5:59:31 AM
to PLUMED users
Hi all,
I am currently running Hamiltonian replica exchange on a membrane system (with one leaflet of the membrane tempered), and this had been working fine. After adding a membrane protein to the system, I now consistently get this error:

 Error in `gmx_mpi': double free or corruption

and then the job crashes. I have tried running the job on another cluster in case this was a hardware issue, but got the same error again. The setup on the first cluster was gcc 4.9.2 with Intel 2018 MPI compilers, and on the second cluster gcc 6.4.0, again with Intel 2018 compilers. In both cases I believe gcc was only used for things like the FFTW3 compilation, with the Intel compilers used for the bulk of the PLUMED and GROMACS builds. I have visually inspected the replicas and plotted quantities such as the potential energy, but nothing looks suspicious in the run-up to the failure. This is a coarse-grained simulation (Martini v2.2), so the abridged simulation parameters are: stochastic velocity-rescale (v-rescale) thermostat with a 1.0 ps coupling constant, Parrinello-Rahman barostat (semi-isotropic, 12.0 ps coupling constant), 10 fs timestep, a starting temperature of 323 K, and a maximum lambda of 0.646. Note that I also tried a smaller timestep (5 fs), but the same error appeared.
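For reference, the corresponding abridged .mdp fragment would look roughly like the sketch below (the temperature-coupling group, reference pressure and compressibility are illustrative placeholders rather than values copied from my actual input):

    integrator       = md
    dt               = 0.010             ; 10 fs
    tcoupl           = v-rescale
    tc-grps          = System            ; placeholder grouping
    tau_t            = 1.0               ; ps
    ref_t            = 323
    pcoupl           = parrinello-rahman
    pcoupltype       = semiisotropic
    tau_p            = 12.0              ; ps
    ref_p            = 1.0 1.0           ; placeholder
    compressibility  = 3e-4 3e-4         ; placeholder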

I'm not sure if this is an issue anyone can help me with or if I just need to trawl through different compiler combinations, but any suggestions would be appreciated.

Best wishes,
Jon.

Gareth Tribello

Jul 8, 2019, 6:03:55 AM
to plumed...@googlegroups.com
Hello

Can you run the simulations if you just use GROMACS with no PLUMED? That would help, as we would then at least be able to work out whether the problem is with PLUMED or with GROMACS.
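For example, comparing something along these lines would show whether the patch is involved (the file names, replica directories and rank counts below are placeholders, and the -hrex flag assumes the PLUMED Hamiltonian replica exchange patch):

    # plain GROMACS, no PLUMED
    mpirun -np 16 gmx_mpi mdrun -deffnm md

    # PLUMED-patched HREX run, as in the failing setup
    mpirun -np 16 gmx_mpi mdrun -deffnm md -multidir rep0 rep1 rep2 rep3 \
        -plumed plumed.dat -replex 200 -hrex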

Thanks
Gareth

Giovanni Bussi

Jul 8, 2019, 9:02:42 AM
to plumed...@googlegroups.com
Hi!

I've never seen this myself, but I think other people have reported it in the past. Something I have seen is an allocation error after very long simulations (easily solved by running the simulation in pieces, as we normally do on our cluster).
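By "running it in pieces" I just mean relying on the normal checkpoint restart, roughly like this (the file names and wall-time limit are placeholders):

    # first chunk
    mpirun -np 16 gmx_mpi mdrun -deffnm md -maxh 24
    # continue the same run from its checkpoint
    mpirun -np 16 gmx_mpi mdrun -deffnm md -cpi md.cpt -maxh 24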

The fact that the problem appears when you add the membrane protein makes me think it might be related to the total number of atoms. Could that be it?

Giovanni

Jonathan Shearer

Jul 8, 2019, 9:51:39 AM
to PLUMED users
Hi,

To reply to both of your comments:

1) I just checked running the same simulation with the patched GROMACS build (but with no replica exchange), and it is running past the point where the failures usually happen. I had also previously equilibrated this system (for a few microseconds) with an unpatched version of GROMACS.

2) The system that works has 10420 beads, while the system that fails has 14925 beads, so neither is particularly large from a CG perspective. The failure seems to occur 1-5 ns into the simulation (usually within the first hour of runtime), so restarting every few ns would be possible, but not ideal (and a bit worrying).
Thank you for your help!

Best wishes,
Jon.


Jonathan Shearer

Jul 9, 2019, 5:53:49 AM
to PLUMED users
Hi all,
I asked around the other PLUMED users in my department and someone had a fix which seems to work for me as well: the issue was resolved by turning dynamic load balancing off when calling mdrun.
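Concretely, that just means passing the load-balancing option explicitly on the command line, along the lines of the sketch below (everything other than -dlb no is a placeholder for our usual HREX invocation, and -hrex assumes the PLUMED Hamiltonian replica exchange patch):

    mpirun -np 16 gmx_mpi mdrun -deffnm md -multidir rep0 rep1 rep2 rep3 \
        -plumed plumed.dat -replex 200 -hrex -dlb no
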
Best wishes,
Jon.

Giovanni Bussi

Jul 9, 2019, 6:54:47 AM
to plumed...@googlegroups.com
That's actually very useful feedback, thanks a lot!

I reopened an old issue that I suspect was related to this:
