Segmentation fault (11) with plumed/2.0-hrex


binette...@gmail.com

Jun 26, 2014, 2:45:54 PM
to plumed...@googlegroups.com
Dear PLUMED users and developers,

I am facing a segmentation fault with GROMACS 4.6.5 patched with PLUMED 2.0-hrex, and I was wondering if you could help me.

My HREX simulations were performed with 16 replicas. I equilibrated each replica independently with the same simulation parameters as for my production run. I have attached my rather simple PLUMED file (only two walls) and my mdp file to this post.

No problem occurred while equilibrating each replica independently for 5 ns. However, my HREX simulation crashed after only a few picoseconds with the following error:

[cp2131:07989] Signal: Segmentation fault (11)
[cp2131:07989] Signal code: Address not mapped (1)
[cp2131:07989] Failing at address: 0xe2304598
[cp2131:07989] [ 0] /lib64/libpthread.so.0() [0x3da2e0f710]
[cp2131:07989] [ 1] /opt/openmpi/1.4.3/intel/lib/libopen-pal.so.0(opal_memory_ptmalloc2_int_malloc+0x775) [0x2af2e20d2eb5]
[cp2131:07989] [ 2] /opt/openmpi/1.4.3/intel/lib/libopen-pal.so.0(+0x48845) [0x2af2e20d1845]
[cp2131:07989] [ 3] /usr/lib64/libstdc++.so.6(_Znwm+0x1d) [0x36b22bd09d]
[cp2131:07989] [ 4] mdrun_plumed(_ZN4PLMD5Atoms8shareAllEv+0x11e) [0x52b42e]
[cp2131:07989] [ 5] mdrun_plumed(_ZN4PLMD4GREX13savePositionsEv+0x25) [0x5468c5]
[cp2131:07989] [ 6] mdrun_plumed(_ZN4PLMD4GREX3cmdERKSsPv+0x305) [0x544995]
[cp2131:07989] [ 7] mdrun_plumed(_ZN4PLMD10PlumedMain3cmdERKSsPv+0x2fba) [0x54c60a]
[cp2131:07989] [ 8] mdrun_plumed(plumedmain_cmd+0x44) [0x555564]
[cp2131:07989] [ 9] mdrun_plumed(plumed_cmd+0x21) [0x6545e1]
[cp2131:07989] [10] mdrun_plumed(do_md+0x70b3) [0x4802b3]
[cp2131:07989] [11] mdrun_plumed(mdrunner+0x1a65) [0x46b915]
[cp2131:07989] [12] mdrun_plumed(cmain+0xf73) [0x481683]
[cp2131:07989] [13] mdrun_plumed(main+0x49) [0x487199]
[cp2131:07989] [14] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3da2a1ed1d]
[cp2131:07989] [15] mdrun_plumed() [0x465b69]
[cp2131:07989] *** End of error message ***

I am using the following hardware and compilers: 

Host: cp2132  pid: 5821  nodeid: 0  nnodes:  48
Gromacs version:    VERSION 4.6.5
Precision:          single
Memory model:       64 bit
MPI library:        MPI
OpenMP support:     enabled
GPU support:        disabled
invsqrt routine:    gmx_software_invsqrt(x)
CPU acceleration:   SSE2
FFT library:        MKL
Large file support: enabled
RDTSCP usage:       enabled
Built on:           Wed May 14 16:24:52 EDT 2014
Built by:           hlu@ip07 [CMAKE]
Build OS/arch:      Linux 2.6.32-431.11.2.el6.x86_64 x86_64
Build CPU vendor:   AuthenticAMD
Build CPU brand:    AMD Opteron(tm) Processor 6172
Build CPU family:   16   Model: 9   Stepping: 1
Build CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm misalignsse mmx msr nonstop_tsc pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a
C compiler:         /opt/openmpi/1.4.3/intel/bin/mpicc Intel icc (ICC) 12.0.5 20110719
C compiler flags:   -msse2    -mkl=sequential -std=gnu99 -Wall   -ip -funroll-all-loops  -O3 -DNDEBUG
Linked with Intel MKL version 10.3.5.

There is no other error message in the log files. Each trajectory seems to be fine prior to the crash.

I have used GROMACS and PLUMED before, but this is the first time I have seen this error. Do you have any idea of the origin of the problem?

Best regards,

Vincent Binette





MDrun.mdp
PLUMED_Replex_alpha_distz.dat

Giovanni Bussi

Jun 27, 2014, 4:05:42 AM
to plumed...@googlegroups.com
Hi Vincent,


It is good that it crashes after only a few ps: this makes it easy to reproduce.

The mdp file seems fine.

The PLUMED input could have some problems with PBC. You should not compute the COM of atoms that are not part of the same molecule and that could jump across periodic boundaries. In membrane setups it is typically fine to use WHOLEMOLECULES on the membrane as well (the P atoms only, in your case): the membrane will be made whole in the z direction and possibly broken in the xy plane, but dist.z should not be affected. I am not sure this will fix your problem, but you should fix it anyway.
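
For reference, this is a minimal sketch of what I mean; all atom numbers, wall positions and force constants below are made up for illustration only, so adapt them to your system (the real file is in the attachment):

# make the peptide and the membrane P atoms whole across PBC before taking COMs
WHOLEMOLECULES ENTITY0=1-250 ENTITY1=251,315,379,443
pep:  COM ATOMS=1-250             # peptide centre of mass
memb: COM ATOMS=251,315,379,443   # centre of mass of the membrane P atoms
dist: DISTANCE ATOMS=memb,pep COMPONENTS
# restrain only the z component of the peptide-membrane distance
UPPER_WALLS ARG=dist.z AT=4.0 KAPPA=500.0
LOWER_WALLS ARG=dist.z AT=1.0 KAPPA=500.0
PRINT ARG=dist.z STRIDE=500 FILE=COLVAR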

Also note that dist.z does not have the proper periodicity with this setup. The manual page for DISTANCE gives some more explanation and a possible workaround. This is probably not a problem for you, since you add walls on dist.z.

Concerning the HREX part: you can try to use exactly the same initial configurations but with identical Hamiltonians. Does the problem persist? If so, you can even switch back to the standard GROMACS patch (no hrex). Notice that the hrex patch is not super robust (that is why I am not merging it into the official PLUMED release), and there could be a problem specific to that patch. If so, I can try to fix it.
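
If you generate the scaled topologies with the partial_tempering script distributed with the hrex branch, identical Hamiltonians can be obtained simply by using a scaling factor of 1.0 for every replica. A rough sketch assuming that workflow (file names are placeholders; adapt it to however you actually prepare your topologies):

# one topology per replica; scale=1.0 leaves the Hamiltonian untouched,
# so all replicas end up identical
for i in $(seq 0 15); do
  plumed partial_tempering 1.0 < processed.top > topol$i.top
  grompp -f MDrun.mdp -p topol$i.top -c conf$i.gro -o topol$i.tpr
done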

Another question: did you use the walls during equilibration? If not, the problem is likely due to their being switched on suddenly, or possibly to the PBC issue above. If yes, then the problem possibly lies with the hrex patch.

Bottom line: try to decouple the problem and work out whether it lies in your PLUMED input, in a PLUMED bug, in GROMACS, or in the hrex patch.

Many thanks for your feedback!

Giovanni





binette...@gmail.com

Jul 2, 2014, 4:49:20 PM
to plumed...@googlegroups.com
Dear sir,

Thank you for your prompt reply!

Good news: I ran a new simulation with WHOLEMOLECULES applied to my membrane, and the segmentation fault is gone. Unfortunately, when I apply WHOLEMOLECULES to my membrane's phosphorus atoms only, I get some strange results: all of a sudden, the distance computed by PLUMED is incorrect. Here is an example of my PLUMED output:

292.000000 3.351356 1.204598 0.000000 0.000000
294.000000 3.609231 1.186091 0.000000 0.000000
296.000000 4.679452 1.244434 0.000000 0.000000
298.000000 4.985812 -3.084438 0.000000 16.682637
300.000000 2.761873 1.166858 0.000000 0.000000
302.000000 4.058672 1.179533 0.000000 0.000000

This problem does not occur without WHOLEMOLECULES, or when WHOLEMOLECULES is applied to my entire membrane.
Am I misusing WHOLEMOLECULES? I have attached my modified PLUMED file to this post.

Best regards,

Vincent Binette
PLUMED_Replex_alpha_distz.dat

Giovanni Bussi

Jul 3, 2014, 2:00:08 AM
to plumed...@googlegroups.com
Hi,

a possible explanation is that two consecutive P atoms in your list are too far apart in the z direction.

Is it possible that the z distance between two P atoms is larger than half the box size in the z direction?

To verify whether this is the case, and whether the problem is reproducible, you can use DUMPATOMS to dump the COM of the P atoms to a gro file (see the example in the DUMPATOMS manual). The x and y components will be irrelevant, but the z component should not have jumps. If it has a jump corresponding to the jump in dist.z, then this is the problem.
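
A minimal sketch of such a check, assuming the COM of the P atoms is defined with a label such as memb (atom numbers, file name and stride are again only placeholders):

memb: COM ATOMS=251,315,379,443    # the membrane P atoms
# dump the membrane COM so that its z coordinate can be inspected over time
DUMPATOMS ATOMS=memb FILE=memb_com.gro STRIDE=500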

The real fix for this is to implement COM with trigonometric functions, which is on the todo list (github.com/plumed/plumed2/issues/66).

A simpler solution is to also include in ENTITY1 of the WHOLEMOLECULES command some C atoms from the lipid tails, which are certainly closer to one another. A safe choice is to replace each P atom in the list with a triplet of numbers, namely a C from the tail, the P itself, and the same C again (yes, you can repeat a number). E.g., if C300 is from the same lipid as P296 and C352 is from the same lipid as P348:
ENTITY1=300,296,300,352,348,352,....
Pick a C that is well down the chain, as close as possible to the actual center of the membrane.
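
Plugged into a WHOLEMOLECULES line, the pattern would look something like this (atom numbers are still only illustrative):

WHOLEMOLECULES ENTITY0=1-250 ENTITY1=300,296,300,352,348,352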

Let us know if this fixes your problem!

Giovanni

binette...@gmail.com

Jul 15, 2014, 5:11:39 PM
to plumed...@googlegroups.com
Dear Sir, 

Again, thank you for your quick answer and for your help!

Good news! Just as you said, the problem with the peptide/membrane distance is solved when I add some C atoms to reconstruct my membrane, and, for the moment, I have no more segmentation faults. Unfortunately, I am still facing a problem.

While investigating the segmentation fault, I ran, as you suggested, a simulation with identical Hamiltonians and the -hrex flag. All replicas start from the same initial configuration and with the same velocities. After a 10 ns simulation, the exchange rates between some replicas are surprisingly low (around 8%):

Repl  average number of exchanges:
Repl     0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
Repl      .08  1.0  .39  .20  .07  .48  .29  .07  .57  .07  .49  .26  .49  1.0  1.0

Is it possible to have such exchange rates with identical Hamiltonians? Do you have any idea what the problem might be?

Best regards,

Vincent Binette

Giovanni Bussi

Jul 17, 2014, 6:32:51 AM
to plumed...@googlegroups.com
Hi,

Since you use the same plumed.dat on all replicas and only have pre-defined bias potentials, the acceptance should be exactly one if the Hamiltonians are identical. So this sounds like a problem...

Can you check what gromacs writes in the log upon exchange?

Additionally, a run with identical Hamiltonians but different initial structures could help. It is enough to run for 10 ps; the acceptance should be *exactly* 100% at every single exchange (99% is not acceptable).

Another problem that I have sometimes found is that the acceptance is not computed properly on steps that are not compatible with nstlist, which is usually reset to 40 by GROMACS when running on GPUs. Are you running on GPUs? If so, remember to use a -replex stride that is a multiple of 40.
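
In that case, the run command would look along these lines (file names and the replica count are placeholders; -hrex is the option added by the hrex patch):

mpirun -np 16 mdrun_plumed -multi 16 -plumed plumed.dat -replex 400 -hrex

Here the -replex stride of 400 is a multiple of nstlist=40, so exchanges are only attempted on neighbour-list steps.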

Finally, I have always used this with cutoff-scheme = Verlet. Could you check this as well?

Let me know if you manage to find the problem.

Thanks for your feedback!

Giovanni

binette...@gmail.com

Jul 20, 2014, 7:09:12 PM
to plumed...@googlegroups.com

Dear sir,


I ran four new simulations to understand the problem. The first is a 100 ps simulation with identical Hamiltonians and without my PLUMED file, but this time I changed the cutoff-scheme to Verlet (instead of group) and set nstlist to 1. The acceptance in this simulation is 100% at every single exchange.


The second simulation is the same as the first, but with cutoff-scheme = group and nstlist = 1. This time the acceptance is not always 100%.
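
For reference, the relevant mdp settings in these two tests were simply (only the lines that changed are shown):

cutoff-scheme = Verlet    ; first test
nstlist       = 1
; second test: identical, except cutoff-scheme = group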


In the third one, I redid one of my old simulations (scaling from 1 to 0.3 after an NPT equilibration of each replica independently), but with cutoff-scheme = Verlet, nstlist = 1, and without my PLUMED file. The exchange rates are fine.


Then I added my PLUMED file, and the simulation crashed after 900 steps with the same segmentation fault as before. This error does not occur without the -hrex option.


I would like to know: with cutoff-scheme = Verlet, does mdrun adjust nstlist when it is not set to 1? The GROMACS manual (http://manual.gromacs.org/online/mdp_opt.html) is really ambiguous about this: the nstlist description says that mdrun adjusts nstlist, but the opposite is written in the verlet-buffer-tolerance description.


What are your thoughts on all this?


Again thank you for your advice!


Best regards,


Vincent Binette

