Unstable gmx_MMPBSA Runs at High MPI Ranks: "avgcomplex.pdb: No frames read" Error

debevc11

Apr 16, 2025, 10:16:10 AM
to gmx_MMPBSA

Hi everyone,

I'm running gmx_MMPBSA (version 2023.3) with quasi-harmonic entropy calculations on 1000 ns MD trajectories generated with GROMACS 2023.1. I'm working on two similar but distinct systems ("51mod" and "negative"), and both runs frequently fail at high MPI parallelization (-np 64).

The crash happens late in the run (~90% progress) with the following error:

Error: PDB _GMXMMPBSA_avgcomplex.pdb: No frames read. atom=1619 expected 4932.
Error: Could not set up '_GMXMMPBSA_avgcomplex.pdb' for reading.
cpptraj failed with prmtop COM.prmtop!
Error occurred on rank 34.

Here’s the SLURM submission script I’m using:

#!/bin/bash
#SBATCH --job-name=51mod
#SBATCH --partition=all
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=8
#SBATCH --mem=196GB
#SBATCH --output=MMPBSAENTROPY.out
#SBATCH --time=2-00:00:00

export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
export GMXMMPBSA_TEMP_DIR=$SLURM_TMPDIR

module load GCC/12.2.0
module load Anaconda3/2023.07-2
source activate gmxMMPBSA2023.3

mpirun -np 64 gmx_MMPBSA -O -i mmpbsa_initial.in -cs md51mod1000.tpr -ct mol51mod1000.xtc -ci index.ndx -cg 10 11 -cp 51mod.top -o FINRES_MMPBSAENT.dat -eo FINRES_MMPBSAENT.csv

conda deactivate

I've seen this error multiple times on both systems. One of the 51mod runs did manage to finish correctly once, but likely by chance.

Lowering -np to 32 always works, but unfortunately it isn't fast enough to process a 1000 ns trajectory within my SLURM time limit (2 days). I suspect the failure is related to temporary file I/O or a race condition when writing/reading _avgcomplex.pdb.
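
If it helps with debugging, this is the kind of check I can run to see which temp directory each node's ranks actually use (a quick sketch; on our cluster $SLURM_TMPDIR is node-local):

    # inside the job allocation: print the temp dir once per node
    srun --ntasks-per-node=1 bash -c 'echo "$(hostname): $GMXMMPBSA_TEMP_DIR"'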

Any advice would be greatly appreciated! I’m happy to provide more logs or test suggestions.

Thanks in advance

Attachments:
MMPBSAENTROPY.out
gmx_MMPBSA.log

marioe911116

Apr 16, 2025, 10:49:03 AM
to gmx_MMPBSA
You are performing the calculation for 79,001 frames, which is a lot, and you risk having non-independent snapshots. Snapshots should be taken 1–10 ps apart to be statistically independent (https://pmc.ncbi.nlm.nih.gov/articles/PMC4487606/#S0002:~:text=snapshots%20taken%20after%201%20%E2%80%93%2010%20ps%20are%20statistically%20independent).

Try with fewer frames by setting interval to 10 instead of 1 and see how it goes...
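
For example, only the interval line in the &general section of the input file needs to change. A minimal sketch (merge this into your existing mmpbsa_initial.in; the exact entropy keyword depends on your gmx_MMPBSA version):

    # write a copy of the input with a sparser frame selection
    cat > mmpbsa_interval10.in << 'EOF'
    &general
      interval=10,    # analyze every 10th frame instead of every frame
      qh_entropy=1,   # QH keyword in recent versions; keep whatever your input already uses
    /
    EOF

Then point your existing mpirun line at it with -i mmpbsa_interval10.in.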

mariosergi...@gmail.com

Apr 16, 2025, 10:49:15 AM
to gmx_MMPBSA
Thanks for reporting this. QH has fallen somewhat into the shadows (we're even considering removing support for it) because it's computationally expensive, requires specific run conditions, and is often hard to test and to get good results from.
Is it useful in your case, or are you just exploring methods? Could you explain your situation further so we can help you? We don't have much time, but we'll try to resolve the issue as soon as possible.

debevc11

Apr 16, 2025, 11:48:36 AM
to gmx_MMPBSA

Thanks for both replies — really appreciate the insight.

I’m using quasi-harmonic (QH) entropy to complement MMPBSA binding energy comparisons between several nanobody–antigen complexes. I initially thought one of the mutated models (51mod) was performing better due to its significantly lower enthalpy (ΔH) compared to the wildtype. However, after including QH entropy, I noticed that 51mod had a much higher entropy penalty (–TΔS), which significantly impacted the overall ΔG.
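
For reference, I combine the terms in the standard way, so the entropy penalty adds directly on top of the enthalpy:

    ΔG = ΔH + (−TΔS)

which is why 51mod's larger −TΔS term offsets its more favorable ΔH in the final ΔG.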

These calculations were part of a systematic analysis based on simulation length. In each case, the first 1,000 frames were excluded from the entropy calculation to avoid initial equilibration bias:

           200 ns  = 19,000 frames
           400 ns  = 39,000 frames
           600 ns  = 59,000 frames
           800 ns  = 79,000 frames
           1000 ns = 99,000 frames
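
These counts imply one frame every 10 ps, so interval = 10 should thin the sampling to one frame per 100 ps. A quick check for the 1000 ns case:

    100,000 total frames / 1000 ns = 1 frame per 10 ps
    interval = 10  →  1 frame per 100 ps  →  ~9,900 frames (after excluding the first 1,000 original frames)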

Here's a summary of ΔH, –TΔS, and ΔG for 51mod, wildtype, and a negative control:


[Screenshot from 2025-04-16 17-43-12.png: table of ΔH, −TΔS, and ΔG values per system and simulation length]

We also have experimental binding affinity data (SPR) for the wildtype, so the goal was to use this data as a benchmark and see whether the simulations converge toward those values over time.

I now see that using ~79,000 or more frames is likely overkill, since frames taken that close together are statistically correlated. I'll rerun with interval = 10 to reduce the frame count and improve snapshot independence, as suggested.

I understand QH has some limitations, but in this context, it has been quite helpful in uncovering a misleading interpretation based solely on enthalpy.

Thanks again for your time and help — it’s much appreciated.


On Wednesday, April 16, 2025 at 4:49:15 PM UTC+2, mariosergi...@gmail.com wrote: