Script for building and running gmx_MMPBSA on a CDAC NVIDIA A100 GPU-based HPC


Sarthak Trivedi

Jun 19, 2025, 1:22:51 PM
to gmx_MMPBSA
Dear Sir,

I have installed gmx_MMPBSA on a CDAC HPC. The HPC configuration is as follows:

Model name:            AMD EPYC 7742 64-Core Processor
CPU family:            23
Model:                 49
Thread(s) per core:    2
Core(s) per socket:    64
Socket(s):             2
Stepping:              0
GPU:                   NVIDIA A100-SXM4-40GB


I have installed gmx_MMPBSA as per the instructions, but unfortunately I am not able to run any calculations. I am attaching my script files (I run gmx_run.sh as a master script, which executes the gmx.sh script for the calculations). Please help me build a proper script to execute my calculations and, if possible, also help me install the software and run the calculations. I am happy to assist in any way needed to get the software installed successfully.



gmx_run.sh
error_test.321939.err
gmx.sh

mariosergi...@gmail.com

Jun 22, 2025, 11:51:04 AM
to gmx_MMPBSA
There are certainly configurations here that I'm unaware of.
However, I can identify some potential causes that may be contributing to the problem.

From what I could find, the error you get typically occurs when a node is unreachable on the network (more on this here: https://github.com/open-mpi/ompi/issues/6618).
- First, verify that the node you are trying to run the script on actually has 32 cores. If you submit through the queue system, verify that the resources are being assigned correctly (see the quick checks after this list).
- This script should never be run on the master/login node.
- Check that the communication interface is correct and working properly.
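As a quick sanity check (these are generic Slurm commands, nothing specific to your cluster), you can add a few lines near the top of the job script to confirm what the scheduler actually allocated and where the tasks land:

echo "Allocated nodes : $SLURM_NODELIST"
echo "Tasks allocated : $SLURM_NTASKS"
scontrol show job $SLURM_JOB_ID | grep -E 'NumNodes|NumCPUs|NodeList'
srun hostname | sort | uniq -c        # every task should report a compute node, never the login/master node

If srun reports the login node, or fewer than 32 tasks, the problem is in the resource request rather than in gmx_MMPBSA.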
A few observations.
- The way you run gmx_MMPBSA with mpirun is incorrect. As written, you launch the gmx.sh script 32 times, i.e. you execute the same command 32 times in parallel. This overwrites all the output files repeatedly, which can make some of them fail and end up killing the entire process (see the short illustration after these observations). What you want to run instead is:
mpirun -mca pml ucx -x UCX_NET_DEVICES -np 32 gmx_MMPBSA -O -i mmpbsa.in -cs md_0_10.tpr -ct md_0_10_center.xtc -ci index.ndx -cg 1 13 -cp topol.top -o FINAL_RESULTS_MMPBSA.dat -eo FINAL_RESULTS_MMPBSA.csv -do FINAL_DECOMP_MMPBSA.dat -deo FINAL_DECOMP_MMPBSA.csv
This way, gmx_MMPBSA is running with all 32 CPUs.
- gmx_MMPBSA does not use a GPU, so I would suggest removing it from the configuration. Likewise, plumed isn't required, so I suggest removing it. While theoretically, it shouldn't affect anything, it doesn't contribute anything either, so it's best to rule out potential sources of error.
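To make the mpirun point concrete (assuming gmx.sh simply wraps the gmx_MMPBSA command line, as your description suggests):

# Incorrect: starts 32 independent copies of the whole script, each one
# writing and overwriting the same output files
mpirun -np 32 ./gmx.sh

# Correct: a single gmx_MMPBSA run that uses the 32 MPI ranks internally
mpirun -np 32 gmx_MMPBSA -O -i mmpbsa.in ... (same options as above)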

Let me know if these work for you.
Mario S.

Sarthak Trivedi

Jul 5, 2025, 1:19:23 PM
to gmx_MMPBSA
Hello Sir, 

I have tried many configurations and modifications. Would you please help me prepare a proper script file? I can provide further details about the HPC if needed.

It would be a great help if you could consider my request and prepare a proper script for the gmx_MMPBSA calculation.

The path where gmx_MMPBSA is installed is as follows:

/nlsasfs/home/groupiiiv/sarthakt/softwares/gmx_MMPBSA/miniconda3/envs/gmxMMPBSA/bin


Thank you,


Regards,

Sarthak


mariosergi...@gmail.com

Jul 5, 2025, 3:38:44 PM
to gmx_MMPBSA
It's important to be clear about your HPC's configuration. While this script may work, it doesn't necessarily match the configuration your HPC requires; for example, many clusters require loading specific modules or using certain settings, such as particular network interfaces. If your HPC doesn't have documentation, ask an administrator for support.
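For example, if your cluster uses environment modules, the job script would typically start with something like the following (the module names are only placeholders; check what your site actually provides):

module avail                 # list the modules available on your cluster
module load gcc openmpi      # placeholder names; load whatever your HPC documents for MPI jobs
module list                  # confirm what ended up loaded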

**************************************************************************************************************************************
#!/bin/bash
#SBATCH -N 1                                # number of nodes
#SBATCH --ntasks-per-node=32                # number of tasks per-node
### #SBATCH --gres=gpu:A100-SXM4:1          # not required by gmx_MMPBSA
#SBATCH --time=7-00:00:00                   # changed from 10min to 7 days
#SBATCH --partition=testp
#SBATCH --error=error_test.%J.err
#SBATCH --output=output_test.%J.out

echo "Starting at `date`"
echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running $SLURM_NTASKS tasks."
echo "Job id is $SLURM_JOBID"
echo "Job submission directory is : $SLURM_SUBMIT_DIR"
cd "$SLURM_SUBMIT_DIR" || exit 1            # stop if the submit directory cannot be entered


#echo "DEBUG: Current directory is $(pwd)"
#ls -lh
# export UCX_NET_DEVICES=mlx5_0:1                                 # it's optional and it's related to the network communication
                                                                  # If you have problems with OpenMPI, disable this and use the Ethernet
### export UCX_TLS=rc,cuda_copy,cuda_ipc,self,sm
# export OMPI_MCA_pml=ucx
# export OMPI_MCA_btl=^openib
# export GMX_ENABLE_DIRECT_GPU_COMM=1
# export GMX_CUDA_GRAPH=1
# source /opt/hpcx-v2.17.1-gcc-mlnx_ofed-ubuntu22.04-cuda12-x86_64/hpcx-init.sh
# hpcx_load
# source /opt/cuda-12.4/env.sh

# source /nlsasfs/home/groupiiiv/sarthakt/Softwares/plumed-2.9.2-installation/sourceme.sh

source /nlsasfs/home/groupiiiv/sarthakt/Softwares/gromacs-2023.2-installation/bin/GMXRC
source /nlsasfs/home/groupiiiv/sarthakt/softwares/gmx_MMPBSA/miniconda3/etc/profile.d/conda.sh
conda activate gmxMMPBSA
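
# Optional sanity check (my addition, safe to remove): confirm that the
# gmx_MMPBSA executable found on PATH is the one inside the conda environment
which gmx_MMPBSA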
#############################################################################################

# ---- this is not required because you activated the gmxMMPBSA environment ---- #

#GMX_DIR=/nlsasfs/home/groupiiiv/sarthakt/Softwares/gromacs-2023.2-installation/bin/
#MPI_DIR=/opt/hpcx-v2.17.1-gcc-mlnx_ofed-ubuntu22.04-cuda12-x86_64/ompi/bin
# GMXMMPBSA_DIR=/nlsasfs/home/groupiiiv/sarthakt/softwares/gmx_MMPBSA/miniconda3/envs/gmxMMPBSA/bin


# $SLURM_NTASKS is the number of global tasks (nodes * ntasks-per-node), so if you configure 2 nodes, it will automatically change to 64

mpirun -np $SLURM_NTASKS gmx_MMPBSA -O -i mmpbsa.in -cs md_0_10.tpr -ct md_0_10_center.xtc -ci index.ndx -cg 1 13 -cp topol.top -o FINAL_RESULTS_MMPBSA.dat -eo FINAL_RESULTS_MMPBSA.csv -do FINAL_DECOMP_MMPBSA.dat -deo FINAL_DECOMP_MMPBSA.csv -nogui
***********************************************************************************************************************************
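Assuming you save this as gmx_run.sh (the file name you used earlier in the thread), it is submitted and monitored with the usual Slurm commands:

sbatch gmx_run.sh
squeue -u $USER                        # check that the job is running on a compute node
tail -f output_test.<JOBID>.out        # %J in the script expands to the job id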
Please verify that it works and let me know if you encounter any errors.