On our Linux cluster, I run combined Open MPI and OpenMP jobs using a PBS script like this:
#!/bin/bash
#PBS -N job_name_MPI_IB
#PBS -e /home4/mcgratta/job_name.err
#PBS -o /home4/mcgratta/job_name.log
#PBS -l nodes=4:ppn=12
#PBS -l walltime=999:0:0
export OMP_NUM_THREADS=6
mpirun --report-bindings --bind-to core --map-by socket:PE=6 -np 8 fds job_name.fds
Our cluster has two sockets per node and six cores per socket. The mpirun options direct the scheduler to assign one MPI process to each socket, and that process then uses the six cores on its socket. For OpenMP to work well, each MPI process needs dedicated cores: you don't want those cores spread across multiple sockets, and you don't want them shared with anything else.
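To see how the numbers in the script fit together, here is the arithmetic as a small shell sketch. The node, socket, and core counts are just our cluster's values; substitute your own:

```shell
# Layout arithmetic for the PBS/mpirun settings above (our cluster's values).
nodes=4              # PBS -l nodes=4
sockets_per_node=2   # two sockets per node
cores_per_socket=6   # six cores per socket

ppn=$(( sockets_per_node * cores_per_socket ))   # ppn=12, all cores per node
np=$(( nodes * sockets_per_node ))               # -np 8, one MPI rank per socket
omp=$(( cores_per_socket ))                      # OMP_NUM_THREADS=6, one thread per core

# Ranks times threads should exactly fill the allocated cores.
echo "ppn=$ppn np=$np omp=$omp total=$(( np * omp ))"
```

If `np * omp` comes out larger than the total core count you requested, threads will be oversubscribed and performance will suffer.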
You should see if there is a way to do this on your HPC cluster. The tricky part is that you need to know how your nodes are configured. This can be confusing because vendors sometimes inflate the number of "processing units" by counting hardware threads (hyper-threading) as if they were cores.
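On a Linux node, one way to check the real topology (assuming the standard util-linux `lscpu` tool is available) is:

```shell
# Print socket, core, and thread counts so you can tell physical cores
# apart from hyper-threads. Run on a compute node, not the login node,
# if the two differ.
lscpu | grep -E 'Socket|Core|Thread|^CPU\(s\)'
```

The "Thread(s) per core" line tells you whether hyper-threading is on; "CPU(s)" counts logical processors, so divide by threads per core to get the physical core count that matters for `--map-by socket:PE=n`. If the hwloc package is installed, `lstopo` gives the same information graphically.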
But if you get pure MPI running well, you will want to use that most of the time. I only use combined MPI and OpenMP when our cluster is empty and there is an abundance of free cores.