Hi,
I am trying to run Gurobi on a cluster that has 44 nodes with 6 CPUs on each node. The process is to call gurobi_cl from a batch script and pass in the LP file for solving. First, I am assuming Gurobi is able to run across multiple nodes, so correct me if this is not possible.
The HPC cluster uses SLURM for managing jobs submitted via sbatch. In consultation with the system admins we have tried configuring the scripts to call gurobi_cl via both mpirun (JOBSCRIPT1) and OpenMP (JOBSCRIPT2). This appears to work, and Gurobi runs and solves the model; however, the performance benefit (solution time) of adding more nodes appears to be negligible (and possibly degrades with more resources), so something does not seem right with how we are calling Gurobi. In the gurobi_cl call we set Threads equal to the number of CPUs made available (ntasks x cpus-per-task); note that in the Gurobi log it reports all 264 processors of the cluster as being available. Can anyone help me with what I am missing in the configuration script or otherwise? I can provide a copy of the model if required. The timings we observe are shown below.
CPUs/Threads    Solution Time
60              58.67 s
24              52.37 s
6               49.13 s
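For reference, a minimal sketch of deriving the thread count from the SLURM allocation instead of hardcoding it (assuming the standard SLURM_NTASKS and SLURM_CPUS_PER_TASK variables that SLURM exports inside an sbatch job; the solver parameters and model path simply mirror the job scripts further down):

# Sketch only: compute Threads from the allocation rather than hardcoding 60/24/6
NCPUS=$(( ${SLURM_NTASKS:-1} * ${SLURM_CPUS_PER_TASK:-1} ))
gurobi_cl Threads=$NCPUS VarBranch=1 Cuts=1 PreSolve=2 /home/bgroeneveld/160314_7/Model/TestModel1.lp.bz2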
Any tips or comments are greatly appreciated.
Thanks,
Ben
### JOB SCRIPT 1 - MPI
#!/bin/bash -l
#SBATCH --output=gurobi-%j.out
#SBATCH --ntasks=10
#SBATCH --cpus-per-task=6
#SBATCH --time=00:20:00
#SBATCH --export=ALL
module load mpt
export GUROBI_HOME="/home/gurobi650/linux64"
export PATH="${PATH}:${GUROBI_HOME}/bin"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${GUROBI_HOME}/lib"
mpirun -np 1 /home/gurobi650/linux64/bin/gurobi_cl Threads=60 VarBranch=1 Cuts=1 PreSolve=2 /home/bgroeneveld/160314_7/Model/TestModel1.lp.bz2
### JOB SCRIPT 2 - OpenMP
#!/bin/bash -l
#SBATCH --output=gurobi-%j.out
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=6
#SBATCH --time=00:20:00
#SBATCH --export=ALL
module load mpt
export GUROBI_HOME="/home/gurobi650/linux64"
export PATH="${PATH}:${GUROBI_HOME}/bin"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${GUROBI_HOME}/lib"
export OMP_NUM_THREADS=24
omplace -nt $OMP_NUM_THREADS /home/gurobi650/linux64/bin/gurobi_cl Threads=24 VarBranch=1 Cuts=1 PreSolve=2 ResultFile=/home/160314_7/Solutions/TestModel1.sol /home/160314_7/Model/TestModel1.lp.bz2
### EXAMPLE GUROBI OUTPUT
Set parameter Threads to value 60
Set parameter VarBranch to value 1
Set parameter Cuts to value 1
Set parameter PreSolve to value 2
Gurobi Optimizer version 6.5.1 build v6.5.1rc3 (linux64)
Copyright (c) 2016, Gurobi Optimization, Inc.
Read LP format model from file /home/160314_7/Model/TestModel_1.lp.bz2
Reading time = 0.27 seconds
(null): 3750 rows, 21507 columns, 288663 nonzeros
Optimize a model with 3750 rows, 21507 columns and 288663 nonzeros
Coefficient statistics:
Matrix range [2e-04, 1e+05]
Objective range [1e+00, 1e+00]
Bounds range [1e+00, 1e+00]
RHS range [1e-04, 4e+03]
Found heuristic solution: objective -0
Presolve removed 35 rows and 1429 columns
Presolve time: 1.50s
Presolved: 3715 rows, 20078 columns, 268421 nonzeros
Variable types: 19968 continuous, 110 integer (95 binary)
Presolve removed 6 rows and 0 columns
Presolved: 3709 rows, 20084 columns, 268325 nonzeros
Root simplex log...
Iteration Objective Primal Inf. Dual Inf. Time
9721 8.4023935e+02 1.063565e+00 0.000000e+00 5s
10008 8.3928246e+02 0.000000e+00 0.000000e+00 5s
10008 8.3928246e+02 0.000000e+00 0.000000e+00 5s
Root relaxation: objective 8.392825e+02, 10008 iterations, 3.70 seconds
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 839.28246 0 55 -0.00000 839.28246 - - 5s
0 0 768.68336 0 46 -0.00000 768.68336 - - 9s
0 0 765.43721 0 47 -0.00000 765.43721 - - 10s
0 0 745.45581 0 48 -0.00000 745.45581 - - 11s
0 0 740.55523 0 51 -0.00000 740.55523 - - 12s
0 0 724.45883 0 56 -0.00000 724.45883 - - 16s
H 0 0 367.2407082 724.45883 97.3% - 16s
0 0 722.02553 0 56 367.24071 722.02553 96.6% - 18s
H 0 0 387.5819473 722.02553 86.3% - 18s
0 0 713.52289 0 56 387.58195 713.52289 84.1% - 21s
0 0 711.85869 0 56 387.58195 711.85869 83.7% - 23s
0 0 709.49146 0 56 387.58195 709.49146 83.1% - 26s
0 2 709.49146 0 56 387.58195 709.49146 83.1% - 38s
1 3 668.85054 1 53 387.58195 706.73684 82.3% 7776 44s
2 4 644.85066 2 49 387.58195 706.73684 82.3% 5717 47s
3 4 573.72360 3 49 387.58195 706.73684 82.3% 5537 50s
10 3 cutoff 6 387.58195 568.97759 46.8% 2170 55s
Cutting planes:
Implied bound: 829
Clique: 6
MIR: 535
Flow cover: 539
Zero half: 1
Explored 22 nodes (54877 simplex iterations) in 58.67 seconds
Thread count was 60 (of 264 available processors)
Optimal solution found (tolerance 1.00e-04)
Best objective 3.875819472905e+02, best bound 3.875819472905e+02, gap 0.0%
### DISTRIBUTED.PY
#!/usr/bin/python
# Run distributed MIP. The current directory must contain 'gurobi_cl',
# 'grb_rs', and 'grb_rsw'. The command must include a 'DistributedMIPJobs='
# argument.

import os
import sys
import subprocess

if len(sys.argv) < 3:
    print("Arguments: distributed.py DistributedMIPJobs=# [arguments] model")
    exit(0)

# Parse job count
jobs = -1
tunejobs = -1
args = ''
for arg in sys.argv[1:-1]:
    onearg = arg.lower()
    if onearg.find("distributedmipjobs") == 0:
        jobs = int(onearg[19:])
    elif onearg.find("tunejobs") == 0:
        tunejobs = int(onearg[9:])
    elif onearg.find("workerpool") == 0:
        print("WorkerPool= argument not allowed")
        exit(0)
    else:
        args += arg + ' '
args += sys.argv[-1]

if jobs < 1 and tunejobs < 1:
    print("Invalid job count")
    exit(0)

# Launch SLURM job
path = os.path.dirname(os.path.realpath(__file__))
if jobs > 0:
    sub = path + '/sub/slurm_distmip.py'
    command = 'salloc -J distributed -N ' + str(jobs) + ' ' + sub + ' ' + args
elif tunejobs > 0:
    sub = path + '/sub/slurm_disttune.py'
    command = 'salloc -J disttune -N ' + str(tunejobs) + ' ' + sub + ' ' + args
subprocess.call(command, shell=True)
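As a usage sketch (the job count, thread count, and model name here are illustrative only), the launcher could be invoked from a login node like this; it allocates the requested number of nodes via salloc and passes the remaining arguments through to the solver:

# Hypothetical invocation: 4 worker nodes, 6 threads per worker
python distributed.py DistributedMIPJobs=4 Threads=6 TestModel1.lp.bz2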
### SUB/SLURM_DISTMIP.PY
#!/usr/bin/python
# Subroutine for 'distributed.py'.

import os
import sys
import subprocess

# Pass all arguments straight through to gurobi_cl
args = ''
for arg in sys.argv[1:]:
    args += arg + ' '

# Collect the hostnames of the allocated nodes and build a comma-separated worker pool
hostnames = os.popen("srun -D ~ -l /bin/hostname | sort | awk '{print $2}'").read()
machines = hostnames.split()
pool = str(machines).strip("[]").replace("'", "").replace(", ", ",")

# Locate the Remote Services binaries (current directory first, then PATH)
if os.path.isfile('grb_rs'):
    files = ["./grb_rs", "./grb_rsw"]
else:
    files = [subprocess.check_output('which grb_rs', shell=True).strip('\n'),
             subprocess.check_output('which grb_rsw', shell=True).strip('\n')]

# Copy the binaries to each node and start grb_rs there
for machine in machines:
    dir = 'slurm/' + os.popen('date -u +%F-%s').read().strip('\n')
    command = 'ssh ' + machine + ' mkdir -p ' + dir
    os.system(command)
    for f in files:
        os.system('scp -C ' + f + ' ' + machine + ':' + dir)
    command = 'ssh ' + machine + ' ' + dir + '/grb_rs -s'
    os.system(command)
    command = 'ssh ' + machine + ' ' + dir + '/grb_rs'
    os.system(command)

# Run the distributed MIP job across the pool
command = 'gurobi_cl workerpool=' + pool + ' DistributedMIPJobs=' + str(len(machines)) + ' ' + args
print(command)
os.system(command)

# Shut down the workers and clean up
for machine in machines:
    os.system('ssh ' + machine + ' killall grb_rs')
    os.system('ssh ' + machine + ' rm -rf ' + dir)
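To make the effect concrete, the gurobi_cl command the subroutine assembles would look roughly like the following (the hostnames are made up; in practice they come from the SLURM allocation):

# Illustrative only
gurobi_cl workerpool=node01,node02,node03 DistributedMIPJobs=3 Threads=6 TestModel1.lp.bz2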
### SUB/SLURM_DISTTUNE.PY
#!/usr/bin/python
# Subroutine for 'distributed.py'.

import os
import sys
import subprocess

# Pass all arguments straight through to grbtune
args = ''
for arg in sys.argv[1:]:
    args += arg + ' '

# Collect the hostnames of the allocated nodes and build a comma-separated worker pool
hostnames = os.popen("srun -D ~ -l /bin/hostname | sort | awk '{print $2}'").read()
machines = hostnames.split()
pool = str(machines).strip("[]").replace("'", "").replace(", ", ",")

# Locate the Remote Services binaries (current directory first, then PATH)
if os.path.isfile('grb_rs'):
    files = ["./grb_rs", "./grb_rsw"]
else:
    files = [subprocess.check_output('which grb_rs', shell=True).strip('\n'),
             subprocess.check_output('which grb_rsw', shell=True).strip('\n')]

# Copy the binaries to each node and start grb_rs there
for machine in machines:
    dir = 'slurm/' + os.popen('date -u +%F-%s').read().strip('\n')
    command = 'ssh ' + machine + ' mkdir -p ' + dir
    os.system(command)
    for f in files:
        os.system('scp -C ' + f + ' ' + machine + ':' + dir)
    command = 'ssh ' + machine + ' ' + dir + '/grb_rs -s'
    os.system(command)
    command = 'ssh ' + machine + ' ' + dir + '/grb_rs'
    os.system(command)

# Run distributed tuning across the pool
command = 'grbtune workerpool=' + pool + ' TuneJobs=' + str(len(machines)) + ' ' + args
print(command)
os.system(command)

# Shut down the workers and clean up
for machine in machines:
    os.system('ssh ' + machine + ' killall grb_rs')
    os.system('ssh ' + machine + ' rm -rf ' + dir)
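A corresponding tuning invocation sketch (job count and model name are placeholders): passing TuneJobs= makes the launcher call this slurm_disttune.py subroutine and run grbtune instead of gurobi_cl.

# Hypothetical distributed tuning run on 3 nodes
python distributed.py TuneJobs=3 TestModel1.lp.bz2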