Running multiple Gurobi optimizations in parallel unexpectedly slow?!?

AsToN

Nov 7, 2017, 5:24:01 AM
to Gurobi Optimization

Hey guys,

I am using Gurobi's MATLAB Class API to solve multiple integer problems in parallel. The problems share their data except for one vector that changes.

When I use a parallel loop (parfor) to run separate Gurobi optimizations on the 4 workers of my machine, I see strange behaviour. The runtime Gurobi needs to find the solutions is significantly longer when the problems are run in parallel than when they are run sequentially (up to 100% longer). In both the parallel and the sequential run the input data is completely identical, and Gurobi reports the same results along the same solution path (according to the log file).

I am pretty sure the cause is not oversubscription of threads (I have 4 cores with 2 threads each, so I am using 4 workers and Gurobi is limited to 2 threads per solve, which should be fine I guess), nor that transmitting the data takes too long. I already checked that, and the data transmission times are negligible. Besides, the runtime is the one reported by Gurobi itself, so it shouldn't be affected by any such issues.

I experience the same behaviour with CPLEX.

Any ideas why this can happen?

Tobias Achterberg

Nov 7, 2017, 8:11:24 AM
to gur...@googlegroups.com
Hyper-threads share some of the cache levels in a core, they increase heat (and
thereby may implicitly reduce the CPU clock rate), and all threads assigned to
one CPU share the memory bus. For most problems, the main bottleneck for
a MIP solve is the memory bus (and not the CPU). For this reason, it is often
not useful to employ parallelism with hyper-threading.

In your case, I suggest using your parfor loop as usual with 4 parallel jobs,
but limiting Gurobi to a single thread.

Note also that going from one to two threads involves some overhead due to the
management of parallel threads and data synchronization. If this second thread
is actually a hyper-thread on the same core, I would rather go with the
sequential code.
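
A minimal sketch of the thread budget suggested above, written in Python rather than MATLAB so that it runs with the standard library alone. It assumes the number of physical cores is known (4 here, as in the original post). `threads_per_job` is a hypothetical helper, not part of any Gurobi API; its result is the value one would pass to Gurobi's `Threads` parameter for each job.

```python
def threads_per_job(physical_cores: int, parallel_jobs: int) -> int:
    """Solver threads per job: an equal share of the physical cores,
    but never fewer than one."""
    if physical_cores < 1 or parallel_jobs < 1:
        raise ValueError("physical_cores and parallel_jobs must be positive")
    # Integer division; hyper-threads are deliberately not counted.
    return max(1, physical_cores // parallel_jobs)


if __name__ == "__main__":
    # Setup from this thread: 4 physical cores, 4 parfor workers.
    print(threads_per_job(physical_cores=4, parallel_jobs=4))
```

With 4 workers on 4 physical cores this gives one thread per job, which matches the recommendation above.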


Regards,

Tobias

AsToN

Nov 7, 2017, 9:23:02 AM
to Gurobi Optimization
Hi Tobias,

I do understand that using 2 threads instead of 1 doesn't imply a proportional speedup. However, in my case a single problem solved with 2 threads is faster than the same solve with only one thread (by about 40%, so it's definitely interesting). This is why I want each optimization process to use 2 threads.

Now when I run multiple optimizations in parallel (in my case 4), each again getting 2 threads, the solution times per problem are much longer than when I run a single problem that also uses 2 threads (ceteris paribus). This is strange to me, since the problems are completely independent of each other, so it shouldn't affect per-problem solution times whether I run 4 problems (= workers/cores) or just one: in both cases each Gurobi process gets 2 threads.

Regards,
AsToN


Tim Chippington Derrick

Nov 7, 2017, 4:03:27 PM
to gur...@googlegroups.com
Apologies for jumping in, but isn't it kind of obvious that if you solve a problem on its own on a 4-core processor and allow the solver to use up to two threads, there will be little contention for resources, and it may well end up using two separate physical cores rather than hyper-threading?

Solving four problems at the same time and allowing up to two threads each really is trying to run 8 threads in parallel on what is really a four-core processor, and will definitely have a lot of contention for resources and *will* run slower. I have seen this on several customer sites with different types of CPUs and different solvers including CPLEX, so this is not a Gurobi-specific issue. If possible we would recommend turning off hyperthreading if a PC or server is used a lot for MILP-type solving as it really doesn't seem to help and can actually slow down the processing rather than speed it up; and we would normally try not to fill the CPU even in that case, typically trying to keep a real physical core free for the operating system etc.
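
The arithmetic behind this can be sketched in a few lines of Python. The helper names are hypothetical and the numbers are the ones from this thread; reserving one physical core reflects the rule of thumb above and is not a solver requirement.

```python
def oversubscription(jobs: int, threads_per_job: int, physical_cores: int) -> float:
    """Ratio of requested solver threads to physical cores.

    A value above 1 means more threads than physical cores.
    """
    return jobs * threads_per_job / physical_cores


def max_parallel_jobs(physical_cores: int, threads_per_job: int,
                      reserved_cores: int = 1) -> int:
    """Number of jobs that fit on the physical cores left over after
    reserving some for the operating system."""
    usable = max(0, physical_cores - reserved_cores)
    return usable // threads_per_job


if __name__ == "__main__":
    # 4 jobs with 2 threads each on a 4-core machine.
    print(oversubscription(jobs=4, threads_per_job=2, physical_cores=4))
    # Single-threaded jobs with one core kept free.
    print(max_parallel_jobs(physical_cores=4, threads_per_job=1))
```

For 4 jobs with 2 threads each on 4 physical cores the ratio is 2, that is, twice as many solver threads as cores.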

Just my 0.02 euros.
Tim

--

---
You received this message because you are subscribed to the Google Groups "Gurobi Optimization" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gurobi+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Dr Tim Chippington Derrick
Chippington Derrick Consultants Ltd
Tel: +44 01276 508949
Mob: +44 07971 997948

AsToN

Nov 8, 2017, 3:56:09 AM
to Gurobi Optimization
OK, the explanation that the two threads might not be the two hyper-threads of one single core but any two threads, possibly on different cores, also came to my mind. I just expected that MATLAB would take care of that, but I guess that was wrong. Anyway, thanks for your help!

Tobias Achterberg

Nov 8, 2017, 5:05:52 AM
to gur...@googlegroups.com
This is the issue with hyper-threading. If you run Gurobi with two threads on an unloaded
machine, then the operating system will be smart enough to put the two threads to two
different cores. Thus, each thread has its own cache and runs in isolation on its own core.

But when you have four times two threads, then (in the best case) each Gurobi job uses
both hyper-threads of a single core. Consequently, they share some cache and slow each
other down (due to heating up the core, and because hyper-threading is just not the same as
two cores).

In the worst case, you have multiple CPUs (a NUMA architecture), and the two threads of a
Gurobi job are scheduled on different CPUs. Then, there will be lots of memory access from
one CPU to the memory that is closer to the other CPU, which is slower.

If you have a multiple CPU system, then you need to consider using thread affinity to make
sure that the threads are scheduled on the CPUs that you want. But even if you have just a
single CPU (with 8 cores), I am not so sure if it is bad if the two threads of a Gurobi
job are scheduled to two different cores (and another Gurobi job is consuming the other
hyper-threads of the two cores).

Conclusion: If your 4 core 8 hyper-thread system is using only 4 threads in total, then
you won't see the hyper-threading slowdown because the operating system is smart enough
with scheduling the threads. But as soon as you use the fifth thread, you will see some
degradation, and this degradation might be bigger than the improvement you get from
additional parallelism.
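
As a hedged illustration of thread affinity, here is a Python sketch that splits logical CPU ids into disjoint groups, one per job, and pins the current process to one group. `os.sched_setaffinity` is only available on some platforms (Linux, for example), so the sketch does nothing elsewhere. Which logical ids belong to the same physical core or the same CPU depends on the machine; the consecutive grouping used here is an assumption, and the helper names are hypothetical.

```python
import os
from typing import List, Set


def cpu_groups(cpu_ids: List[int], group_size: int) -> List[Set[int]]:
    """Split logical CPU ids into consecutive, disjoint groups.

    Leftover ids that do not fill a whole group are dropped.
    """
    if group_size < 1:
        raise ValueError("group_size must be positive")
    full = len(cpu_ids) - len(cpu_ids) % group_size
    return [set(cpu_ids[i:i + group_size]) for i in range(0, full, group_size)]


def pin_current_process(cpus: Set[int]) -> bool:
    """Restrict the calling process to the given logical CPUs.

    Returns False if the platform has no os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return True


if __name__ == "__main__":
    # 8 logical CPUs, 2 per job: one group for each of 4 jobs.
    print(cpu_groups(list(range(8)), group_size=2))
```

Each worker would call `pin_current_process` with its own group before starting its solve, with the solver's thread count set no higher than the group size.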


Regards,

Tobias