Big performance differences on different hardware and with minor modification


Walter Gisler

Dec 19, 2022, 2:26:59 PM
to or-tools-discuss
Hi all,

The attached implementation of the magic square problem was tested on multiple machines, and the performance differences are huge: 100x or more between machines that have comparable performance on other algorithms or with other solvers. We tried different seeds to make sure we aren't comparing edge cases, but on each machine the performance is fairly stable across seeds. Some examples:

MacBook Pro, M1 CPU: 3 seconds
MacBook Pro, i7 CPU: 13 seconds
26-core Xeon CPU @ 2.6 GHz, Windows: 100 seconds
i5 CPU, 4 cores with Hyper-Threading, Linux: 1600 seconds

Different versions of OR-Tools were compared, which didn't seem to explain the difference. In fact, an older version (7.7) performed considerably better on the Windows environment than the newest version (9.5.2234).

Is CpSolver performing differently on different operating systems? Could it be invoking other solvers that are installed on the machine (CPLEX, Gurobi) without me noticing, and could this explain the performance difference?

I also noticed that on the Mac M1 machine "lb_tree_search" was used, but it wasn't used on any of the other installations. When I tried enforcing it on the other systems (solver.parameters.subsolvers.append("lb_tree_search")), it wouldn't change anything.
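For anyone wanting to check which subsolvers actually run on a given machine, the search log lists them. Here is a minimal sketch; the toy model and the use of extra_subsolvers instead of subsolvers are assumptions on my part:

from ortools.sat.python import cp_model

# Toy model, only so the snippet runs on its own.
model = cp_model.CpModel()
a = model.NewIntVar(0, 10, "a")
b = model.NewIntVar(0, 10, "b")
model.Add(a + b == 10)

solver = cp_model.CpSolver()
# The log header lists the workers/subsolvers that are actually started,
# which makes differences between machines visible.
solver.parameters.log_search_progress = True
# Assumption: extra_subsolvers adds a worker on top of the default portfolio,
# whereas a non-empty subsolvers list replaces the portfolio.
solver.parameters.extra_subsolvers.append("lb_tree_search")
solver.Solve(model)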

Another very strange thing happens when the variables are passed to the following constraint in a different order:

Before:
model.AddAllDifferent([x[i][j] for j in range(n) for i in range(n)])
After:
model.AddAllDifferent([x[i][j] for i in range(n) for j in range(n)])

All of a sudden the problem solves within seconds on the two "slow" machines.

I have seen with some commercial MIP solvers that the order in which variables and constraints are added can have a significant impact, but I have never seen such a minor change cause a 100x speedup. Is there anything that can be done to improve the robustness of CpSolver and make it depend less on the order in which constraints/variables are added?
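For context, here is a minimal sketch of what such a magic-square model might look like. This is not the attached 06_MagicSquare.py; the function name, the seed/ordering parameters and the exact constraints are assumptions, but it shows the two AddAllDifferent orderings discussed above:

from ortools.sat.python import cp_model

def solve_magic_square(n, row_major=True, seed=0):
    model = cp_model.CpModel()
    # One variable per cell, values 1..n*n.
    x = [[model.NewIntVar(1, n * n, f"x_{i}_{j}") for j in range(n)] for i in range(n)]

    # All cells take distinct values; only the iteration order differs
    # between the "slow" and "fast" variants above.
    if row_major:
        cells = [x[i][j] for i in range(n) for j in range(n)]
    else:
        cells = [x[i][j] for j in range(n) for i in range(n)]
    model.AddAllDifferent(cells)

    # Every row, every column and both diagonals sum to the magic constant.
    magic = n * (n * n + 1) // 2
    for i in range(n):
        model.Add(sum(x[i][j] for j in range(n)) == magic)
        model.Add(sum(x[j][i] for j in range(n)) == magic)
    model.Add(sum(x[i][i] for i in range(n)) == magic)
    model.Add(sum(x[i][n - 1 - i] for i in range(n)) == magic)

    solver = cp_model.CpSolver()
    solver.parameters.random_seed = seed
    solver.parameters.log_search_progress = True
    return solver.Solve(model)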

06_MagicSquare.py

Laurent Perron

Dec 19, 2022, 5:03:52 PM
to or-tools...@googlegroups.com
First, you cannot conclude anything from one problem, especially one without an objective. Finding a solution is just luck; it has no statistical value.

About variability, it is true for all solvers. With MIP solvers you usually have an objective, which minimizes the luck effect, as you need to prove optimality, and that is more work than luck.

Finally, we do work a lot on search and performance. So we expect to find big performance differences across versions, hopefully all in the right direction. 

Note: as your problem does not have an objective, lb_tree_search has been disabled in 9.5. It only works on the objective.
In 9.6, we will release lb_objective_search which is another worker dedicated to improving the lower bound of the objective (when minimizing).
Laurent Perron | Operations Research | lpe...@google.com | (33) 1 42 68 53 00




Walter Gisler

Dec 20, 2022, 11:48:21 AM
to or-tools-discuss
Hi Laurent,

Thank you for your reply.
I tested with some objectives to break the symmetry.

Minimising x[0][0] brought a 3x speedup on the slowest hardware (using 9.5.2237).
Minimising sum(x[i][j]*(n*i+j) for i in range(n) for j in range(n)) brought an even bigger speedup: down to 12 seconds from more than 100 seconds on my Windows setup (using 9.5.2237).
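For reference, attaching either objective might look roughly like this (a sketch; model, x and n are assumed to be the magic-square model and its cell variables, as in the sketch earlier in the thread):

def add_symmetry_breaking_objective(model, x, n, weighted=True):
    # Sketch only: attach one of the two objectives mentioned above.
    if weighted:
        # Weighted sum that effectively orders all cells lexicographically.
        model.Minimize(sum(x[i][j] * (n * i + j) for i in range(n) for j in range(n)))
    else:
        # Minimise only the top-left cell.
        model.Minimize(x[0][0])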

Later, I tested the effect of different versions. I did multiple runs with different seeds. 9.5.2237 is consistently a lot slower than 9.1.9490 on this problem, both with and without an objective. The following tests were run on my MacBook with an i7 CPU; the only thing that was changed was the OR-Tools version:

  • It takes 13 seconds to solve the original problem with version 9.1.9490, and it consistently takes more than 150 seconds (more than 10x longer) with the newest available version 9.5.2237
  • If I add an objective to break the symmetry, it takes only 0.7 seconds on my MacBook with version 9.1.9490, but still above 25 seconds with version 9.5.2237

I understand that a newer version might be slower on some instances, but a factor of more than 10 seemed a bit extreme. Maybe this example is interesting for development. Curious to gain more insight.

Laurent Perron

Dec 20, 2022, 4:26:34 PM
to or-tools...@googlegroups.com
Can you send me the latest code (with the objective), or the proto of the model?

Laurent Perron | Operations Research | lpe...@google.com | (33) 1 42 68 53 00


Walter Gisler

Dec 20, 2022, 4:40:06 PM
to or-tools-discuss
There you go. The two objectives are commented out right now; you'll find them on lines 32 and 33.
06_MagicSquare.py

grego...@gmail.com

Dec 21, 2022, 5:07:55 AM
to or-tools-discuss
Hi Walter,
  On a Threadripper 1950X I get 60 seconds on both Linux and Windows.

  I already talked with the or-tools team about performance differences between OSes, but no one, including me, has put enough effort into finding the reasons. The various libraries used can behave differently, the OS scheduler too... but my results were consistent: Linux was fastest and Windows slowest. I also tried compiling or-tools on Windows with clang, but there was not much difference compared to the Microsoft compiler.

  The order of variables and constraints definitely has a big impact on the performance of the CP-SAT solver. Can you put your model in a proto file and execute it on different computers? The MacBook may be faster due to a luckier order of variables and constraints in the proto file.
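Exporting the model to a proto file and re-solving it elsewhere might look like this (a minimal sketch; the file name is an example, and my understanding that ExportToFile writes text format for a .txt extension and binary otherwise should be double-checked):

from ortools.sat import cp_model_pb2
from ortools.sat.python import cp_model

# 'model' is assumed to be the CpModel built for the magic square.
model.ExportToFile("magic_square.pb")  # binary, since the name does not end in .txt

# On another machine: load the proto into a fresh CpModel and solve it,
# so every machine sees exactly the same variable and constraint order.
proto = cp_model_pb2.CpModelProto()
with open("magic_square.pb", "rb") as f:
    proto.ParseFromString(f.read())

loaded = cp_model.CpModel()
loaded.Proto().CopyFrom(proto)

solver = cp_model.CpSolver()
solver.parameters.log_search_progress = True
print(solver.Solve(loaded))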

  I was never successful at influencing the solver with the right order of variables, other than through the implicit order during model creation. Maybe it is too complicated, with the original model, the model after presolve, and the SAT model. But I found a place where variables not yet used in the model are processed in the given order without any particular logic; it is definitely a place with a big impact. I plan to tweak it for a problem where the right order of variables is mostly known.

Jan

grego...@gmail.com

Dec 21, 2022, 6:16:41 PM
to or-tools-discuss
I tried it on a much faster AMD 7950X; on Windows 11 a solution was typically found in around 40 seconds, compared to 13 seconds on Linux. A possible reason is unlucky relative speed of the parallel variants, which probably communicate in some way, or some performance problem on Windows, because the speed difference between the CPUs doesn't match the computation times.

Jan

Walter Gisler

Dec 23, 2022, 12:27:07 PM
to or-tools-discuss
Hi Jan,

Thank you for your comments and for spending the time on this. I can't easily compare across different OSs with the same hardware, so this is useful feedback.
The example I provided just seems to be particularly unfortunate; I don't see such huge differences with other models. It has lots of symmetry, etc., and a real-life example would typically not be like this.
Interesting insight from both you and Laurent. Thanks again.

Walter
