The first question I have is: what kind of machine are you running on? What CPU is it, how many cores and threads does it have, and how much RAM?
Why is num_search_workers = 6 being chosen?
If you really have just 6 threads on your CPU, then running multiple jobs will just slow everything down.
Is there any chance to spin up virtual machines in somebody's cloud (Google/AWS/Azure/etc)?
I usually try to get as fast a machine as possible (currently my workstation is a 16-core, 32-thread AMD Ryzen thing) with sufficient RAM, which in turn makes processing one job as fast as possible. If I have multiple jobs, then they get lined up sequentially, and my fast machine does them as fast as it can. I found using 32 workers per task is much faster--for my work, ymmv--than splitting them into, say, 4 jobs with 8 workers each.
Others I'm sure have more experience running on cloud machines with essentially unlimited CPUs.
James
-- You received this message because you are subscribed to the Google Groups "or-tools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to or-tools-discu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/or-tools-discuss/e5ef5087-ff00-4eac-8fea-f432c7d0d2d7n%40googlegroups.com.
James E. Marca
Activimetrics LLC
By task I meant complete job. So suppose I want to try out 3 different parameter settings. I would run the three jobs sequentially, not in parallel, and would use all available cores. I let CP-SAT use the parallelism to its best advantage.
So on mine I usually use 0 workers. Setting num_search_workers=0 means use all available, but that may be a problem on a virtual machine, so you might want to set it to 32.
If I know in advance how to break up a problem...for example, partition some tasks/resources into one run, and the others into another because they are mostly independent...then I break up the problem manually/programmatically, run job A, then run job B.
If I had a server with, say, 128 threads, I would just use Python to launch 4 solver processes in parallel with completely different calling parameters, and then explicitly set num_search_workers to 32 for each, so that the concurrent jobs do not get into fights over using CPUs.
I do that with the routing solver more often, because it is single-threaded. So, for example, if I have routing jobs for a client in Pittsburgh, Philadelphia, and Peoria, then I'll set up three jobs with the relevant calling parameters along with a worker pool in python, collect the results, then report results when all the jobs are done.
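That pattern can be sketched with a stdlib process pool. Note solve_city here is a hypothetical stand-in for a real single-threaded routing-solver run; the city names are just the ones from the example above:

```python
# Run several independent solver jobs in parallel with a process pool,
# then report the results once every job has finished.
from concurrent.futures import ProcessPoolExecutor

def solve_city(city):
    # Placeholder: a real version would build the routing model for
    # `city`, call the solver, and return its solution.
    return (city, f"solution for {city}")

if __name__ == "__main__":
    cities = ["Pittsburgh", "Philadelphia", "Peoria"]
    with ProcessPoolExecutor(max_workers=3) as pool:
        results = dict(pool.map(solve_city, cities))
    # Report only after all jobs are done.
    for city in cities:
        print(city, "->", results[city])
```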
James