Use of bandits in OR-tools

Erik Cervin-Edin

unread,

Jun 15, 2026, 7:53:48 AMJun 15

to or-tools...@googlegroup.com

Hi!

I'm interested in using multi-armed bandits (MABs) for things like ALNS
and I figured a good place to start looking, besides in the literature,
was the or-tools source to see if MABs are used anywhere.

Turn out that in the routing solver there's a parameter to use MABs for
compound operators

bool use_multi_armed_bandit_concatenate_operators = 41;

I've searched to see if there was anything about the impact of using
this when running the routing solver -- judging by the fact that this is
false by default, I'm guessing it was found to not be beneficial "in
general".

On the CP-SAT side I found some references to bandit algorithms but
it looks like MABs are not used in the CP-SAT solver, but there are some
comments and things like

// Uses UCB1 algorithm to compute the score (Multi armed bandit problem).
// Details are at
// https://lilianweng.github.io/lil-log/2018/01/23/the-multi-armed-bandit-problem-and-its-solutions.html.
// 'total_num_calls' should be the sum of calls across all generators part of
// the multi armed bandit problem.
// If the generator is called less than 10 times then the method returns
// infinity as score in order to get more data about the generator
// performance.
double GetUCBScore(int64_t total_num_calls) const;

It looks like the UCB path was implemented but never wired up. Curious
if this was tried but abandoned because it didn't pay off.

I've seen some papers indicating MABs could be useful in LNS, perhaps
even more so using them on deciding how/what to destroy.

Given that I saw some of these tid-bits in the source code,
I'm curious -- how much has this been explored by the OR-tools
developers/wider community and are there any interesting take-aways?

As always, many thanks for open sourcing these terrific algorithms <3

/Erik

Frank

unread,

Jun 15, 2026, 9:12:11 AMJun 15

to or-tools-discuss

Thanks Erik, this is super helpful.

I saw `use_multi_armed_bandit` in `routing_enums.proto` but like you said, it’s off by default. The UCB comment in `routing_local_search.cc` + the `GetUCBScore` in CP-SAT explains a lot.

Have you tested MAB for operator selection vs the default simulated annealing? I’m specifically looking at ALNS for VRP where destroying "what" matters more than "how". My hypothesis is UCB could beat random operator choice once we have enough iterations for it to converge.

Would love to hear if OR-Tools team has any benchmarks or if it died because overhead > gain on typical VRP instances.

Appreciate the pointer to the code paths 🙏

Thanks again for the pointers!

Laurent Perron

unread,

Jul 1, 2026, 12:04:12 PMJul 1

to or-tools...@googlegroups.com

The biggest issue is that the landscape for bandits changes dramatically between the start of the search where everything works, and then closer to the optimal, every improvement is extremely hard to find and rare.

The second issue is that improvements are rare in general, so it is hard to extract trends from them.

All in all, it is hard to beat round-robin.

Laurent Perron | Operations Research | lpe...@google.com | (33) 1 42 68 53 00

--
You received this message because you are subscribed to the Google Groups "or-tools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to or-tools-discu...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/or-tools-discuss/47b446b9-a7ed-4c0f-a0bd-37fa1f1f928en%40googlegroups.com.

Reply all

Reply to author

Forward