Use of bandits in OR-tools

27 views
Skip to first unread message

Erik Cervin-Edin

unread,
Jun 15, 2026, 7:53:48 AM (7 days ago) Jun 15
to or-tools...@googlegroup.com
Hi!

I'm interested in using multi-armed bandits (MABs) for things like ALNS
and I figured a good place to start looking, besides in the literature,
was the or-tools source to see if MABs are used anywhere.

Turn out that in the routing solver there's a parameter to use MABs for
compound operators

bool use_multi_armed_bandit_concatenate_operators = 41;

I've searched to see if there was anything about the impact of using
this when running the routing solver -- judging by the fact that this is
false by default, I'm guessing it was found to not be beneficial "in
general".

On the CP-SAT side I found some references to bandit algorithms but
it looks like MABs are not used in the CP-SAT solver, but there are some
comments and things like

// Uses UCB1 algorithm to compute the score (Multi armed bandit problem).
// Details are at
// https://lilianweng.github.io/lil-log/2018/01/23/the-multi-armed-bandit-problem-and-its-solutions.html.
// 'total_num_calls' should be the sum of calls across all generators part of
// the multi armed bandit problem.
// If the generator is called less than 10 times then the method returns
// infinity as score in order to get more data about the generator
// performance.
double GetUCBScore(int64_t total_num_calls) const;

It looks like the UCB path was implemented but never wired up. Curious
if this was tried but abandoned because it didn't pay off.

I've seen some papers indicating MABs could be useful in LNS, perhaps
even more so using them on deciding how/what to destroy.

Given that I saw some of these tid-bits in the source code,
I'm curious -- how much has this been explored by the OR-tools
developers/wider community and are there any interesting take-aways?

As always, many thanks for open sourcing these terrific algorithms <3

/Erik

Frank

unread,
Jun 15, 2026, 9:12:11 AM (7 days ago) Jun 15
to or-tools-discuss
Thanks Erik, this is super helpful.

I saw `use_multi_armed_bandit` in `routing_enums.proto` but like you said, it’s off by default. The UCB comment in `routing_local_search.cc` + the `GetUCBScore` in CP-SAT explains a lot. 

Have you tested MAB for operator selection vs the default simulated annealing? I’m specifically looking at ALNS for VRP where destroying "what" matters more than "how". My hypothesis is UCB could beat random operator choice once we have enough iterations for it to converge.

Would love to hear if OR-Tools team has any benchmarks or if it died because overhead > gain on typical VRP instances.

Appreciate the pointer to the code paths 🙏

Thanks again for the pointers!
Reply all
Reply to author
Forward
0 new messages