Hi!
I'm interested in using multi-armed bandits (MABs) for things like ALNS
and I figured a good place to start looking, besides in the literature,
was the or-tools source to see if MABs are used anywhere.
Turn out that in the routing solver there's a parameter to use MABs for
compound operators
bool use_multi_armed_bandit_concatenate_operators = 41;
I've searched to see if there was anything about the impact of using
this when running the routing solver -- judging by the fact that this is
false by default, I'm guessing it was found to not be beneficial "in
general".
On the CP-SAT side I found some references to bandit algorithms but
it looks like MABs are not used in the CP-SAT solver, but there are some
comments and things like
// Uses UCB1 algorithm to compute the score (Multi armed bandit problem).
// Details are at
//
https://lilianweng.github.io/lil-log/2018/01/23/the-multi-armed-bandit-problem-and-its-solutions.html.
// 'total_num_calls' should be the sum of calls across all generators part of
// the multi armed bandit problem.
// If the generator is called less than 10 times then the method returns
// infinity as score in order to get more data about the generator
// performance.
double GetUCBScore(int64_t total_num_calls) const;
It looks like the UCB path was implemented but never wired up. Curious
if this was tried but abandoned because it didn't pay off.
I've seen some papers indicating MABs could be useful in LNS, perhaps
even more so using them on deciding how/what to destroy.
Given that I saw some of these tid-bits in the source code,
I'm curious -- how much has this been explored by the OR-tools
developers/wider community and are there any interesting take-aways?
As always, many thanks for open sourcing these terrific algorithms <3
/Erik