Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition
Speaker: Liyu Chen (
https://lchenat.github.io/)
Time: 02/26/21 11:45am PST
Location:
https://usc.zoom.us/j/94386654763Abstract:
We study the stochastic shortest path problem with adversarial costs and known transition, and show that the minimax regret is O(\sqrt{DTK}) and O(\sqrt{DTSAK}) for the full-information setting and the bandit feedback setting respectively, where D is the diameter, T is the expected hitting time of the optimal policy, S is the number of states, A is the number of actions, and K is the number of episodes. Our results significantly improve upon the recent work of (Rosenberg and Mansour, 2020) which only considers the full-information setting and achieves suboptimal regret. Our work is also the first to consider bandit feedback with adversarial costs. (
https://arxiv.org/abs/2012.04053)