Parameterized policies

Warren Powell

Jan 12, 2023, 6:49:10 PM
to Reinforcement Learning Mailing List

The #reinforcementlearning literature repeatedly discusses the “policy gradient method” in the context of “stochastic” (Boltzmann) policies, which have a single scalar, tunable parameter.
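
For concreteness, here is a minimal sketch in Python (my own names and defaults, not code from any particular paper) of the kind of Boltzmann policy this literature has in mind: a softmax over estimated action values with a single scalar tunable parameter theta.

import numpy as np

def boltzmann_policy(q_values, theta, rng=None):
    """Boltzmann (softmax) policy with one scalar tunable parameter theta.
    Larger theta concentrates probability on the highest-valued action;
    theta = 0 plays uniformly at random."""
    rng = rng or np.random.default_rng()
    prefs = theta * np.asarray(q_values, dtype=float)
    prefs -= prefs.max()                 # subtract the max for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(probs), p=probs))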


Parameterized policies are easily the most important and powerful approach for designing policies for sequential decision problems.  PFAs (parameterized functions), CFAs (parameterized deterministic optimization models), and even deterministic DLAs are the three classes of policies most widely used for making decisions in practice (see chapter 11, which can be downloaded from https://tinyurl.com/RLandSO/, for an overview of the four classes of policies).
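
To make the PFA class concrete, here is the familiar order-up-to inventory rule, a textbook PFA example (a sketch of my own, with hypothetical parameter names):

def order_up_to(inventory, theta_low, theta_high):
    """PFA sketch: an order-up-to inventory rule whose tunable parameters
    are theta = (theta_low, theta_high).  If inventory falls below theta_low,
    order enough to raise it to theta_high; otherwise order nothing.
    Designing the policy then reduces to tuning theta."""
    return theta_high - inventory if inventory < theta_low else 0.0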


See https://tinyurl.com/cfapolicy/ for an introduction to CFAs and parameterized lookaheads.
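
As a hedged illustration of a CFA (my own sketch, not code from that link): a CFA makes its decision by solving a deterministic optimization problem whose objective has been modified by one or more tunable parameters.  An interval-estimation-style rule is perhaps the simplest instance:

import numpy as np

def cfa_interval_estimation(mu_bar, sigma_bar, theta):
    """CFA sketch: the decision solves a deterministic optimization problem
    (here just an argmax over a finite set of alternatives) with a
    parametrically modified objective.  The theta * sigma_bar bonus favors
    alternatives whose values are still uncertain, so tuning theta trades
    off exploration against exploitation."""
    return int(np.argmax(np.asarray(mu_bar) + theta * np.asarray(sigma_bar)))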


In chapter 12 (on PFAs and policy search), I review four general strategies for tuning parameters, drawing on both derivative-based methods (the subject of chapter 5) and derivative-free methods (chapter 7).  For derivative-based methods, I outline numerical derivatives and show how to use the powerful SPSA algorithm for vector-valued tunable parameters.
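
To show the idea behind SPSA (a minimal sketch of my own, with illustrative constant step sizes; practical implementations use declining step-size rules such as a_n = a/(n+1)): each gradient estimate perturbs all components of theta simultaneously, so it costs only two noisy evaluations of the objective F(theta), the simulated performance of the policy, regardless of the dimension of theta.

import numpy as np

def spsa_gradient(F, theta, c, rng):
    """One SPSA estimate of the gradient of a noisy objective F at theta.
    All components of theta are perturbed at once with a random +/-1
    (Rademacher) vector, so only two evaluations of F are needed."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    return (F(theta + c * delta) - F(theta - c * delta)) / (2.0 * c * delta)

def spsa_search(F, theta0, n_iters=100, a=0.1, c=0.1, seed=0):
    """Basic SPSA ascent loop for tuning a vector of policy parameters
    by maximizing the simulated performance F(theta)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iters):
        theta = theta + a * spsa_gradient(F, theta, c, rng)
    return theta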


The fourth strategy is the policy gradient theorem, which I think is by far the least useful.


Time to stop focusing on Bellman’s equation and recognize the power of parameterized policies from three of the four classes.



------------------------------
Warren B. Powell
Chief Innovation Officer, Optimal Dynamics
Professor Emeritus, Princeton University