OBP Package v0.5.2 released



Jan 13, 2022, 8:03:36 AM

to Open Bandit Project

Dear all,

We have released OBP v0.5.2. The changes are summarized below:

Updates

Implement `obp.policy.QLearner` (https://github.com/st-tech/zr-obp/pull/144)
Implement the Balanced IPW estimator as `obp.ope.BalancedInverseProbabilityWeighting`. See Sondhi et al. (2020) for details. (https://github.com/st-tech/zr-obp/pull/146)
Implement the Cascade Doubly Robust estimator for OPE with combinatorial (ranking) actions as `obp.ope.CascadeDR`. See Kiyohara et al. (2022) for details. (https://github.com/st-tech/zr-obp/pull/142)
Implement SLOPE, a data-driven hyperparameter tuning method for OPE proposed by Su et al. (2020) and Tucker and Lee (2021) (https://github.com/st-tech/zr-obp/pull/148)
Implement new estimators for standard OPE based on the power-mean transformation of importance weights proposed by Metelli et al. (2021) (https://github.com/st-tech/zr-obp/pull/149)
Implement a dataset class for generating synthetic logged bandit data with multiple loggers. The corresponding estimators will be added in the next update. (https://github.com/st-tech/zr-obp/pull/150)
Add an argument to control the number of deficient actions in `obp.dataset.SyntheticBanditDataset` and `obp.dataset.MultiClassToBanditReduction`. See Sachdeva et al. (2020) for details. (https://github.com/st-tech/zr-obp/pull/150)
Implement flexible functions for synthesizing reward functions and behavior policies (https://github.com/st-tech/zr-obp/pull/145)
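To give a feel for the power-mean idea behind the Metelli et al. (2021) estimators, here is a minimal NumPy sketch of its harmonic case (s = -1), as we read the paper: each importance weight w is replaced by w / ((1 - lambda) + lambda * w). This does not use OBP's API; `power_mean_weights` and `power_mean_ipw` are hypothetical helper names for illustration only.

```python
import numpy as np


def power_mean_weights(iw, lambda_: float) -> np.ndarray:
    """Shrink importance weights with the harmonic (s = -1) power mean.

    Computes w / ((1 - lambda_) + lambda_ * w). lambda_ = 0 returns the raw
    weights (standard IPW); for lambda_ > 0 every corrected weight stays
    strictly below 1 / lambda_, which tames heavy-tailed weights.
    """
    if not 0.0 <= lambda_ < 1.0:
        # lambda_ = 1 is excluded here to avoid 0 / 0 when a weight is zero.
        raise ValueError("lambda_ must lie in [0, 1)")
    iw = np.asarray(iw, dtype=float)
    return iw / ((1.0 - lambda_) + lambda_ * iw)


def power_mean_ipw(reward, pscore, evaluation_pscore, lambda_: float) -> float:
    """IPW-style value estimate that uses the corrected weights.

    pscore: behavior-policy probability of each logged action.
    evaluation_pscore: evaluation-policy probability of the same action.
    """
    iw = np.asarray(evaluation_pscore, dtype=float) / np.asarray(pscore, dtype=float)
    reward = np.asarray(reward, dtype=float)
    return float(np.mean(power_mean_weights(iw, lambda_) * reward))
```

With lambda_ = 0.5, raw weights [0.5, 2.0, 10.0] become roughly [0.67, 1.33, 1.82]: small weights barely move while the large one is pulled well below the bound of 2. Larger lambda_ trades more bias for less variance, which is exactly the kind of knob a tuning method such as SLOPE is meant to set.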
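The SLOPE entry refers to a Lepski-style selection rule. Below is a minimal sketch of that rule as we understand it from Su et al. (2020), not OBP's implementation: candidates (for example, estimates obtained with different hyperparameter values) are ordered from least biased and highest variance to most biased and lowest variance, each gets an interval [estimate - 2 * width, estimate + 2 * width], and the rule keeps the last candidate whose interval still overlaps all earlier ones. How the widths are computed (e.g. a scaled standard error) is left as an assumption; `slope_select` is a hypothetical name.

```python
import numpy as np


def slope_select(estimates, widths) -> int:
    """Return the index of the candidate chosen by a SLOPE-style rule.

    Candidates must be ordered so that bias grows and the confidence
    width shrinks with the index. Interval j is
    [estimates[j] - 2 * widths[j], estimates[j] + 2 * widths[j]].
    The largest i whose intervals 0..i share a common point is returned.
    """
    estimates = np.asarray(estimates, dtype=float)
    widths = np.asarray(widths, dtype=float)
    lower, upper = -np.inf, np.inf
    selected = 0
    for i, (est, cnf) in enumerate(zip(estimates, widths)):
        # Running intersection of all intervals seen so far.
        lower = max(lower, est - 2.0 * cnf)
        upper = min(upper, est + 2.0 * cnf)
        if lower > upper:
            # The new, more biased candidate disagrees with earlier ones.
            break
        selected = i
    return selected
```

For estimates [1.0, 1.1, 1.05, 2.0] with widths [0.5, 0.3, 0.2, 0.1], the first three intervals overlap on [0.65, 1.45], but the fourth, [1.8, 2.2], does not, so index 2 is selected. The design avoids estimating bias directly: a candidate is accepted only while its shift is explainable by the noise of the less biased ones.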
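The deficient-actions option relates to Sachdeva et al. (2020): when the behavior policy gives some actions zero probability, standard IPW loses its unbiasedness because those actions never appear in the log. The sketch below shows one plausible way to synthesize such a behavior policy (a softmax over logits with a random subset of actions masked out in each round); it is an illustration only, not how OBP generates its data, and `deficient_softmax_policy` is a hypothetical name.

```python
import numpy as np


def deficient_softmax_policy(logits, n_deficient_actions: int,
                             rng: np.random.Generator) -> np.ndarray:
    """Softmax behavior policy where some actions get zero probability.

    For every round (row of `logits`), `n_deficient_actions` actions are
    drawn uniformly without replacement and masked out before the softmax.
    """
    logits = np.asarray(logits, dtype=float)
    n_rounds, n_actions = logits.shape
    if not 0 <= n_deficient_actions < n_actions:
        raise ValueError("n_deficient_actions must be in [0, n_actions)")
    masked = logits.copy()
    for i in range(n_rounds):
        deficient = rng.choice(n_actions, size=n_deficient_actions, replace=False)
        masked[i, deficient] = -np.inf  # exp(-inf) = 0, i.e. no support
    # Subtract the row maximum (always finite) for numerical stability.
    masked -= masked.max(axis=1, keepdims=True)
    probs = np.exp(masked)
    return probs / probs.sum(axis=1, keepdims=True)
```

Setting `n_deficient_actions=0` recovers an ordinary softmax policy with full support, which makes it easy to compare an estimator's behavior with and without deficient support on otherwise identical data.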

References

Haruka Kiyohara, Yuta Saito, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, and Yasuo Yamamoto. Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model. WSDM 2022.
Arjun Sondhi, David Arbour, and Drew Dimmery. Balanced Off-Policy Evaluation in General Action Spaces. AISTATS 2020.
Yi Su, Pavithra Srinath, and Akshay Krishnamurthy. Adaptive Estimator Selection for Off-Policy Evaluation. ICML 2020.
George Tucker and Jonathan Lee. Improved Estimator Selection for Off-Policy Evaluation. ICML 2021 Workshop.
Alberto Maria Metelli, Alessio Russo, and Marcello Restelli. Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning. NeurIPS 2021.
Noveen Sachdeva, Yi Su, and Thorsten Joachims. Off-policy Bandits with Deficient Support. KDD 2020.
Aman Agarwal, Soumya Basu, Tobias Schnabel, and Thorsten Joachims. Effective Evaluation using Logged Bandit Feedback from Multiple Loggers. KDD 2018.
Nathan Kallus, Yuta Saito, and Masatoshi Uehara. Optimal Off-Policy Evaluation from Multiple Logging Policies. ICML 2021.

Please update your package accordingly. We will continue to improve and expand the software, so stay tuned!