Dear all,
We have released the OBP package version 0.4.1: https://github.com/st-tech/zr-obp/releases/tag/0.4.1
The changes are summarized below:
Add functions implementing OPE for the slate contextual bandit setting [1]
Make `OffPolicyEvaluation` class more useful
- Add a method to visualize and compare the OPE results of several different policies (#103)
- Allow the use of different `estimated_rewards_by_reg_model` values (this makes MRDR [2] easier to use with obp, #92)
Fix some bugs and refactor the code
Welcome new contributors (#94)
References
- [1] James McInerney, Brian Brost, Praveen Chandar, Rishabh Mehrotra, and Benjamin Carterette. 2020. Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1779–1788.
- [2] Mehrdad Farajtabar, Yinlam Chow, and Mohammad Ghavamzadeh. 2018. More robust doubly robust off-policy evaluation. In Proceedings of the 35th International Conference on Machine Learning, PMLR 80, 1447–1456.
Please update your package accordingly. We continue to improve and expand the software; stay tuned!
Best Regards,
Open Bandit Project Team