Daily TMLR digest for Mar 15, 2023


TMLR

Mar 14, 2023, 8:00:09 PM
to tmlr-anno...@googlegroups.com


New certifications
==================

Reproducibility Certification: PRUDEX-Compass: Towards Systematic Evaluation of Reinforcement Learning in Financial Markets

Shuo Sun, Molei Qin, Xinrun Wang, Bo An

https://openreview.net/forum?id=JjbsIYOuNi

---


Accepted papers
===============


Title: PRUDEX-Compass: Towards Systematic Evaluation of Reinforcement Learning in Financial Markets

Authors: Shuo Sun, Molei Qin, Xinrun Wang, Bo An

Abstract: The financial markets, which involve more than $90 trillion in market capitalization, attract the attention of innumerable investors around the world. Recently, reinforcement learning in financial markets (FinRL) has emerged as a promising direction for training agents to make profitable investment decisions. However, the evaluation of most FinRL methods focuses only on profit-related measures and ignores many critical axes, which makes these methods far from satisfactory for financial practitioners to deploy in real-world financial markets. We therefore introduce PRUDEX-Compass, which has 6 axes, i.e., Profitability, Risk-control, Universality, Diversity, rEliability, and eXplainability, with a total of 17 measures for systematic evaluation. Specifically, i) we propose AlphaMix+ as a strong FinRL baseline, which leverages mixture-of-experts (MoE) and risk-sensitive approaches to make diversified, risk-aware investment decisions; ii) we evaluate 8 FinRL methods on 4 long-term real-world datasets from influential financial markets to demonstrate the use of PRUDEX-Compass; and iii) we release PRUDEX-Compass, together with the 4 real-world datasets, standard implementations of the 8 FinRL methods, and a portfolio management environment, as public resources to facilitate the design and comparison of new FinRL methods. We hope that PRUDEX-Compass not only sheds light on future FinRL research and prevents untrustworthy results from stalling FinRL's path to successful industry deployment, but also provides a new and challenging evaluation scenario for the reinforcement learning (RL) community.
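The abstract does not detail AlphaMix+'s architecture, but the core idea it names -- a mixture-of-experts whose gating is risk-sensitive -- can be sketched minimally. The function `risk_aware_mixture`, its gate logits, and the penalty `lam` below are illustrative assumptions, not the paper's actual method:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def risk_aware_mixture(expert_weights, gate_scores, risk_estimates, lam=1.0):
    """Blend per-expert portfolio weights with a softmax gate whose logits
    are penalised by each expert's risk estimate (risk-sensitive gating).
    Experts judged riskier receive less weight in the blended portfolio."""
    logits = [g - lam * r for g, r in zip(gate_scores, risk_estimates)]
    gate = softmax(logits)
    n_assets = len(expert_weights[0])
    blended = [sum(gate[k] * expert_weights[k][i] for k in range(len(gate)))
               for i in range(n_assets)]
    total = sum(blended)
    return [w / total for w in blended]  # renormalise to a valid allocation
```

With two experts holding opposite allocations, raising one expert's risk estimate shifts the blended portfolio toward the other expert's holdings.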

URL: https://openreview.net/forum?id=JjbsIYOuNi

---

Title: A Unified View of Masked Image Modeling

Authors: Zhiliang Peng, Li Dong, Hangbo Bao, Furu Wei, Qixiang Ye

Abstract: Masked image modeling has demonstrated great potential for alleviating the label-hungry problem of training large-scale vision Transformers, achieving impressive performance on various downstream tasks. In this work, we propose a unified view of masked image modeling after revisiting existing methods. Under this unified view, we introduce a simple yet effective method, termed MaskDistill, which reconstructs normalized semantic features from teacher models at the masked positions, conditioned on corrupted input images. Experimental results on image classification and semantic segmentation show that MaskDistill achieves performance comparable or superior to state-of-the-art methods. Using the huge vision Transformer and pretraining for 300 epochs, MaskDistill obtains 88.3% fine-tuning top-1 accuracy on ImageNet-1k (224×224 input) and 58.8 mIoU for semantic segmentation on ADE20k (512×512 input). Code is enclosed in the supplementary materials.
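The distillation target the abstract describes -- normalized teacher features, reconstructed only at masked positions -- can be sketched as a toy loss. The function name `maskdistill_loss`, the layer-norm-style normalization, and the plain MSE are assumptions for illustration, not the paper's exact objective:

```python
import math

def layernorm(vec, eps=1e-6):
    """Normalise a feature vector to zero mean, unit variance (no affine)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def maskdistill_loss(student_pred, teacher_feat, mask):
    """Mean squared error between the student's predictions and the
    layer-normalised teacher features, computed only at masked positions.
    Unmasked patches contribute nothing to the loss."""
    total, count = 0.0, 0
    for pred, feat, masked in zip(student_pred, teacher_feat, mask):
        if not masked:
            continue
        target = layernorm(feat)
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
        count += len(pred)
    return total / max(count, 1)
```

A student that exactly predicts the normalized teacher feature at a masked patch incurs (near-)zero loss there, while unmasked patches are ignored entirely.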

URL: https://openreview.net/forum?id=wmGlMhaBe0

---

Title: How Robust is Your Fairness? Evaluating and Sustaining Fairness under Unseen Distribution Shifts

Authors: Haotao Wang, Junyuan Hong, Jiayu Zhou, Zhangyang Wang

Abstract: Increasing concerns have been raised about deep learning fairness in recent years. Existing fairness-aware machine learning methods mainly focus on the fairness of in-distribution data. However, in real-world applications, it is common to encounter a distribution shift between training and test data. In this paper, we first show that the fairness achieved by existing methods can be easily broken by slight distribution shifts. To solve this problem, we propose a novel fairness learning method termed CUrvature MAtching (CUMA), which achieves robust fairness that generalizes to unseen domains with unknown distributional shifts. Specifically, CUMA enforces the model to have similar generalization ability on the majority and minority groups by matching the loss-curvature distributions of the two groups. We evaluate our method on three popular fairness datasets. Compared with existing methods, CUMA achieves superior fairness under unseen distribution shifts without sacrificing either the overall accuracy or the in-distribution fairness.
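The abstract's key mechanism is a penalty that matches the loss-curvature distributions of two groups. As a crude stand-in for matching full distributions, one can penalize the gap between their first two moments; the function `moment_matching_penalty` below is a toy illustration under that assumption, not CUMA's actual regularizer:

```python
def moment_matching_penalty(curv_a, curv_b):
    """Penalty on the difference between the (mean, variance) of per-example
    loss-curvature values for two groups -- a simple moment-matching proxy
    for matching the two curvature distributions. Zero iff both groups have
    identical mean and variance of curvature."""
    def moments(xs):
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        return mean, var
    mean_a, var_a = moments(curv_a)
    mean_b, var_b = moments(curv_b)
    return (mean_a - mean_b) ** 2 + (var_a - var_b) ** 2
```

In training, such a penalty would be added to the task loss so the optimizer flattens (or sharpens) the loss landscape for whichever group currently generalizes worse.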

URL: https://openreview.net/forum?id=11pGlecTz2

---


New submissions
===============


Title: Exploiting Latent Properties to Optimize Neural Codecs

Abstract: End-to-end image/video codecs are becoming competitive with traditional compression techniques that were developed through decades of manual engineering effort. These trainable codecs have many advantages over traditional techniques, such as easy adaptation to perceptual distortion metrics and high performance on specific domains thanks to their learning ability. However, state-of-the-art neural codecs do not take advantage of vector quantization or of the entropy gradient that is available on the decoding device. In this research, we present theoretical insights into these two properties (quantization and the entropy gradient) and show that they can improve the performance of many off-the-shelf codecs. First, we prove that a non-uniform quantization map on a neural codec's latents is not necessary, and we improve performance by using a predefined optimal uniform vector quantization map. Second, we show theoretically that the gradient of the entropy (available at the decoder side) is correlated with the gradient of the reconstruction error (which is not available at the decoder side), and we therefore use the former as a proxy to improve compression performance. In our results, this proposal saves 2-4% of the bit rate at the same quality across various pre-trained methods.
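The decoder-side trick the abstract describes -- using the entropy gradient as a proxy for the unavailable reconstruction gradient -- can be sketched for a factorised Gaussian prior, where the gradient of the rate term -log p(y) has a closed form. The functions `rate_gradient` and `refine_latent` and the single-step update are illustrative assumptions, not the paper's procedure:

```python
def rate_gradient(latent, mu, sigma):
    """Gradient of -log p(y) under a factorised Gaussian prior N(mu, sigma^2):
    d/dy [-log p(y)] = (y - mu) / sigma^2.  Unlike the reconstruction-error
    gradient, this is computable at the decoder, which knows the prior."""
    return [(y - m) / (s * s) for y, m, s in zip(latent, mu, sigma)]

def refine_latent(latent, mu, sigma, step=0.1):
    """One decoder-side refinement step that moves the decoded latent along
    the negative entropy gradient, used as a proxy for the (unavailable)
    reconstruction-error gradient."""
    grad = rate_gradient(latent, mu, sigma)
    return [y - step * g for y, g in zip(latent, grad)]
```

Each step nudges the latent toward higher prior likelihood; the paper's correlation argument is what justifies hoping this also reduces reconstruction error.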

URL: https://openreview.net/forum?id=Sv0FWYkQgh

---

Title: Dynamic Subgoal-based Exploration via Bayesian Optimization

Abstract: Policy optimization in unknown, sparse-reward environments with expensive and limited interactions is challenging, and poses a need for effective exploration. Motivated by complex navigation tasks that require real-world training (when cheap simulators are not available), we consider an agent that faces an unknown distribution of environments and must decide on an exploration strategy, through a series of training environments, that can benefit policy learning in a test environment drawn from the environment distribution. Most existing approaches focus on fixed exploration strategies, while the few that view exploration as a meta-optimization problem tend to ignore the need for cost-efficient exploration. We propose a cost-aware Bayesian optimization (BO) approach that efficiently searches over a class of dynamic subgoal-based exploration strategies. The algorithm adjusts a variety of levers --- the locations of the subgoals, the length of each episode, and the number of replications per trial --- in order to overcome the challenges of sparse rewards, expensive interactions, and noise. Our experimental evaluation demonstrates that, when averaged across problem domains, the proposed algorithm outperforms the meta-learning algorithm MAML by 19%, the hyperparameter tuning method Hyperband by 23%, and BO techniques EI and LCB by 24% and 22%, respectively. We also provide a theoretical foundation and prove that the method asymptotically identifies a near-optimal subgoal design from the search space.
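Since the abstract frames exploration design as cost-aware Bayesian optimization, a standard way to make an acquisition function cost-aware is to divide expected improvement by the evaluation cost. The function `ei_per_cost` below is a generic sketch of that idea under a Gaussian posterior, not the paper's specific acquisition rule:

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def ei_per_cost(mu, sigma, best, cost):
    """Expected improvement (for maximisation) over the incumbent `best`,
    divided by the cost of evaluating the candidate -- so cheap subgoal
    designs (short episodes, few replications) win when EI is comparable."""
    if sigma <= 0.0:
        return max(mu - best, 0.0) / cost
    z = (mu - best) / sigma
    ei = (mu - best) * norm_cdf(z) + sigma * norm_pdf(z)
    return ei / cost
```

A BO loop would score each candidate (subgoal locations, episode length, replication count) with this ratio and evaluate the argmax, trading off predicted gain against interaction cost.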

URL: https://openreview.net/forum?id=ThJl4d5JRg

---

Title: Two-Stage Neural Contextual Bandits for Adaptive Personalised Recommendations

Abstract: We consider the problem of personalised recommendations where each user consumes recommendations in a sequential fashion. Personalised recommendation methods that focus on exploiting user interests but ignore exploration result in biased feedback loops, which hurt recommendation quality in the long term. In this paper, we consider contextual-bandit strategies to address the exploitation-exploration trade-off in large-scale adaptive personalised recommendation systems. In a large-scale system where the number of items is exponentially large, the exploitation-exploration trade-off becomes significantly more challenging, rendering most standard contextual bandit algorithms inefficient. To systematically address this challenge, we propose a hierarchical neural contextual bandit framework to efficiently learn user preferences. Our hierarchical structure first explores dynamic topics before recommending a set of items. We leverage neural networks to learn non-linear representations of users and items, and use upper confidence bounds (UCBs) as the basis for item recommendation. We propose an additive linear and a bilinear structure for the UCB, where the former captures the representation uncertainties of users and items separately while the latter additionally captures the uncertainty of the user-item interaction. We show that our hierarchical framework with the proposed bandit policies exhibits strong computational and performance advantages over many standard bandit baselines on two large-scale standard recommendation benchmark datasets.
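The two-stage structure the abstract describes -- pick a topic by UCB, then an item within it by UCB -- can be sketched with ordinary (non-neural) UCB statistics. The function `two_stage_pick`, the stats layout, and the exploration constant are illustrative assumptions, not the paper's neural UCB policies:

```python
import math

def ucb(mean, count, t, c=2.0):
    """Upper confidence bound; arms never pulled get +inf to force exploration."""
    if count == 0:
        return float("inf")
    return mean + c * math.sqrt(math.log(t) / count)

def two_stage_pick(topic_stats, item_stats, t):
    """Stage 1: choose the topic with the highest UCB.  Stage 2: choose the
    item with the highest UCB within that topic.  Both *_stats map keys to
    (empirical mean reward, pull count); t is the current round.  The
    hierarchy keeps the per-round argmax small even with huge catalogues."""
    topic = max(topic_stats, key=lambda k: ucb(*topic_stats[k], t))
    items = item_stats[topic]
    item = max(items, key=lambda k: ucb(*items[k], t))
    return topic, item
```

The paper replaces these scalar means with neural user/item representations and builds the confidence width from representation uncertainty, but the two-stage control flow is the same.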

URL: https://openreview.net/forum?id=6lDmsAHCNo

---

Title: Aux-Drop: Handling Haphazard Inputs in Online Learning Using Auxiliary Dropouts

Abstract: Many real-world applications based on online learning produce streaming data that is haphazard in nature, i.e., it contains missing features, features that become obsolete over time, new features that appear at later points in time, and a lack of clarity on the total number of input features. These challenges make it hard to build a learnable system for such applications, and almost no work in deep learning addresses this issue. In this paper, we present Aux-Drop, an auxiliary dropout regularization strategy for online learning that handles haphazard input features in an effective manner. Aux-Drop adapts the conventional dropout regularization scheme to the haphazard input feature space, ensuring that the final output is minimally impacted by the chaotic appearance of such features. It helps prevent co-adaptation, especially between the auxiliary and base features, and reduces the strong dependence of the output on any individual auxiliary input. This enables better learning in scenarios where certain features disappear over time or new features must be modeled. The efficacy of Aux-Drop is demonstrated through extensive numerical experiments on SOTA benchmarking datasets, including Italy Power Demand, HIGGS, SUSY and multiple UCI datasets.
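The abstract's core idea -- dropout restricted to the unreliable auxiliary features, with missing features treated as always dropped -- can be sketched at the input layer. The function `aux_drop`, the `None`-for-missing convention, and the inverted-dropout rescaling are assumptions for illustration, not the paper's exact layer:

```python
import random

def aux_drop(base, aux, p=0.5, rng=None):
    """Dropout applied only to auxiliary features: a missing feature (None)
    is always dropped (set to 0), a present one is dropped with probability
    p (p < 1) and survivors are rescaled by 1/(1-p), inverted-dropout style.
    Base features, assumed always available, pass through untouched."""
    rng = rng or random.Random(0)  # seeded for a reproducible sketch
    out = []
    for x in aux:
        if x is None or rng.random() < p:
            out.append(0.0)
        else:
            out.append(x / (1.0 - p))
    return base + out
```

Because the network is trained with auxiliary inputs randomly zeroed, a feature genuinely vanishing from the stream at test time looks like just another dropout event rather than a distribution it never saw.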

URL: https://openreview.net/forum?id=R9CgBkeZ6Z

---
