We study the problem of offline-to-online reinforcement learning from high-dimensional pixel observations. While recent model-free approaches successfully use offline pre-training with online fine-tuning to either improve the performance of the data-collection policy or adapt to novel tasks, model-based approaches remain underutilized in this setting. In this work, we argue that existing methods for high-dimensional model-based offline RL are not suitable for offline-to-online fine-tuning because of shifts in the learned representation, off-dynamics data, and non-stationary rewards. We propose a simple on-policy model-based method with adaptive behavior regularization. In our simulation experiments, we find that our approach successfully solves long-horizon robot manipulation tasks entirely from images by using a combination of offline data and online interactions.
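As an illustration only, the minimal sketch below shows one way an adaptively weighted behavior-regularization term can be combined with a standard policy-improvement objective during online fine-tuning; the names policy, q_fn, and the multiplicative decay schedule are assumptions made for the sketch, not the specific method described above.

import torch
import torch.nn.functional as F

def actor_loss(policy, q_fn, obs, dataset_actions, alpha):
    """Policy-improvement term plus an alpha-weighted behavior-regularization term.

    Illustrative sketch: alpha controls how strongly the policy is pulled toward
    the offline (data-collection) actions and is reduced as fine-tuning progresses.
    """
    pred_actions = policy(obs)                            # actions proposed by the current policy
    q_term = -q_fn(obs, pred_actions).mean()              # maximize the estimated return
    bc_term = F.mse_loss(pred_actions, dataset_actions)   # stay close to the offline data
    return q_term + alpha * bc_term

def anneal_alpha(alpha, decay=0.999, floor=0.0):
    # Hypothetical multiplicative decay of the regularization weight during online interaction.
    return max(alpha * decay, floor)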
In active vibration control, controller tuning is necessary to obtain sufficient damping performance. This study presents an efficient tuning technique for a model-free vibration controller based on a virtual controlled object (VCO). A model-free vibration control system is constructed using an actuator model and the VCO. A reference controlled object (RCO), a designer-defined single-degree-of-freedom (SDOF) structure, is used to tune the VCO-based controller offline. A novel parameter-tuning technique based on the RCO and simultaneous perturbation stochastic approximation (SPSA) is proposed that accounts for the differing scales of the tuning parameters. The proposed tuning scheme automatically determines the controller's tuning parameters without manual trial and error, such as experiments on the actual plant, and is much faster than previous methods. The proposed method therefore significantly enhances the practicability of VCO-based model-free vibration suppression, bridging the gap between basic studies of the VCO method and its implementation. The effectiveness of the proposed method is verified by applying it to the VCO-based H2 controller.
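A minimal sketch of scale-aware SPSA tuning is given below, under the assumption that the controller parameters are normalized by user-supplied typical magnitudes before a standard SPSA update; loss_fn, scales, and the gain schedules are illustrative placeholders (loss_fn could, for example, evaluate an H2 cost on the RCO) rather than the exact scheme of this study.

import numpy as np

def spsa_tune(loss_fn, theta0, scales, iters=200, a=0.1, c=0.05,
              alpha=0.602, gamma=0.101, seed=0):
    """SPSA in normalized coordinates so that parameters with very different
    magnitudes are perturbed and updated proportionally to their typical scale.

    loss_fn : callable mapping a parameter vector to a scalar cost.
    theta0  : initial controller parameters.
    scales  : positive vector of typical magnitudes for each parameter (assumed).
    """
    rng = np.random.default_rng(seed)
    scales = np.asarray(scales, dtype=float)
    phi = np.asarray(theta0, dtype=float) / scales       # normalized parameters
    for k in range(1, iters + 1):
        a_k = a / k**alpha                                # step-size schedule
        c_k = c / k**gamma                                # perturbation-size schedule
        delta = rng.choice([-1.0, 1.0], size=phi.shape)   # Bernoulli +/-1 perturbation
        y_plus = loss_fn((phi + c_k * delta) * scales)
        y_minus = loss_fn((phi - c_k * delta) * scales)
        g_hat = (y_plus - y_minus) / (2.0 * c_k * delta)  # simultaneous-perturbation gradient estimate
        phi = phi - a_k * g_hat
    return phi * scales                                   # map back to physical parameter scale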
Offline reinforcement learning (RL) is a learning paradigm in which an agent learns from a fixed dataset of experience. However, learning solely from a static dataset can limit performance due to the lack of exploration. To overcome this limitation, offline-to-online RL combines offline pre-training with online fine-tuning, enabling the agent to further refine its policy by interacting with the environment in real time. Despite its benefits, existing offline-to-online RL methods suffer from performance degradation and slow improvement during the online phase. To tackle these challenges, we propose a novel framework called Ensemble-based Offline-to-Online (E2O) RL. By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance. Moreover, to expedite online performance improvement, we appropriately loosen the pessimism of Q-value estimation and incorporate ensemble-based exploration mechanisms into our framework. Experimental results demonstrate that E2O substantially improves the training stability, learning efficiency, and final performance of existing offline RL methods during online fine-tuning on a range of locomotion and navigation tasks, significantly outperforming existing offline-to-online RL methods.
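As a rough sketch, and not necessarily E2O's exact aggregation rules, the snippet below shows how an ensemble of Q-estimates for a single state-action pair can be collapsed conservatively for offline training, with loosened pessimism for online fine-tuning and an optimistic bonus for exploration; the modes and the lam and beta coefficients are assumed hyperparameters for illustration.

import numpy as np

def aggregate_ensemble(q_values, mode="offline", lam=1.0, beta=1.0):
    """Aggregate an ensemble of Q-estimates for one (state, action) pair.

    q_values : array of shape (num_q,), one estimate per ensemble member.
    mode     : "offline" -> conservative minimum over the ensemble,
               "online"  -> looser pessimism, mean minus lam * std,
               "explore" -> optimistic UCB-style value, mean plus beta * std.
    """
    q_values = np.asarray(q_values, dtype=float)
    mean, std = q_values.mean(), q_values.std()
    if mode == "offline":
        return q_values.min()
    if mode == "online":
        return mean - lam * std
    if mode == "explore":
        return mean + beta * std
    raise ValueError(f"unknown mode: {mode}")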