Proceedings of the 40th International Conference on Machine Learning, PMLR 202:21611-21630, 2023.
Abstract
Safe reinforcement learning (RL) trains a constraint satisfaction policy by interacting with the environment. We aim to tackle a more challenging problem: learning a safe policy from an offline dataset. We study the offline safe RL problem from a novel multi-objective optimization perspective and propose the $\epsilon$-reducible concept to characterize problem difficulties. The inherent trade-offs between safety and task performance inspire us to propose the constrained decision transformer (CDT) approach, which can dynamically adjust the trade-offs during deployment. Extensive experiments show the advantages of the proposed method in learning an adaptive, safe, robust, and high-reward policy. CDT outperforms its variants and strong offline safe RL baselines by a large margin with the same hyperparameters across all tasks, while keeping the zero-shot adaptation capability to different constraint thresholds, making our approach more suitable for real-world RL under constraints.
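The abstract does not describe the architecture in detail, but the core idea it points to, a sequence model whose reward/safety trade-off can be adjusted at deployment by changing a target constraint threshold, can be sketched as a decision-transformer-style policy conditioned on both a return-to-go and a cost-to-go token. The sketch below is an illustrative assumption, not the authors' implementation; the class name CostConditionedTransformer, the token layout, and all hyperparameters are made up for the example.

# Minimal sketch (assumption, not the paper's code): a transformer policy that
# conditions on BOTH a target return-to-go and a target cost-to-go, so the
# reward/safety trade-off can be changed at deployment by editing the cost
# budget token, without retraining.
import torch
import torch.nn as nn

class CostConditionedTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, embed_dim=128, n_layers=3, n_heads=4, max_len=20):
        super().__init__()
        # Separate embeddings for return-to-go, cost-to-go, state, and action tokens.
        self.embed_rtg = nn.Linear(1, embed_dim)
        self.embed_ctg = nn.Linear(1, embed_dim)
        self.embed_state = nn.Linear(state_dim, embed_dim)
        self.embed_action = nn.Linear(act_dim, embed_dim)
        self.embed_timestep = nn.Embedding(max_len, embed_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.predict_action = nn.Linear(embed_dim, act_dim)

    def forward(self, rtg, ctg, states, actions, timesteps):
        # rtg, ctg: (B, T, 1); states: (B, T, state_dim); actions: (B, T, act_dim)
        B, T = states.shape[0], states.shape[1]
        t_emb = self.embed_timestep(timesteps)                  # (B, T, D)
        # Interleave four tokens per timestep: [rtg, ctg, state, action].
        tokens = torch.stack([
            self.embed_rtg(rtg) + t_emb,
            self.embed_ctg(ctg) + t_emb,
            self.embed_state(states) + t_emb,
            self.embed_action(actions) + t_emb,
        ], dim=2).reshape(B, 4 * T, -1)
        # Causal mask so each token only attends to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(4 * T)
        h = self.transformer(tokens, mask=mask)
        # Predict each action from the state token (index 2 within every group of 4).
        return self.predict_action(h[:, 2::4])

# At deployment, the same trained weights can be steered toward different safety
# levels simply by feeding a different initial cost-to-go (constraint threshold).
model = CostConditionedTransformer(state_dim=17, act_dim=6)
B, T = 2, 10
actions = model(torch.rand(B, T, 1), torch.rand(B, T, 1),
                torch.rand(B, T, 17), torch.rand(B, T, 6),
                torch.arange(T).repeat(B, 1))
print(actions.shape)  # (2, 10, 6)

Under this (assumed) conditioning scheme, zero-shot adaptation to a new constraint threshold amounts to changing the initial cost-to-go token at inference time, which matches the adaptation behavior the abstract describes at a high level.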
Cite this Paper
@InProceedings{pmlr-v202-liu23m,
title = {Constrained Decision Transformer for Offline Safe Reinforcement Learning},
author = {Liu, Zuxin and Guo, Zijian and Yao, Yihang and Cen, Zhepeng and Yu, Wenhao and Zhang, Tingnan and Zhao, Ding},
booktitle = {Proceedings of the 40th International Conference on Machine Learning},
pages = {21611--21630},
year = {2023},
editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
volume = {202},
series = {Proceedings of Machine Learning Research},
month = {23--29 Jul},
publisher = {PMLR},
pdf = {https://proceedings.mlr.press/v202/liu23m/liu23m.pdf},
url = {https://proceedings.mlr.press/v202/liu23m.html},
abstract = {Safe reinforcement learning (RL) trains a constraint satisfaction policy by interacting with the environment. We aim to tackle a more challenging problem: learning a safe policy from an offline dataset. We study the offline safe RL problem from a novel multi-objective optimization perspective and propose the $\epsilon$-reducible concept to characterize problem difficulties. The inherent trade-offs between safety and task performance inspire us to propose the constrained decision transformer (CDT) approach, which can dynamically adjust the trade-offs during deployment. Extensive experiments show the advantages of the proposed method in learning an adaptive, safe, robust, and high-reward policy. CDT outperforms its variants and strong offline safe RL baselines by a large margin with the same hyperparameters across all tasks, while keeping the zero-shot adaptation capability to different constraint thresholds, making our approach more suitable for real-world RL under constraints.}
}
%0 Conference Paper
%T Constrained Decision Transformer for Offline Safe Reinforcement Learning
%A Zuxin Liu
%A Zijian Guo
%A Yihang Yao
%A Zhepeng Cen
%A Wenhao Yu
%A Tingnan Zhang
%A Ding Zhao
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-liu23m
%I PMLR
%P 21611--21630
%U https://proceedings.mlr.press/v202/liu23m.html
%V 202
%X Safe reinforcement learning (RL) trains a constraint satisfaction policy by interacting with the environment. We aim to tackle a more challenging problem: learning a safe policy from an offline dataset. We study the offline safe RL problem from a novel multi-objective optimization perspective and propose the $\epsilon$-reducible concept to characterize problem difficulties. The inherent trade-offs between safety and task performance inspire us to propose the constrained decision transformer (CDT) approach, which can dynamically adjust the trade-offs during deployment. Extensive experiments show the advantages of the proposed method in learning an adaptive, safe, robust, and high-reward policy. CDT outperforms its variants and strong offline safe RL baselines by a large margin with the same hyperparameters across all tasks, while keeping the zero-shot adaptation capability to different constraint thresholds, making our approach more suitable for real-world RL under constraints.
Liu, Z., Guo, Z., Yao, Y., Cen, Z., Yu, W., Zhang, T. & Zhao, D. (2023). Constrained Decision Transformer for Offline Safe Reinforcement Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:21611-21630. Available from https://proceedings.mlr.press/v202/liu23m.html.