Proceedings of the 40th International Conference on Machine Learning, PMLR 202:21611-21630, 2023.
Abstract
Safe reinforcement learning (RL) trains a constraint satisfaction policy by interacting with the environment. We aim to tackle a more challenging problem: learning a safe policy from an offline dataset. We study the offline safe RL problem from a novel multi-objective optimization perspective and propose the $\epsilon$-reducible concept to characterize problem difficulties. The inherent trade-offs between safety and task performance inspire us to propose the constrained decision transformer (CDT) approach, which can dynamically adjust the trade-offs during deployment. Extensive experiments show the advantages of the proposed method in learning an adaptive, safe, robust, and high-reward policy. CDT outperforms its variants and strong offline safe RL baselines by a large margin with the same hyperparameters across all tasks, while keeping the zero-shot adaptation capability to different constraint thresholds, making our approach more suitable for real-world RL under constraints.
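The abstract does not describe the architecture in detail, but the core idea it points to, a sequence model whose reward/safety trade-off can be adjusted at deployment by changing a target constraint threshold, can be sketched as a decision-transformer-style policy conditioned on both a return-to-go and a cost-to-go token. The sketch below is an illustrative assumption, not the authors' implementation; the class name CostConditionedTransformer, the token layout, and all hyperparameters are made up for the example.

# Minimal sketch (assumption, not the paper's code): a transformer policy that
# conditions on BOTH a target return-to-go and a target cost-to-go, so the
# reward/safety trade-off can be changed at deployment by editing the cost
# budget token, without retraining.
import torch
import torch.nn as nn

class CostConditionedTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, embed_dim=128, n_layers=3, n_heads=4, max_len=20):
        super().__init__()
        # Separate embeddings for return-to-go, cost-to-go, state, and action tokens.
        self.embed_rtg = nn.Linear(1, embed_dim)
        self.embed_ctg = nn.Linear(1, embed_dim)
        self.embed_state = nn.Linear(state_dim, embed_dim)
        self.embed_action = nn.Linear(act_dim, embed_dim)
        self.embed_timestep = nn.Embedding(max_len, embed_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.predict_action = nn.Linear(embed_dim, act_dim)

    def forward(self, rtg, ctg, states, actions, timesteps):
        # rtg, ctg: (B, T, 1); states: (B, T, state_dim); actions: (B, T, act_dim)
        B, T = states.shape[0], states.shape[1]
        t_emb = self.embed_timestep(timesteps)                  # (B, T, D)
        # Interleave four tokens per timestep: [rtg, ctg, state, action].
        tokens = torch.stack([
            self.embed_rtg(rtg) + t_emb,
            self.embed_ctg(ctg) + t_emb,
            self.embed_state(states) + t_emb,
            self.embed_action(actions) + t_emb,
        ], dim=2).reshape(B, 4 * T, -1)
        # Causal mask so each token only attends to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(4 * T)
        h = self.transformer(tokens, mask=mask)
        # Predict each action from the state token (index 2 within every group of 4).
        return self.predict_action(h[:, 2::4])

# At deployment, the same trained weights can be steered toward different safety
# levels simply by feeding a different initial cost-to-go (constraint threshold).
model = CostConditionedTransformer(state_dim=17, act_dim=6)
B, T = 2, 10
actions = model(torch.rand(B, T, 1), torch.rand(B, T, 1),
                torch.rand(B, T, 17), torch.rand(B, T, 6),
                torch.arange(T).repeat(B, 1))
print(actions.shape)  # (2, 10, 6)

Under this (assumed) conditioning scheme, zero-shot adaptation to a new constraint threshold amounts to changing the initial cost-to-go token at inference time, which matches the adaptation behavior the abstract describes at a high level.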
Cite this Paper
@InProceedings{pmlr-v202-liu23m,
title = {Constrained Decision Transformer for Offline Safe Reinforcement Learning},
author = {Liu, Zuxin and Guo, Zijian and Yao, Yihang and Cen, Zhepeng and Yu, Wenhao and Zhang, Tingnan and Zhao, Ding},
booktitle = {Proceedings of the 40th International Conference on Machine Learning},
pages = {21611--21630},
year = {2023},
editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
volume = {202},
series = {Proceedings of Machine Learning Research},
month = {23--29 Jul},
publisher = {PMLR},
pdf = {https://proceedings.mlr.press/v202/liu23m/liu23m.pdf},
url = {https://proceedings.mlr.press/v202/liu23m.html},
abstract = {Safe reinforcement learning (RL) trains a constraint satisfaction policy by interacting with the environment. We aim to tackle a more challenging problem: learning a safe policy from an offline dataset. We study the offline safe RL problem from a novel multi-objective optimization perspective and propose the $\epsilon$-reducible concept to characterize problem difficulties. The inherent trade-offs between safety and task performance inspire us to propose the constrained decision transformer (CDT) approach, which can dynamically adjust the trade-offs during deployment. Extensive experiments show the advantages of the proposed method in learning an adaptive, safe, robust, and high-reward policy. CDT outperforms its variants and strong offline safe RL baselines by a large margin with the same hyperparameters across all tasks, while keeping the zero-shot adaptation capability to different constraint thresholds, making our approach more suitable for real-world RL under constraints.}
}
%0 Conference Paper
%T Constrained Decision Transformer for Offline Safe Reinforcement Learning
%A Zuxin Liu
%A Zijian Guo
%A Yihang Yao
%A Zhepeng Cen
%A Wenhao Yu
%A Tingnan Zhang
%A Ding Zhao
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-liu23m
%I PMLR
%P 21611--21630
%U https://proceedings.mlr.press/v202/liu23m.html
%V 202
%X Safe reinforcement learning (RL) trains a constraint satisfaction policy by interacting with the environment. We aim to tackle a more challenging problem: learning a safe policy from an offline dataset. We study the offline safe RL problem from a novel multi-objective optimization perspective and propose the $\epsilon$-reducible concept to characterize problem difficulties. The inherent trade-offs between safety and task performance inspire us to propose the constrained decision transformer (CDT) approach, which can dynamically adjust the trade-offs during deployment. Extensive experiments show the advantages of the proposed method in learning an adaptive, safe, robust, and high-reward policy. CDT outperforms its variants and strong offline safe RL baselines by a large margin with the same hyperparameters across all tasks, while keeping the zero-shot adaptation capability to different constraint thresholds, making our approach more suitable for real-world RL under constraints.
Liu, Z., Guo, Z., Yao, Y., Cen, Z., Yu, W., Zhang, T. & Zhao, D. (2023). Constrained Decision Transformer for Offline Safe Reinforcement Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:21611-21630. Available from https://proceedings.mlr.press/v202/liu23m.html.