Code for the NeurIPS 2022 paper Reinforcement Learning with Automated Auxiliary Loss Search (A2LS).
This repository implements A2LS on top of the official CURL implementation for the DeepMind Control experiments.
All of the dependencies are listed in the conda_env.yml file. They can be installed manually or with the following command:

```shell
conda env create -f conda_env.yml
```
Below are some examples of training RL agents with auxiliary losses using our code base.
To train a SAC agent with A2-winner on image-based Cheetah-Run with default hyper-parameters (please refer to the appendix for the detailed hyper-parameters of each experiment setting):

```shell
python train.py \
    --domain_name cheetah_run \
    --encoder_type pixel \
    --agent auxi_sac \
    --auxi_pred_horizon 4 \
    --auxi_pred_input_s 1000 --auxi_pred_input_a 1111 --auxi_pred_input_r 1101 --auxi_pred_input_s_ 0 \
    --auxi_pred_output_s 0111 --auxi_pred_output_a 0000 --auxi_pred_output_r 0000 --auxi_pred_output_s_ 1 \
    --similarity_metric mse
```
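The `--auxi_pred_input_*` and `--auxi_pred_output_*` flags are binary masks over the prediction horizon: bit t is 1 if the state/action/reward at step t enters the auxiliary loss input (or prediction target). The sketch below is our own illustration of this selection logic, not code from this repository; the helper name is hypothetical.

```python
# Illustration (not repository code): how the binary mask flags could be read.
# Each auxi_pred_* flag is a bitmask over the prediction horizon; position t
# is "1" if that trajectory element is part of the loss input (or target).

def select_by_mask(sequence, mask):
    """Keep the elements of `sequence` whose corresponding mask bit is '1'."""
    return [x for x, bit in zip(sequence, mask) if bit == "1"]

# Horizon-4 example matching the flags in the command above.
states  = ["s0", "s1", "s2", "s3"]
actions = ["a0", "a1", "a2", "a3"]
rewards = ["r0", "r1", "r2", "r3"]

loss_input = (
    select_by_mask(states, "1000")      # --auxi_pred_input_s 1000
    + select_by_mask(actions, "1111")   # --auxi_pred_input_a 1111
    + select_by_mask(rewards, "1101")   # --auxi_pred_input_r 1101
)
loss_target = select_by_mask(states, "0111")  # --auxi_pred_output_s 0111
# --auxi_pred_output_s_ 1 additionally includes the final next state s'.

print(loss_input)   # ['s0', 'a0', 'a1', 'a2', 'a3', 'r0', 'r1', 'r3']
print(loss_target)  # ['s1', 's2', 's3']
```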
To train a SAC agent with A2-winner on vector-based Cheetah-Run with default hyper-parameters (please refer to the appendix for the detailed hyper-parameters of each experiment setting):

```shell
python train.py \
    --domain_name cheetah_run \
    --encoder_type ofe --encoder_hidden_size 40 --num_layers 1 \
    --agent auxi_sac \
    --auxi_pred_horizon 4 \
    --auxi_pred_input_s 1000 --auxi_pred_input_a 1111 --auxi_pred_input_r 1101 --auxi_pred_input_s_ 0 \
    --auxi_pred_output_s 0111 --auxi_pred_output_a 0000 --auxi_pred_output_r 0000 --auxi_pred_output_s_ 1 \
    --similarity_metric mse
```
To train a SAC agent with A2-winner on image-based Cheetah-Run with default hyper-parameters and a longer prediction horizon (please refer to the appendix for the detailed hyper-parameters of each experiment setting):

```shell
python train.py \
    --domain_name cheetah_run \
    --encoder_type pixel \
    --agent auxi_sac \
    --auxi_pred_horizon 9 \
    --auxi_pred_input_s 101000001 --auxi_pred_input_a 111111011 --auxi_pred_input_r 000110001 --auxi_pred_input_s_ 0 \
    --auxi_pred_output_s 010100100 --auxi_pred_output_a 000010000 --auxi_pred_output_r 000000000 --auxi_pred_output_s_ 1 \
    --similarity_metric mse
```
To train a SAC agent with A2-winner on vector-based Cheetah-Run with default hyper-parameters and a longer prediction horizon (please refer to the appendix for the detailed hyper-parameters of each experiment setting):

```shell
python train.py \
    --domain_name cheetah_run \
    --encoder_type ofe --encoder_hidden_size 40 --num_layers 1 \
    --agent auxi_sac \
    --auxi_pred_horizon 9 \
    --auxi_pred_input_s 101000001 --auxi_pred_input_a 111111011 --auxi_pred_input_r 000110001 --auxi_pred_input_s_ 0 \
    --auxi_pred_output_s 010100100 --auxi_pred_output_a 000010000 --auxi_pred_output_r 000000000 --auxi_pred_output_s_ 1 \
    --similarity_metric mse
```
To train a baseline SAC agent on image-based Cheetah-Run with default hyper-parameters:

```shell
python train.py \
    --domain_name cheetah_run \
    --encoder_type pixel \
    --agent pixel_sac
```
To train a baseline SAC agent on vector-based Cheetah-Run with default hyper-parameters and the default architecture (MLP):

```shell
python train.py \
    --domain_name cheetah_run \
    --encoder_type mlp --encoder_hidden_size 40 --num_layers 1 \
    --agent pixel_sac
```
To train a baseline SAC agent on vector-based Cheetah-Run with default hyper-parameters and a densely connected MLP architecture (OFE):

```shell
python train.py \
    --domain_name cheetah_run \
    --encoder_type ofe --encoder_hidden_size 40 --num_layers 1 \
    --agent pixel_sac
```
To train a baseline SAC agent with the CURL loss on image-based Cheetah-Run with default hyper-parameters:

```shell
python train.py \
    --domain_name cheetah_run \
    --encoder_type pixel \
    --agent curl_sac
```
To train a baseline SAC agent with the CURL loss on vector-based Cheetah-Run with default hyper-parameters and the default architecture (MLP):

```shell
python train.py \
    --domain_name cheetah_run \
    --encoder_type mlp --encoder_hidden_size 40 --num_layers 1 \
    --agent curl_sac
```
To train a baseline SAC agent with the CURL loss on vector-based Cheetah-Run with default hyper-parameters and a densely connected MLP architecture (OFE):

```shell
python train.py \
    --domain_name cheetah_run \
    --encoder_type ofe --encoder_hidden_size 40 --num_layers 1 \
    --agent curl_sac
```
For GPU-accelerated rendering, make sure EGL is installed on your machine and set `export MUJOCO_GL=egl`.
For environment troubleshooting, see the DeepMind Control documentation.
You are more than welcome to cite our paper:

```bibtex
@article{he2022reinforcement,
  title={Reinforcement Learning with Automated Auxiliary Loss Search},
  author={He, Tairan and Zhang, Yuge and Ren, Kan and Liu, Minghuan and Wang, Che and Zhang, Weinan and Yang, Yuqing and Li, Dongsheng},
  journal={Advances in Neural Information Processing Systems},
  year={2022}
}
```