[dev] bootorl paper codes. (#10)
* [dev] bootorl paper codes.

* [fix] file structure.

---------

Co-authored-by: Kan Ren <[email protected]>
rk2900 and Kan Ren authored Mar 10, 2023
1 parent b61819c commit ee16183
Showing 24 changed files with 3,295 additions and 1 deletion.
3 changes: 2 additions & 1 deletion README.md
@@ -5,7 +5,8 @@ This repository contains code for a series of research projects on Automated Rei
 ## News
 
 * 2022.9.21 [Towards Applicable Reinforcement Learning: Improving the Generalization and Sample Efficiency with Policy Ensemble](https://seqml.github.io/eppo/) is now available in [eppo](eppo).
-* 2022.10.12 [Reinforcement Learning with Automated Auxiliary Loss Search](https://seqml.github.io/A2LS/) is now available in [a2ls](a2ls).
+* 2022.10.12 [Reinforcement Learning with Automated Auxiliary Loss Search](https://seqml.github.io/a2ls/) is now available in [a2ls](a2ls).
+* 2023.3.10 [Bootstrapped Transformer for Offline Reinforcement Learning](https://seqml.github.io/bootorl/) is now available in [bootorl](bootorl).
 
 ## Contributing
129 changes: 129 additions & 0 deletions bootorl/.gitignore
@@ -0,0 +1,129 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
34 changes: 34 additions & 0 deletions bootorl/Dockerfile
@@ -0,0 +1,34 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

FROM pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel

WORKDIR /workspace

# Install new cuda-keyring package
# Noted at https://forums.developer.nvidia.com/t/notice-cuda-linux-repository-key-rotation/212772
RUN rm /etc/apt/sources.list.d/cuda.list /etc/apt/sources.list.d/nvidia-ml.list \
&& apt-key del 7fa2af80 \
&& apt-get update && apt-get install -y --no-install-recommends wget \
&& wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-keyring_1.0-1_all.deb \
&& dpkg -i cuda-keyring_1.0-1_all.deb

RUN apt-get update && DEBIAN_FRONTEND=noninteractive \
&& apt-get install -y zlib1g zlib1g-dev libosmesa6-dev libgl1-mesa-glx libglfw3 libglew2.0 cmake git \
&& ln -s /usr/lib/x86_64-linux-gnu/libGL.so.1 /usr/lib/x86_64-linux-gnu/libGL.so

# Install MuJoCo 2.1.0.
ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/root/.mujoco/mujoco210/bin
RUN mkdir -p /root/.mujoco \
&& wget https://github.com/deepmind/mujoco/releases/download/2.1.0/mujoco210-linux-x86_64.tar.gz -O mujoco210.tar.gz \
&& tar -xvzf mujoco210.tar.gz -C /root/.mujoco \
&& rm mujoco210.tar.gz

# Install packages, mainly d4rl, which will also pull in its dependencies automatically.
RUN pip install -U scikit-learn pandas \
&& pip install git+https://github.com/rail-berkeley/d4rl.git@d842aa194b416e564e54b0730d9f934e3e32f854 \
&& pip install git+https://github.com/openai/gym.git@66c431d4b3072a1db44d564dab812b9d23c06e14

# Pre-download dataset if necessary
# RUN python -c "import gym; import d4rl; [gym.make(f'{game}-{level}-v2').unwrapped.get_dataset() for level in \
# ['medium', 'medium-replay', 'medium-expert', 'expert'] for game in ['halfcheetah', 'hopper', 'walker2d']];"
21 changes: 21 additions & 0 deletions bootorl/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
53 changes: 53 additions & 0 deletions bootorl/README.md
@@ -0,0 +1,53 @@
# Bootstrapped Transformer
Source code for NeurIPS 2022 paper *[Bootstrapped Transformer for Offline Reinforcement Learning](https://seqml.github.io/bootorl/)*.

## Abstract
> Offline reinforcement learning (RL) aims at learning policies from previously collected static trajectory data without interacting with the real environment. Recent works provide a novel perspective by viewing offline RL as a generic sequence generation problem, adopting sequence models such as Transformer architecture to model distributions over trajectories, and repurposing beam search as a planning algorithm. However, the training datasets utilized in general offline RL tasks are quite limited and often suffer from insufficient distribution coverage, which could be harmful to training sequence generation. In this paper, we propose a novel algorithm named Bootstrapped Transformer, which incorporates the idea of bootstrapping and leverages the learned model to self-generate more offline data to further boost the sequence model training. We conduct extensive experiments on two offline RL benchmarks and demonstrate that our model can largely remedy the existing offline RL training limitations and beat other strong baseline methods. We also analyze the generated pseudo data and the revealed characteristics may shed some light on offline RL training. The codes and supplementary materials are available at https://seqml.github.io/bootorl.
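
The bootstrapping scheme described above boils down to two intertwined steps: fit the sequence model on the real offline trajectories, then let the model self-generate pseudo trajectories that are mixed back into the training data. Below is a minimal, self-contained toy illustration of that loop, with a small GRU standing in for the Transformer; every name and dimension here is hypothetical for illustration, not the repo's actual API (see `main/train.py` for the real entry point).

```
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, SEQ_LEN, HIDDEN = 32, 16, 64
torch.manual_seed(0)

class TinyTrajectoryLM(nn.Module):
    """Toy stand-in for the Transformer: predicts the next trajectory token."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens):                        # tokens: (batch, time)
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)                      # logits: (batch, time, VOCAB)

    @torch.no_grad()
    def generate(self, prefix, length):
        """Autoregressively extend `prefix` by `length` sampled tokens."""
        tokens = prefix
        for _ in range(length):
            probs = F.softmax(self(tokens)[:, -1], dim=-1)
            tokens = torch.cat([tokens, torch.multinomial(probs, 1)], dim=1)
        return tokens

model = TinyTrajectoryLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
real = torch.randint(0, VOCAB, (256, SEQ_LEN))        # stand-in for the offline dataset
data = real

for epoch in range(10):
    for i in range(0, len(data), 64):                 # next-token training on current data
        batch = data[i:i + 64]
        logits = model(batch[:, :-1])
        loss = F.cross_entropy(logits.reshape(-1, VOCAB), batch[:, 1:].reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()
    if epoch >= 5:                                    # bootstrap phase begins
        pseudo = model.generate(real[:64, :4], SEQ_LEN - 4)
        data = torch.cat([real, pseudo], dim=0)       # mix pseudo trajectories back in
```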

## Dependencies

Python dependencies are listed in [`./environment.yml`](./environment.yml).

We also provide a Dockerfile at [`./Dockerfile`](./Dockerfile) for reproducibility.

## Usage

To train the model, run
```
python main/train.py --dataset hopper-medium-replay-v2 \
--bootstrap True \
--bootstrap_type once \
--generation_type autoregressive
```
or
```
python main/train.py --dataset hopper-medium-replay-v2 \
--bootstrap True \
--bootstrap_type repeat \
--generation_type teacherforcing
```
depending on your choice of hyperparameters and bootstrap scheme. All default hyperparameters used in our experiments are listed in `DEFAULT_ARGS` at the top of [`./utils/argparser.py`](./utils/argparser.py). By default, training logs and saved models are written to the `./logs/<environment>-<dataset_level>/` directory.
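
How such defaults combine with command-line overrides is a standard argparse pattern; here is a minimal sketch under the assumption that `DEFAULT_ARGS` is a flat dict of defaults (the keys shown are illustrative, check the file for the real ones):

```
import argparse

# Hypothetical subset of DEFAULT_ARGS; consult ./utils/argparser.py for the real keys.
DEFAULT_ARGS = {
    "dataset": "hopper-medium-replay-v2",
    "bootstrap": True,
    "bootstrap_type": "once",               # or "repeat"
    "generation_type": "autoregressive",    # or "teacherforcing"
    "logbase": "./logs",
}

def parse_args():
    parser = argparse.ArgumentParser()
    for key, default in DEFAULT_ARGS.items():
        # Booleans arrive as the strings "True"/"False" on the command line.
        to_type = (lambda s: s == "True") if isinstance(default, bool) else type(default)
        parser.add_argument(f"--{key}", type=to_type, default=default)
    return parser.parse_args()

if __name__ == "__main__":
    print(vars(parse_args()))                # effective configuration after overrides
```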

To evaluate the performance of a trained model, run
```
python main/plan.py --dataset hopper-medium-replay-v2 \
--checkpoint <checkpoint_directory> \
--suffix <output_directory_suffix>
```
where `<checkpoint_directory>` is the directory containing your trained model files (`state_*.pt`). By default, evaluation results are written to the `./logs/<environment>-<dataset_level>/<suffix>` directory.
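
If you need to locate a checkpoint in that directory yourself, a small helper along these lines works; this is an illustrative sketch, not part of the repo, and it assumes the filename embeds the training step (e.g. `state_10000.pt`):

```
from pathlib import Path

def latest_checkpoint(checkpoint_dir):
    """Pick the state_*.pt file with the highest step number in the directory."""
    candidates = sorted(
        Path(checkpoint_dir).glob("state_*.pt"),
        key=lambda p: int(p.stem.split("_")[-1]),
    )
    if not candidates:
        raise FileNotFoundError(f"no state_*.pt files under {checkpoint_dir}")
    return candidates[-1]

# Example: latest_checkpoint("./logs/hopper-medium-replay-v2")
```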


## Acknowledgements
Parts of this codebase are implemented on top of *Trajectory Transformer* (https://arxiv.org/abs/2106.02039), which in turn uses the GPT implementation from Andrej Karpathy's *minGPT* repository.

## Citation
You are more than welcome to cite our paper:
```
@article{wang2022bootstrapped,
title={Bootstrapped Transformer for Offline Reinforcement Learning},
author={Wang, Kerong and Zhao, Hanye and Luo, Xufang and Ren, Kan and Zhang, Weinan and Li, Dongsheng},
journal={arXiv preprint arXiv:2206.08569},
year={2022}
}
```