[dev] bootorl paper codes. (#10)

* [dev] bootorl paper codes.
* [fix] file structure.

Co-authored-by: Kan Ren <[email protected]>
microsoft · Mar 10, 2023 · ee16183
1 parent b61819c
Showing 24 changed files with 3,295 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -5,7 +5,8 @@ This repository contains code for a series of research projects on Automated Rei
 ## News
 
 * 2022.9.21 [Towards Applicable Reinforcement Learning: Improving the Generalization and Sample Efficiency with Policy Ensemble](https://seqml.github.io/eppo/) is now available in [eppo](eppo).
-* 2022.10.12 [Reinforcement Learning with Automated Auxiliary Loss Search](https://seqml.github.io/A2LS/) is now available in [a2ls](a2ls).
+* 2022.10.12 [Reinforcement Learning with Automated Auxiliary Loss Search](https://seqml.github.io/a2ls/) is now available in [a2ls](a2ls).
+* 2023.3.10 [Bootstrapped Transformer for Offline Reinforcement Learning](https://seqml.github.io/bootorl/) is now available in [bootorl](bootorl).
 
 ## Contributing
 

diff --git a/bootorl/.gitignore b/bootorl/.gitignore
@@ -0,0 +1,129 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
diff --git a/bootorl/Dockerfile b/bootorl/Dockerfile
@@ -0,0 +1,34 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+FROM pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel
+
+WORKDIR /workspace
+
+# Install new cuda-keyring package
+# Noted at https://forums.developer.nvidia.com/t/notice-cuda-linux-repository-key-rotation/212772
+RUN rm /etc/apt/sources.list.d/cuda.list /etc/apt/sources.list.d/nvidia-ml.list \
+    && apt-key del 7fa2af80 \
+    && apt-get update && apt-get install -y --no-install-recommends wget \
+    && wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-keyring_1.0-1_all.deb \
+    && dpkg -i cuda-keyring_1.0-1_all.deb
+
+RUN apt-get update \
+    && DEBIAN_FRONTEND=noninteractive apt-get install -y zlib1g zlib1g-dev libosmesa6-dev libgl1-mesa-glx libglfw3 libglew2.0 cmake git \
+    && ln -s /usr/lib/x86_64-linux-gnu/libGL.so.1 /usr/lib/x86_64-linux-gnu/libGL.so
+
+# Install MuJoCo 2.1.0.
+ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/root/.mujoco/mujoco210/bin
+RUN mkdir -p /root/.mujoco \
+    && wget https://github.com/deepmind/mujoco/releases/download/2.1.0/mujoco210-linux-x86_64.tar.gz -O mujoco210.tar.gz \
+    && tar -xvzf mujoco210.tar.gz -C /root/.mujoco \
+    && rm mujoco210.tar.gz
+
+# Install packages, mainly d4rl, which will also install corresponding dependencies automatically.
+RUN pip install -U scikit-learn pandas \
+    && pip install git+https://github.com/rail-berkeley/d4rl.git@d842aa194b416e564e54b0730d9f934e3e32f854 \
+    && pip install git+https://github.com/openai/gym.git@66c431d4b3072a1db44d564dab812b9d23c06e14
+
+# Pre-download dataset if necessary
+# RUN python -c "import gym; import d4rl; [gym.make(f'{game}-{level}-v2').unwrapped.get_dataset() for level in \
+#     ['medium', 'medium-replay', 'medium-expert', 'expert'] for game in ['halfcheetah', 'hopper', 'walker2d']];"
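
Note on the commented-out pre-download step in `bootorl/Dockerfile`: the one-liner builds environment IDs of the form `{game}-{level}-v2`. As a minimal sketch, the same ID list can be generated without importing `gym` or `d4rl`, which is handy for checking what would be downloaded before enabling that step. The function name `d4rl_dataset_names` and the `version` parameter are hypothetical and not part of this commit.

```python
from itertools import product

# Values copied from the commented-out Dockerfile step.
GAMES = ["halfcheetah", "hopper", "walker2d"]
LEVELS = ["medium", "medium-replay", "medium-expert", "expert"]


def d4rl_dataset_names(games=GAMES, levels=LEVELS, version="v2"):
    """Return environment IDs in the order the Dockerfile one-liner visits them.

    The comprehension in the Dockerfile iterates over levels in the outer loop
    and games in the inner loop, so product(levels, games) keeps that order.
    """
    return [f"{game}-{level}-{version}" for level, game in product(levels, games)]


if __name__ == "__main__":
    for name in d4rl_dataset_names():
        print(name)
```

Each printed ID is a string that the Dockerfile step would pass to `gym.make(...)`.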
diff --git a/bootorl/LICENSE b/bootorl/LICENSE
@@ -0,0 +1,21 @@
+    MIT License
+
+    Copyright (c) Microsoft Corporation.
+
+    Permission is hereby granted, free of charge, to any person obtaining a copy
+    of this software and associated documentation files (the "Software"), to deal
+    in the Software without restriction, including without limitation the rights
+    to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+    copies of the Software, and to permit persons to whom the Software is
+    furnished to do so, subject to the following conditions:
+
+    The above copyright notice and this permission notice shall be included in all
+    copies or substantial portions of the Software.
+
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+    SOFTWARE.
diff --git a/bootorl/README.md b/bootorl/README.md
@@ -0,0 +1,53 @@
+# Bootstrapped Transformer
+Source code for the NeurIPS 2022 paper *[Bootstrapped Transformer for Offline Reinforcement Learning](https://seqml.github.io/bootorl/)*.
+
+## Abstract
+> Offline reinforcement learning (RL) aims at learning policies from previously collected static trajectory data without interacting with the real environment. Recent works provide a novel perspective by viewing offline RL as a generic sequence generation problem, adopting sequence models such as Transformer architecture to model distributions over trajectories, and repurposing beam search as a planning algorithm. However, the training datasets utilized in general offline RL tasks are quite limited and often suffer from insufficient distribution coverage, which could be harmful to training sequence generation. In this paper, we propose a novel algorithm named Bootstrapped Transformer, which incorporates the idea of bootstrapping and leverages the learned model to self-generate more offline data to further boost the sequence model training. We conduct extensive experiments on two offline RL benchmarks and demonstrate that our model can largely remedy the existing offline RL training limitations and beat other strong baseline methods. We also analyze the generated pseudo data and the revealed characteristics may shed some light on offline RL training. The codes and supplementary materials are available at https://seqml.github.io/bootorl.
+
+## Dependencies
+
+Python dependencies are listed in [`./environment.yml`](./environment.yml).
+
+We also provide a Dockerfile at [`./Dockerfile`](./Dockerfile) for reproducibility.
+
+## Usage
+
+To train the model, run
+```
+python main/train.py --dataset hopper-medium-replay-v2 \
+                     --bootstrap True \
+                     --bootstrap_type once \
+                     --generation_type autoregressive
+```
+or
+```
+python main/train.py --dataset hopper-medium-replay-v2 \
+                     --bootstrap True \
+                     --bootstrap_type repeat \
+                     --generation_type teacherforcing
+```
+depending on your choice of hyperparameters and bootstrap scheme. The default hyperparameters used in our experiments are defined in `DEFAULT_ARGS` at the beginning of [`./utils/argparser.py`](./utils/argparser.py). By default, training logs and saved models are written to the `./logs/<environment>-<dataset_level>/` directory.
+
+To evaluate the performance of a trained model, run
+```
+python main/plan.py --dataset hopper-medium-replay-v2 \
+                    --checkpoint <checkpoint_directory> \
+                    --suffix <output_directory_suffix>
+```
+where `<checkpoint_directory>` is the directory containing your saved model files `state_*.pt`. By default, evaluation results are written to the `./logs/<environment>-<dataset_level>/<suffix>` directory.
+
+
+## Acknowledgements
+Parts of the source code are built on top of *Trajectory Transformer* (https://arxiv.org/abs/2106.02039).
+*Trajectory Transformer* uses the GPT implementation from Andrej Karpathy's *minGPT* repo.
+
+## Citation
+You are more than welcome to cite our paper:
+```
+@article{wang2022bootstrapped,
+  title={Bootstrapped Transformer for Offline Reinforcement Learning},
+  author={Wang, Kerong and Zhao, Hanye and Luo, Xufang and Ren, Kan and Zhang, Weinan and Li, Dongsheng},
+  journal={arXiv preprint arXiv:2206.08569},
+  year={2022}
+}
+```
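
Note on the evaluation step in `bootorl/README.md`: `main/plan.py` is pointed at a directory containing `state_*.pt` files. As a minimal sketch, the helper below lists those files before launching evaluation, which can catch an empty or mistyped checkpoint directory early. It assumes the wildcard part of the file name is an integer (for example an epoch or step counter). That assumption, and the helper name `list_checkpoints`, are illustrative and not taken from the repository.

```python
import re
from pathlib import Path

# Assumption: checkpoints are named state_<integer>.pt; other names are ignored.
_STATE_RE = re.compile(r"^state_(\d+)\.pt$")


def list_checkpoints(checkpoint_directory):
    """Return state_<N>.pt files in the directory, sorted by N ascending."""
    found = []
    for path in Path(checkpoint_directory).glob("state_*.pt"):
        match = _STATE_RE.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Sort numerically so that state_10.pt comes after state_2.pt.
    found.sort(key=lambda item: item[0])
    return [path for _, path in found]


if __name__ == "__main__":
    import sys

    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    checkpoints = list_checkpoints(directory)
    if not checkpoints:
        print(f"no state_*.pt files found in {directory}")
    for path in checkpoints:
        print(path)
```

Sorting on the parsed integer rather than on the file name avoids the lexicographic ordering in which `state_10.pt` would precede `state_2.pt`.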